Open keksundso opened 1 year ago
I didn't take a deep look, but my first guess would be that, depending on your input, a different set of transcripts passes sleuth's internal filters, resulting in a different total number of reads and features being used as the basis for the TPM normalization. If you have fewer features, all remaining features end up with higher values, because TPMs always sum to 1 million. So maybe you can get the behaviour you want by manually overriding the filter settings.
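For illustration only, here is a minimal sketch of the standard TPM formula (counts divided by effective length, rescaled to sum to one million), not sleuth's actual implementation; the transcript names and numbers are made up. It shows why dropping a feature inflates the TPM of everything that remains:

# Toy TPM calculation in base R
counts  <- c(t1 = 100, t2 = 200, t3 = 700)
eff_len <- c(t1 = 1000, t2 = 1000, t3 = 1000)

tpm <- function(cts, lens) {
  rate <- cts / lens          # reads per base of effective length
  rate / sum(rate) * 1e6      # rescale so the sample sums to 1 million
}

tpm(counts, eff_len)            # all three transcripts pass: 1e5, 2e5, 7e5
tpm(counts[1:2], eff_len[1:2])  # t3 filtered out: t1 and t2 rise to ~333333 and ~666667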
This sounds reasonable, especially since the variation is a constant shift proportional to the TPM values, as would be expected from a different total read count. I replaced the filter function with a custom one that should not filter anything:
# Pass-through filter: with min_reads = 0 and min_prop = 0 every target passes
myFilter <- function(row, min_reads = 0, min_prop = 0) {
  mean(row >= min_reads) >= min_prop
}
sleuth.obj <- sleuth_prep(sample_to_covariates = s2c,
                          target_mapping = transcript2gene,
                          extra_bootstrap_summary = TRUE,
                          read_bootstrap_tpm = TRUE,
                          aggregation_column = 'ens_gene',
                          num_cores = numberOfCores,
                          gene_mode = TRUE,
                          filter_fun = myFilter)
As expected, with both combinations the same number of targets and genes now passes the filter. However, the difference in the TPM values between the two combinations persists.
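One way to quantify the shift would be to compare the TPM matrices of the two sleuth objects directly. A rough sketch, assuming two hypothetical objects so_PG (prepared from the P and G samples) and so_PH (prepared from the P and H samples) and a shared sample named "P1"; depending on the sleuth version, sleuth_to_matrix may return the matrix directly or a list whose $data slot holds it:

# Extract the normalized TPM matrices from both runs
tpm_PG <- sleuth_to_matrix(so_PG, "obs_norm", "tpm")
tpm_PH <- sleuth_to_matrix(so_PH, "obs_norm", "tpm")

shared_genes <- intersect(rownames(tpm_PG), rownames(tpm_PH))

# If the difference were purely a renormalization, this ratio should be
# (roughly) constant across genes for the shared sample P1
summary(tpm_PG[shared_genes, "P1"] / tpm_PH[shared_genes, "P1"])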
Using sleuth I can calculate the TPM value of a gene in a sample (sleuth_prep followed by sleuth_to_matrix).
Now if I have 6 samples (6 abundance tables from Kallisto), with two samples belonging to each of the three conditions P, G and H, I would expect to get one TPM value per gene per sample.
However, the TPM value for gene i in sample j is not a fixed value, but varies depending on the combination of conditions that goes into sleuth_prep.
E.g. gene i in sample P1 has a different TPM value when conditions P and G go into sleuth_prep than when conditions P and H do. See minimal example below: