thelovelab / tximport

Transcript quantification import for modular pipelines

Different length and counts depending on the number of loaded samples #54

Closed: FlorianRocher closed this issue 1 year ago

FlorianRocher commented 1 year ago

Hi,

I noticed something odd when using tximport to summarize at the gene level. Depending on how many samples I load, I get different lengths and counts for the same genes in the same samples.

```r
library(tximport)

# Gene-level summary using all samples, then using only the first 20 samples
Quant_Matrix_tximport  <- tximport(AlltheFiles, txIn = TRUE, txOut = FALSE, type = "salmon",
                                   countsFromAbundance = "lengthScaledTPM", tx2gene = tx2gene)
Quant_Matrix_tximport1 <- tximport(AlltheFiles[1:20], txIn = TRUE, txOut = FALSE, type = "salmon",
                                   countsFromAbundance = "lengthScaledTPM", tx2gene = tx2gene)
```

Example, for Sample1, Gene1:

- all samples loaded: counts = 386.3943113
- only the first 20 samples loaded: counts = 383.4657441

To check, I used `countsFromAbundance = "no"` instead. With that option the counts did not change, but the gene lengths did.

Is this expected?

Florian Rocher

mikelove commented 1 year ago

Yes, this is expected.

When a gene has some samples with no expression for any isoform, the average length for those samples depends on the other samples.
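A minimal Python sketch of the effect, on hypothetical toy data (it is not tximport's R code). It assumes the gene-level length in each sample is the abundance-weighted mean of its isoform lengths; that a sample where every isoform has zero abundance gets the geometric mean of the other *loaded* samples' lengths; and that `lengthScaledTPM` multiplies gene TPM by the gene's mean length across loaded samples, then rescales each sample to its original library size. It also assumes transcript lengths are fixed across samples, whereas Salmon's effective lengths vary per sample. Under these assumptions the first effect changes lengths even with `"no"`, and the second shifts `lengthScaledTPM` counts even for genes expressed in every sample.

```python
import math

# Hypothetical toy data: transcript lengths (fixed across samples for simplicity)
# and per-sample transcript TPM and estimated counts. S2 does not express geneA.
TX_LEN = {"geneA": [1000.0, 3000.0], "geneB": [2000.0]}
SAMPLES = {
    "S1": {"tpm": {"geneA": [30.0, 10.0], "geneB": [60.0]},
           "counts": {"geneA": [30.0, 30.0], "geneB": [120.0]}},
    "S2": {"tpm": {"geneA": [0.0, 0.0], "geneB": [100.0]},
           "counts": {"geneA": [0.0, 0.0], "geneB": [200.0]}},
    "S3": {"tpm": {"geneA": [5.0, 45.0], "geneB": [50.0]},
           "counts": {"geneA": [5.0, 135.0], "geneB": [100.0]}},
}


def gene_lengths(samples, names):
    """Abundance-weighted mean transcript length per gene, for loaded samples only."""
    out = {s: {} for s in names}
    for gene, tx_len in TX_LEN.items():
        per_sample = {}
        for s in names:
            tpm = samples[s]["tpm"][gene]
            total = sum(tpm)
            # No expressed isoform means no weights, so the length is undefined here
            per_sample[s] = None if total == 0 else sum(t * l for t, l in zip(tpm, tx_len)) / total
        observed = [v for v in per_sample.values() if v is not None]
        if observed:
            # Fill with the geometric mean over the OTHER loaded samples
            fill = math.exp(sum(math.log(v) for v in observed) / len(observed))
        else:
            fill = sum(tx_len) / len(tx_len)  # gene unexpressed everywhere: plain mean
        for s in names:
            out[s][gene] = fill if per_sample[s] is None else per_sample[s]
    return out


def summarize(samples, names, counts_from_abundance="no"):
    """Gene-level counts and lengths computed from the loaded samples `names`."""
    lengths = gene_lengths(samples, names)
    counts = {s: {g: sum(samples[s]["counts"][g]) for g in TX_LEN} for s in names}
    if counts_from_abundance == "lengthScaledTPM":
        # Mean gene length over the loaded samples, so it depends on the sample set
        mean_len = {g: sum(lengths[s][g] for s in names) / len(names) for g in TX_LEN}
        for s in names:
            raw = {g: sum(samples[s]["tpm"][g]) * mean_len[g] for g in TX_LEN}
            scale = sum(counts[s].values()) / sum(raw.values())  # preserve library size
            counts[s] = {g: raw[g] * scale for g in TX_LEN}
    return counts, lengths


c_all, l_all = summarize(SAMPLES, ["S1", "S2", "S3"], "lengthScaledTPM")
c_sub, l_sub = summarize(SAMPLES, ["S1", "S2"], "lengthScaledTPM")
print(round(l_all["S2"]["geneA"], 1), round(l_sub["S2"]["geneA"], 1))  # prints 2049.4 1500.0
print(round(c_all["S1"]["geneA"], 1), round(c_sub["S1"]["geneA"], 1))  # prints 74.5 60.0
```

Dropping S3 changes the length filled in for S2 (it no longer sees S3's long isoform) and, through the across-sample mean length, changes S1's `lengthScaledTPM` count for geneA, even though S1's own data are identical in both runs.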