Open nzl0016 opened 5 years ago
Hi, I think I am experiencing a similar situation: I used StringTie and prepDE to calculate read counts for genes, and HTSeq for the same. The number of reads from StringTie is much lower. It seems that StringTie filters out some reads. If that is the case, what are the criteria for this filtering? If not, what other reasons could there be? Thank you!
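For context on where a gap like this can come from: as far as the StringTie documentation describes it, prepDE does not count alignments directly but estimates counts from per-transcript coverage, roughly coverage × transcript length / read length, with the read length supplied by the user (default 75). A minimal sketch of that arithmetic follows; the function name, the rounding up, and all numbers are assumptions for illustration, not taken from this thread.

```python
import math

def estimated_count(coverage, transcript_len, read_len=75):
    """Approximate a read count from average per-base coverage.

    Assumption: count ~= coverage * transcript_len / read_len, rounded up.
    """
    return math.ceil(coverage * transcript_len / read_len)

# Invented example: the same transcript and coverage, two assumed read lengths.
print(estimated_count(10.0, 1500, read_len=75))
print(estimated_count(10.0, 1500, read_len=150))
```

Under this assumption, the assumed read length directly scales the estimated counts, so it is one setting worth checking when totals look too low or too high.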
Hi,
What I did with my data was map one population to a reference genome generated from another population. I guess there are large genomic differences between the two populations, so many reads were filtered out; when I mapped the population to its own reference, the results were good. So I ended up using featureCounts (HTSeq is also OK) for my data analysis, which gave me satisfying results.
Best,
Ning
On Nov 4, 2019, at 01:06, OlgaVT wrote: (quoted text of the comment above)
I have been using the HISAT2-StringTie-DESeq2 pipeline for a while, and I realized that the sum of raw counts across all genes accounts for only 20% of my mapped reads. That is a pretty low number, so I tried HTSeq, and its sum was close to the number of mapped reads. I do not know the reason, and after searching online I have not found comments describing my situation. Has anybody had a similar experience, or does anyone know what happened in my analyses?
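The comparison described here (summed gene counts versus mapped reads) could be reproduced per sample with something like the sketch below. It assumes a CSV count matrix whose first column is the gene ID and whose remaining columns are samples, and a mapped-read total per sample obtained separately; the function name and the toy numbers are hypothetical.

```python
import csv
import io

def count_fractions(matrix_csv, mapped_reads):
    """Return {sample: summed gene counts / mapped reads} for a count matrix."""
    reader = csv.reader(matrix_csv)
    header = next(reader)
    samples = header[1:]  # assumption: first column holds the gene ID
    totals = dict.fromkeys(samples, 0)
    for row in reader:
        for sample, value in zip(samples, row[1:]):
            totals[sample] += int(value)
    return {s: totals[s] / mapped_reads[s] for s in samples}

# Toy matrix standing in for a real file; the numbers are invented.
toy = io.StringIO("gene_id,sampleA\ngene1,150\ngene2,50\n")
fractions = count_fractions(toy, {"sampleA": 1000})
print(fractions)
```

With a real file, an open file handle can be passed in place of the `io.StringIO` object. Running the same check on the matrix from each counting tool makes the size of the discrepancy explicit for every sample.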