Closed hwartmann closed 5 years ago
Hi @hwartmann,
This is basically the same as the second question in https://github.com/leekgroup/recount/issues/18 that Jack Fu @JMF47 will answer.
Best, Leo
Hi @hwartmann, I have responded in the other thread. The brief recap is that when the read lengths of samples differ, our ability to estimate transcript abundances differs as well.
Thank you for getting back to me @JMF47
So what is your suggestion for dealing with these transcripts? Can I set the NAs to zero, or should I drop any transcript containing an NA?
What is your objective? I would recommend against setting NAs to 0. Whether or not you drop a transcript that contains any NAs depends on what you would like to do with the data.
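To illustrate why zero-filling is discouraged, here is a minimal sketch in plain Python. The transcript names and values are invented toy data (not taken from recount2), and NaN stands in for NA. It compares dropping incomplete transcripts, ignoring NAs when summarizing, and replacing NAs with zero.

```python
import math

NA = float("nan")

# Hypothetical toy data: transcript -> counts across four samples.
counts = {
    "tx_a": [10.0, 12.0, 8.0, 11.0],
    "tx_b": [NA, NA, NA, NA],
    "tx_c": [5.0, NA, 7.0, 6.0],
}


def na_count(values):
    """Number of NA (NaN) entries in a list of counts."""
    return sum(math.isnan(v) for v in values)


def mean_ignoring_na(values):
    """Mean over the observed (non-NA) entries; NA if nothing was observed."""
    observed = [v for v in values if not math.isnan(v)]
    return sum(observed) / len(observed) if observed else NA


def mean_zero_filled(values):
    """Mean after replacing NA with 0, shown only for comparison."""
    return sum(0.0 if math.isnan(v) else v for v in values) / len(values)


# Option 1: keep only transcripts without any NA.
complete_only = {tx: v for tx, v in counts.items() if na_count(v) == 0}

# Option 2: keep the NAs and summarize over observed samples only.
mean_observed = mean_ignoring_na(counts["tx_c"])

# Zero-filling treats "could not be estimated" as "not expressed",
# which pulls the summary downward.
mean_zeroed = mean_zero_filled(counts["tx_c"])
```

In this toy example the zero-filled mean for `tx_c` is lower than the mean over the observed samples, even though no sample showed zero expression. Which option is appropriate depends on the downstream analysis.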
Will I run into the same issue if I work with recount2 gene or exon counts?
I do not believe so, but @lcolladotor can chime in on the gene and exon count front.
OK, thanks. But in any case, we do not really understand how what you described can result in NAs. Could you maybe elaborate a bit more or point me to a source that would explain this to us?
See https://www.biorxiv.org/content/biorxiv/early/2018/01/12/247346.full.pdf. Particularly, the estimation of the feature matrix, which calculates the expected number of counts falling into each exon/junction feature for a random read of a certain read length.
There are no NAs in the counts for the gene/exon RSE objects. The counting method for those is different from the one used for the transcript objects. See https://f1000research.com/articles/6-1558/v1 for the gene/exon ones.
Hey
I've noticed that there are some transcripts that contain NAs in the assay count table, e.g. ENST00000622420.1 in DRP001055. In this case there are NAs for all four samples. In the GTEx data there are a total of 4.2 million NAs, e.g. for transcript ENST00000604479.5, but here it's only for a subset of the samples.
Could you please verify that for me and let me know how to interpret this? I've been struggling with this for a few days now.
Thank you
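For anyone wanting to run the same kind of check, the sketch below tallies NAs per sample and separates transcripts that are NA in every sample from those that are NA in only some. It is plain Python over an invented toy table; the sample and transcript names are hypothetical, and NaN stands in for NA. With real recount2 data the same tallies would be taken on the assay matrix of the transcript object.

```python
import math

NA = float("nan")

# Hypothetical sample names and toy counts (transcript -> value per sample).
samples = ["s1", "s2", "s3", "s4"]
counts = {
    "tx_all_na": [NA, NA, NA, NA],
    "tx_some_na": [3.0, NA, 4.0, NA],
    "tx_no_na": [1.0, 2.0, 3.0, 4.0],
}


def classify(values):
    """Label a transcript by how many of its samples are NA."""
    n_na = sum(math.isnan(v) for v in values)
    if n_na == 0:
        return "none"
    return "all" if n_na == len(values) else "some"


# NA pattern per transcript: "all", "some", or "none".
pattern = {tx: classify(v) for tx, v in counts.items()}

# Number of NA transcripts in each sample (column-wise tally).
na_per_sample = {
    s: sum(math.isnan(v[i]) for v in counts.values())
    for i, s in enumerate(samples)
}

# Total number of NA entries in the table.
total_na = sum(na_per_sample.values())
```

Separating the "all" and "some" patterns mirrors the two cases described above: a transcript that is NA in every sample of a project versus one that is NA in only a subset of samples.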