MiqG closed this issue 2 years ago
Hi @MiqG, floating-point read counts for genes and isoforms are a result of multi-mapping or ambiguous reads, which the expectation-maximization algorithm can assign fractionally across the compatible genes or isoforms. This is expected (details are in the paper, and in many other papers and software packages that do essentially the same thing). How you handle this downstream with other tools is up to you.
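To illustrate why the counts come out fractional, here is a minimal toy sketch of expectation-maximization read assignment. It is not Whippet's implementation: the function name `em_counts`, the two-isoform setup, and the read numbers are all made up for illustration, and it ignores details such as effective length. The last line shows rounding as one possible downstream choice, not a recommendation.

```python
# Toy EM sketch (hypothetical, NOT Whippet's code): reads compatible with
# more than one target are split in proportion to current abundance estimates.

def em_counts(compat, n_targets, n_iter=200):
    """compat: one tuple of compatible target indices per read."""
    abundance = [1.0 / n_targets] * n_targets  # start from a uniform guess
    counts = [0.0] * n_targets
    for _ in range(n_iter):
        counts = [0.0] * n_targets
        # E-step: share each read among its compatible targets
        for targets in compat:
            total = sum(abundance[t] for t in targets)
            for t in targets:
                counts[t] += abundance[t] / total
        # M-step: re-estimate relative abundances from the expected counts
        n_reads = sum(counts)
        abundance = [c / n_reads for c in counts]
    return counts

# Made-up data: 3 reads unique to isoform 0, 1 unique to isoform 1,
# and 3 ambiguous reads compatible with both.
reads = [(0,)] * 3 + [(1,)] * 1 + [(0, 1)] * 3
counts = em_counts(reads, 2)

# One possible way to get integers for count-based tools: round to nearest.
rounded = [round(c) for c in counts]
```

In this toy case the three ambiguous reads end up split unevenly between the two isoforms, so neither expected count is a whole number even though the total still equals the number of reads.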
Perfect, thank you!
Hi!
First, thanks for creating such an efficient and useful package!
I have just started trying out Whippet.jl to extract information from RNA-seq data. So far, following your documentation, I was able to build the index using GENCODE annotations and run whippet-quant, which produced five different files (I used quant as a prefix here):
As I explored the file quantifying mRNA levels at the gene level, I saw that you report them as TPM and read count. However, the read count is not an integer in all cases. Is there a reason for that? I would also like to run differential gene expression analysis with packages like DESeq, which only accept integer count data. Should I round the counts to integers?
I am running Whippet v1.6.1.
Thank you very much in advance!
Miquel