jonassibbesen / rpvg

Method for inferring path posterior probabilities and abundances from pangenome graph read alignments
MIT License

Software for differential expression analysis from rpvg's readcount #37

Closed: lsoldini closed this issue 2 years ago

lsoldini commented 2 years ago

Overall question

Which software would you recommend for differential expression analysis on the ReadCount values produced by rpvg?

Context

I've used the mpmap-rpvg pipeline to 1) map RNA-seq reads and 2) quantify their expression. I now have a set of files with a ReadCount for each genomic feature, and I would like to perform differential expression analysis.

Thoughts

From the "haplotype specific ..." paper: "While HST expression estimates can always be marginalized to produce allele or transcript expression estimates, more general statistical frameworks will need to be developed to avoid information loss between these steps in transcriptomic pipelines".
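Marginalizing haplotype-specific transcript (HST) estimates to transcript level amounts to summing the estimates of all HSTs that derive from the same reference transcript. The sketch below is a minimal illustration, not rpvg's own code: it assumes you already have a mapping from each HST name to its transcript ID (for example taken from the info table written when the HSTs were built), and the HST and transcript names in the usage example are hypothetical.

```python
from collections import defaultdict


def marginalize_to_transcripts(hst_counts, hst_to_transcript):
    """Sum haplotype-specific transcript (HST) estimates per transcript.

    hst_counts: dict mapping HST name -> estimate (e.g. rpvg's ReadCount).
    hst_to_transcript: dict mapping HST name -> reference transcript ID.
    The mapping is an assumption: it must come from however the HSTs
    were constructed, since this function cannot infer it from names.
    """
    totals = defaultdict(float)
    for hst, value in hst_counts.items():
        # Every HST contributes its full estimate to its parent transcript.
        totals[hst_to_transcript[hst]] += value
    return dict(totals)


# Hypothetical HST names, for illustration only.
counts = {"T1_hapA": 10.0, "T1_hapB": 5.5, "T2_hapA": 3.0}
mapping = {"T1_hapA": "T1", "T1_hapB": "T1", "T2_hapA": "T2"}
print(marginalize_to_transcripts(counts, mapping))  # {'T1': 15.5, 'T2': 3.0}
```

Because TPM values are shares of a common total, they can be summed the same way; marginalizing to allele level would instead group HSTs by haplotype or allele rather than by transcript.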

I understand that, in the ideal scenario, something would be needed after mpmap and rpvg that specifically makes use of all the information from these pantranscriptomic methods, together with a proper statistical model for differential expression analysis.

But in the meantime, I was wondering whether any of the existing differential expression software would be suitable. For instance, what do you think about sleuth (Pachter lab, designed to work with kallisto's output)?

lsoldini commented 2 years ago

It is quite easy to compute estimated counts from the read counts using the length and effective length, so it is then possible to use sleuth or other tools that work on pseudocounts.
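The exact computation is not spelled out in the comment, so the sketch below only shows one standard relationship under stated assumptions: TPM recomputed from counts scaled by effective length, TPM_i = 10^6 · (c_i / e_i) / Σ_j (c_j / e_j), written into kallisto's `abundance.tsv` column layout (`target_id`, `length`, `eff_length`, `est_counts`, `tpm`). The rpvg column names used here (`Name`, `Length`, `EffectiveLength`, `ReadCount`) are assumptions to check against your own output header. sleuth normally works from kallisto's HDF5 output with bootstraps, so check its documentation before feeding it a plain-text table.

```python
import csv
import io


def rpvg_to_kallisto_tsv(rpvg_text):
    """Rewrite an rpvg-style table as a kallisto-style abundance.tsv.

    Assumes tab-separated input with at least the columns Name, Length,
    EffectiveLength and ReadCount (names are an assumption). Extra
    columns are ignored.
    """
    rows = list(csv.DictReader(io.StringIO(rpvg_text), delimiter="\t"))

    # Reads per unit of effective length; targets with no effective
    # length cannot be normalized and get a rate of 0.
    rates = []
    for row in rows:
        eff_len = float(row["EffectiveLength"])
        count = float(row["ReadCount"])
        rates.append(count / eff_len if eff_len > 0 else 0.0)
    total = sum(rates)

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["target_id", "length", "eff_length", "est_counts", "tpm"])
    for row, rate in zip(rows, rates):
        # TPM: each target's share of the total rate, scaled to one million.
        tpm = 1e6 * rate / total if total > 0 else 0.0
        writer.writerow([row["Name"], row["Length"], row["EffectiveLength"],
                         row["ReadCount"], f"{tpm:.6g}"])
    return out.getvalue()


example = "Name\tLength\tEffectiveLength\tReadCount\nA\t1000\t800\t80\nB\t500\t200\t20\n"
print(rpvg_to_kallisto_tsv(example))
```

Recomputing TPM rather than copying rpvg's own TPM column is a design choice here; comparing the two is a quick sanity check that the columns were read correctly.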

In the end, I am using edgeR. I do not think it is the best option, but in the absence of dedicated software it may be among the least bad. DESeq2 would probably give similar results, since the two share most of their modelling approach. In any case, the results obtained with rpvg | edgeR are quite consistent with previous analyses we did using kallisto.

jonassibbesen commented 2 years ago

Hi, sorry for the late reply. Have been on vacation.

If you are primarily interested in transcript expression, I would imagine that most differential transcript expression methods that work with expression estimates from kallisto, Salmon, etc. would also work with rpvg, such as sleuth, edgeR, etc. However, it is not something I have experience with. You would likely need to change the output format so that it matches what the methods expect, but it seems like you are already aware of that.

Differential expression analysis is not something we have explored much yet, but it is something we are really interested in. You are right that in the ideal case we would want to use a method that models differences in both the haplotype and transcript expression, but such a method does not exist yet to our knowledge.

lsoldini commented 2 years ago

Hi, no problem, sorry for my late reply as well.

So, in the end, I used edgeR, which did require some output reformatting, but that was quite straightforward once the software was chosen. It gave results similar to what we had when we used kallisto (i.e., not based on rpvg), although we did not extensively assess and compare the two sets of results.
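The comment does not describe the reformatting itself; as a hedged sketch of what it might involve, the function below merges per-sample rpvg tables into a single feature-by-sample count table, which is the shape edgeR's `DGEList()` expects for its `counts` argument. The column names `Name` and `ReadCount`, the `feature_id` header, and the sample names are assumptions for illustration; features absent from a sample are filled with 0.

```python
import csv
import io


def build_count_matrix(samples):
    """Merge per-sample rpvg-style tables into one count matrix (TSV text).

    samples: dict mapping sample name -> tab-separated text with at least
    the columns Name and ReadCount (column names are an assumption).
    Rows are features (sorted), columns are samples in input order.
    """
    counts = {}
    for sample, text in samples.items():
        for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
            counts.setdefault(row["Name"], {})[sample] = row["ReadCount"]

    sample_names = list(samples)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature_id"] + sample_names)
    for feature in sorted(counts):
        # A feature missing from a sample's table is treated as zero reads.
        writer.writerow([feature] + [counts[feature].get(s, "0") for s in sample_names])
    return out.getvalue()


# Hypothetical sample tables, for illustration only.
print(build_count_matrix({
    "s1": "Name\tReadCount\nA\t5\nB\t2\n",
    "s2": "Name\tReadCount\nA\t7\n",
}))
```

rpvg read counts are estimates and can be fractional; whether to round them before loading into edgeR is left open here.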

CarlosAmadeo7 commented 1 month ago

Hello lsoldini, I am Carlos. I have started to wonder how I can do a differential expression analysis of the haplotype-transcript expression estimates I got as output from rpvg. I was reading through the rpvg issues for information and found your comment. If you don't mind, I would like to ask you a couple of questions: Did you use the single-haplotype or the diplotype output to marginalize over the haplotypes? How did you do the marginalization? (I am new and learning quickly.) You mentioned that you changed the output format; how did you do that? Could you give me more details, please?

I appreciate your time