vastgroup / vast-tools

A toolset for profiling alternative splicing events in RNA-Seq data.
MIT License

GRCh38 and Uniquely Mapped Reads #52

Closed argrosso closed 8 years ago

argrosso commented 8 years ago

Hi,

I have two questions:

1) Can we run vast-tools using the latest human genome version GRCh38? How can we build the DB files for other genome versions?

2) How does the "vast-tools align" tool behave with respect to multi-mapping reads? Can we select only uniquely mapped reads for cRPKM estimation?

Thank you,
Ana Rita

mirimia commented 8 years ago

Hi Ana Rita,

1) Can we run vast-tools using the latest human genome version GRCh38? How can we build the DB files for other genome versions?

Unfortunately, vast-tools can only be run with the available genome versions (hg19, mm9 and galGal3, at this point). Building VASTDB is still quite complex, so there are no scripts available to do it. We're working on it, but progress is slow.

In any case, for GRCh38 specifically, it is more than enough to simply liftOver the coordinates. The human and mouse assemblies no longer change much (at least for most genes), which makes this a less urgent need.
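As a rough illustration of the liftOver route, here is a minimal Python sketch that converts event coordinates into BED lines for the UCSC liftOver tool and back again. It assumes the coordinates are `chr:start-end` strings that are 1-based and inclusive, which should be checked against your own output tables. The function names and the `event_A` identifier are made up for the example and are not part of vast-tools.

```python
import re

# Matches strings such as "chr1:100-200" (format assumed, see note above).
_COORD = re.compile(r"^(chr[\w.]+):(\d+)-(\d+)$")


def coord_to_bed(coord, name):
    """Convert 'chr1:100-200' (assumed 1-based, inclusive) to a BED line."""
    m = _COORD.match(coord)
    if m is None:
        raise ValueError(f"unrecognised coordinate: {coord!r}")
    chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    # BED is 0-based and half-open, so only the start is shifted by one.
    return f"{chrom}\t{start - 1}\t{end}\t{name}"


def bed_to_coord(bed_line):
    """Convert a lifted BED line back to (name, 'chr:start-end')."""
    chrom, start, end, name = bed_line.rstrip("\n").split("\t")[:4]
    return name, f"{chrom}:{int(start) + 1}-{end}"


if __name__ == "__main__":
    # Hypothetical event; write lines like this to events_hg19.bed, then run:
    #   liftOver events_hg19.bed hg19ToHg38.over.chain.gz \
    #       events_hg38.bed unmapped.bed
    print(coord_to_bed("chr1:100-200", "event_A"))
```

Events that liftOver cannot place end up in the unmapped file, so it is worth checking how many were lost before comparing against GRCh38 annotations.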

2) How does the "vast-tools align" tool behave with respect to multi-mapping reads? Can we select only uniquely mapped reads for cRPKM estimation?

vast-tools uses ONLY uniquely mapping reads for any estimate (PSIs, cRPKMs, etc.). The read counts are then corrected by the effective number of uniquely mappable positions. In fact, that's what the "c" in cRPKM stands for. Please have a look at Labbe et al. (Stem Cells 2012) or Barbosa-Morais et al. (Science 2012) for a more detailed explanation. Otherwise, I'm sure Nuno can give you more details.
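To make the correction concrete, here is a minimal Python sketch of a mappability-corrected RPKM. It assumes the usual RPKM formula with the transcript length replaced by the number of uniquely mappable positions. This is an illustration of the idea only, not the vast-tools implementation, and the function and parameter names are invented for the example. The cited papers give the exact definition.

```python
def crpkm(unique_reads, mappable_positions, total_unique_reads):
    """Reads per kilobase of uniquely mappable sequence per million mapped reads.

    unique_reads        -- uniquely mapping reads assigned to the gene
    mappable_positions  -- effective number of uniquely mappable positions
                           in the gene (used in place of its full length)
    total_unique_reads  -- all uniquely mapping reads in the sample
    """
    if mappable_positions <= 0 or total_unique_reads <= 0:
        raise ValueError("mappable_positions and total_unique_reads must be > 0")
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million reads)
    return unique_reads * 1e9 / (mappable_positions * total_unique_reads)


if __name__ == "__main__":
    # Hypothetical numbers: 500 reads, 2,000 mappable positions,
    # 20 million uniquely mapped reads in the sample.
    print(crpkm(500, 2000, 20_000_000))
```

With this definition, a gene with few uniquely mappable positions is not penalised for reads that could never have been assigned to it.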

argrosso commented 8 years ago

Hi Manuel,

Thank you for answering so fast.

I will try with the liftOver.

OK, I was not sure whether “corrected” really meant uniquely mapped.

Thank you again,
Ana Rita

mirimia commented 8 years ago

Yes, "corrected" actually means "corrected for mappability", that is, for the mappability of uniquely mapping reads.