itmat / rum

RNA-Seq Unified Mapper
http://cbil.upenn.edu/RUM
MIT License

junction score with paired-end data - is it # of reads or fragments? #170

Open nmanik opened 11 years ago

nmanik commented 11 years ago

Hi Mike, I hope you're doing well. I'm back again after a hiatus, with a simple question this time. From the user guide/wiki, I see that the score in junctions_high-quality.bed "is the number of uniquely mapping reads crossing the junction with at least 8 bases on each side."

If I have paired-end data, is this score the number of reads or the number of fragments crossing the junction? I would like to get the count of fragments, as it avoids overcounting when fragments are short (and hence both the left and right mate reads cross the junction).

If RUM only reports reads, do you have plans to report fragments, or any recommendations on how I can get the fragment count (possibly from some other output file, or by pointing me to the code I should look into)?

Thanks, Mani

mdelaurentis commented 11 years ago

Mani,

I believe the counts are based on fragments, not reads. By the time we produce the junctions files, we have already merged overlapping paired reads. So if the forward and reverse read for the same fragment both span a junction, they will overlap each other, and so will have been merged together anyway and will only be counted once.

I'm copying Greg for confirmation, as he is more familiar with this part of RUM than I am.

Thanks,

Mike
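To illustrate the distinction discussed above, here is a minimal Python sketch of counting reads versus fragments across a single junction. It is not RUM's code: the alignment representation (a fragment ID plus half-open aligned blocks), the function names `crosses_junction` and `junction_counts`, and the way the 8-base overhang is measured are all assumptions made for this example. It only shows why deduplicating by fragment avoids double counting when both mates span the same junction.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: each aligned block is a half-open
# genomic interval [start, end); a spliced read has two or more blocks.
Block = Tuple[int, int]


def crosses_junction(blocks: List[Block], intron_start: int,
                     intron_end: int, min_overhang: int = 8) -> bool:
    """Return True if the alignment splices across [intron_start, intron_end)
    with at least `min_overhang` aligned bases on each side (assumed rule)."""
    for i in range(len(blocks) - 1):
        if blocks[i][1] == intron_start and blocks[i + 1][0] == intron_end:
            left = sum(end - start for start, end in blocks[:i + 1])
            right = sum(end - start for start, end in blocks[i + 1:])
            return left >= min_overhang and right >= min_overhang
    return False


def junction_counts(alignments: Iterable[Tuple[str, List[Block]]],
                    intron_start: int, intron_end: int,
                    min_overhang: int = 8) -> Tuple[int, int]:
    """Return (read_count, fragment_count) for one junction."""
    reads = 0
    fragments = set()
    for fragment_id, blocks in alignments:
        if crosses_junction(blocks, intron_start, intron_end, min_overhang):
            reads += 1                  # every crossing mate counts as a read
            fragments.add(fragment_id)  # but each fragment counts only once
    return reads, len(fragments)


# Made-up example data: (fragment ID, aligned blocks of one mate).
alignments = [
    ("frag1", [(100, 150), (300, 350)]),  # forward mate crosses
    ("frag1", [(120, 150), (300, 370)]),  # reverse mate crosses too
    ("frag2", [(90, 150), (300, 340)]),   # only one mate crosses
    ("frag2", [(400, 500)]),              # other mate is unspliced
    ("frag3", [(145, 150), (300, 395)]),  # only 5 bases on the left side
]
read_count, fragment_count = junction_counts(alignments, 150, 300)
print(read_count, fragment_count)
```

With this toy input, the short fragment `frag1` contributes two reads but only one fragment, which is the overcounting the question is concerned about.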

mdelaurentis commented 11 years ago

That's correct, they are FPKMs. Mike, we should change this in our documentation. Thanks, Greg

nmanik commented 11 years ago

Thanks Mike & Greg for clarifying this!

One more clarification: I think Greg meant fragment counts and not FPKM. FPKM would mean the fragment count normalized by the total number of mapped fragments and the length of the covered region, and I don't think the score in the junctions_high-quality.bed file involves any normalization.
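For readers unfamiliar with the difference, the following is a small sketch of the standard FPKM formula (fragments per kilobase of feature per million mapped fragments) next to a raw count. The function name `fpkm` and the example numbers are made up for illustration and are not taken from RUM.

```python
def fpkm(fragment_count: int, feature_length_bp: int,
         total_mapped_fragments: int) -> float:
    """Fragments per kilobase of feature per million mapped fragments."""
    if feature_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragment count must be positive")
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return fragment_count * 1e9 / (feature_length_bp * total_mapped_fragments)


raw_count = 50  # an unnormalized fragment count, like a junction score
normalized = fpkm(raw_count, feature_length_bp=2000,
                  total_mapped_fragments=10_000_000)
print(raw_count, normalized)  # 50 2.5
```

The raw count depends on sequencing depth and feature length, while the FPKM value rescales it by both, so the two are not interchangeable.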

greggrant commented 11 years ago

Right, the raw counts are fragment counts.
