vastgroup / vast-tools

A toolset for profiling alternative splicing events in RNA-Seq data.
MIT License

Differences in IR events listed in output #41

Closed UBrau closed 8 years ago

UBrau commented 9 years ago

The output table contains different IR events depending on the input. Seems to be a problem with either (a) merging of events when combining IR data, or (b) merging IR with rest of pipeline.
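One way to pin down which IR events differ between two runs is to diff the event IDs of the two output tables. The sketch below is only illustrative and rests on assumptions not stated in this thread: that the tables are tab-delimited with a header line, that the event ID is in the second column, and that IR event IDs contain the string "INT". The function names are made up for this example.

```python
import csv
import gzip
from typing import Iterable, List, Set, Tuple


def open_text(path: str):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def ir_event_ids(lines: Iterable[str], id_col: int = 1, marker: str = "INT") -> Set[str]:
    """Collect IDs that look like IR events from tab-delimited lines.

    Assumptions (not confirmed by the thread): the first line is a header,
    the event ID sits in column `id_col` (0-based), and IR event IDs
    contain `marker`.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # skip the header line
    return {
        row[id_col]
        for row in reader
        if len(row) > id_col and marker in row[id_col]
    }


def compare_runs(lines_a: Iterable[str], lines_b: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (IR events only in run A, IR events only in run B), both sorted."""
    a = ir_event_ids(lines_a)
    b = ir_event_ids(lines_b)
    return sorted(a - b), sorted(b - a)
```

With real files, each argument could be a handle from `open_text(...)` on the output table of one run; any event listed in only one of the two results is a candidate for a dropped IR event.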

mirimia commented 9 years ago

My understanding was that only (and all) the events in VASTDB/Hsa/TEMPLATES/Hsa.FULL.Template.txt.gz would be included in the final sample...

So I guess the problem occurs when generating the pre-final "IR" table. Again, though, this is supposedly done using "Hsa.IR.Template.txt"??

UBrau commented 9 years ago

Yes, I thought so too. As it turns out, though, that's not the case. I had not noticed earlier, but this is not something new (as my previous email suggested); apparently there were differences all along. Some IR events get dropped.

I just noticed another thing that I guess should not be as it is: The full event templates ($sp.FULL.Template.txt) contain ~ 20,000 more introns than the intron templates ($sp.IR.Template.txt) for both mouse and human. It seems like the full templates never incorporated the 'latest' version of the IR templates - or did we do that to maintain compatibility with the paper?

Either way, the IR events which were dropped in one case but not another are present in both templates, and they are also still there in the INCLUSION_LEVELS_IR-Mmu10.tab file. So they must have been dropped while merging the different modules.
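The reasoning above (present in both templates, present in the IR table, missing from the final output) amounts to checking each dropped event against the ID set of every stage in order. A minimal sketch of that check follows, assuming the ID sets have already been loaded; the stage names and event IDs in the usage example are placeholders, not real data from this issue.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def first_missing_stage(event_id: str, stages: List[Tuple[str, Set[str]]]) -> Optional[str]:
    """Return the name of the first stage that lacks `event_id`, or None if all have it."""
    for name, ids in stages:
        if event_id not in ids:
            return name
    return None


def trace_dropped(event_ids: Iterable[str], stages: List[Tuple[str, Set[str]]]) -> Dict[str, Optional[str]]:
    """Map each event ID to the first stage where it disappears (None if never)."""
    return {eid: first_missing_stage(eid, stages) for eid in event_ids}


# Placeholder data: the stage order mirrors the description above.
stages = [
    ("IR template", {"INT_A", "INT_B", "INT_C"}),
    ("FULL template", {"INT_A", "INT_B", "INT_C"}),
    ("INCLUSION_LEVELS_IR table", {"INT_A", "INT_B", "INT_C"}),
    ("final merged table", {"INT_A", "INT_C"}),
]
report = trace_dropped(["INT_A", "INT_B"], stages)
```

Here `report["INT_B"]` is `"final merged table"`, which would point at the merging step, while `report["INT_A"]` is `None` because that event survives every stage.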

Ulrich

mirimia commented 9 years ago

Fun. And these extra introns are printed as "NA\tN,N,N..." I guess?

mirimia commented 9 years ago

I seem to recall that we discussed two options for IR:

1) Use the events in the paper (IR.template).
2) Allow a new filtering as you did in the paper, but with whatever samples used (the "full potential set" would be those in FULL). I'm not sure this was ever implemented (but perhaps it's --stringentIR in align?).

M

UBrau commented 9 years ago

Our intention was to do (2), and that's what the IR templates are. The reason was that there are some events that would always be pulled out due to problematic overlap, and others (the ones that never passed QC) that would be difficult for users to judge if they don't have as broad a collection of samples as we did.

It could be that the FULL templates were made before this was done and never changed.

_u
