vastgroup / vast-tools

A toolset for profiling alternative splicing events in RNA-Seq data.
MIT License

question about usage of --min_range #59

Closed ZheFrench closed 6 years ago

ZheFrench commented 7 years ago

Hi folks,

I don't quite understand how to use this parameter. From the docs: "--min_range option, to provide higher (positive values) or lower (negative values) stringency. --min_range is the minimum value any individual pair's ΔPSI can be."

So you can set a negative value. For example, with min_range=-1, you would expect no filtering on PSI for the replicates.

For a paired comparison, does setting min_range=5 and min_dPSI=20 mean that you filter out events with dPSI < 20, and that the PSI should be at least > 5 in each replicate of your pairs?

Thanks

mirimia commented 7 years ago

Hey,

Summarized in this way, it certainly looks confusing. I will try to improve the explanation in the README. Here is the idea:

By using positive --min_range values, you make sure that the PSI distributions of groups A and B do NOT overlap. On the contrary, using a negative --min_range allows the PSI distributions to overlap. So, if, instead, group A has PSI_A1=10 and PSI_A2=30, and group B has PSI_B1=20 and PSI_B2=50, the "range difference" here would be 20-30 = -10.

I hope it is clearer now.
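
The "range difference" in the example above can be sketched in a few lines of Python. This is only an illustration of the idea described in this comment, not vast-tools code: the function names are hypothetical, and both the choice of direction (by comparing group means) and the inclusive `>=` comparison are assumptions.

```python
from statistics import mean

def range_difference(psi_a, psi_b):
    """Gap between the PSI ranges of two groups (hypothetical helper).

    Positive: the ranges do not overlap. Negative: they overlap.
    Assumption: the direction is taken from the group means.
    """
    if mean(psi_b) >= mean(psi_a):
        return min(psi_b) - max(psi_a)
    return min(psi_a) - max(psi_b)

def passes_min_range(psi_a, psi_b, min_range):
    # Assumption: the threshold is inclusive.
    return range_difference(psi_a, psi_b) >= min_range

group_a = [10, 30]
group_b = [20, 50]
print(range_difference(group_a, group_b))       # 20 - 30 = -10
print(passes_min_range(group_a, group_b, 5))    # False: ranges overlap
print(passes_min_range(group_a, group_b, -15))  # True: overlap of 10 is tolerated
```

With a positive threshold, the lowest PSI in the higher group must exceed the highest PSI in the lower group by at least that amount. A negative threshold tolerates that much overlap.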

ZheFrench commented 7 years ago

For PAIRED comparisons, if I say it differently, min_range is the difference at the level of each paired replicate value whereas min_diff is the difference using the average of the replicates. So I can keep my threshold with min_diff but be less restrictive with a lower min_range for example. Your change in the doc is cleaner by the way...:)

It's better, but I still get confused by the positive/negative values for min_range. I don't understand the concept of overlapping distributions.

"So, if, instead, group A PSI_A1=10 and PSI_A2=30, and for group B PSI_B1=20 and PSI_B2=50, the 'range difference' here would be 20-30 = -10."

OK, so for positive it would be 30 - 20 = 10 (max in A, min in B). What's the goal/idea behind this?

mirimia commented 7 years ago

I'm not sure I understand. Do you mean the idea of allowing overlapping distributions? Sometimes, if you have many replicates, it may be wise to do it. Although if you have many replicates I'd use "regular" stats...

ZheFrench commented 7 years ago

I have 3 replicates in each condition... Hmm, I do not understand the sign of min_range. I think I understand that it is the minimum paired dPSI:

Replicate  Test  Control
Rep1       5     10
Rep2       5     7
Rep3       10    7

Here, for Rep2, |5-7| = 2... If you set min_range=1, you will filter out this event... What happens if you set -1?

mirimia commented 7 years ago

Here you will have (for paired analysis):

dPSI_1 = 5-10 = -5
dPSI_2 = 5-7 = -2
dPSI_3 = 10-7 = 3

av_dPSI= (-5+-2+3)/3= -1.333

So, if you had --min_dPSI 1 --min_range 1, it wouldn't make it, because the individual dPSIs are of different signs...

You'd need something like --min_range -3 or so. But it's obviously a case that should be filtered out! :-)
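
The paired example above can be worked through in a short Python sketch. It is one reading of the explanation in this thread, not the actual vast-tools implementation: the function name is hypothetical, and it assumes that each pair's dPSI is oriented in the direction of the average dPSI and that both thresholds are inclusive.

```python
def paired_check(test, control, min_dpsi, min_range):
    """Hypothetical paired filter: returns (dPSIs, average dPSI, passes)."""
    dpsis = [t - c for t, c in zip(test, control)]
    avg = sum(dpsis) / len(dpsis)
    # Assumption: orient each pair's dPSI along the direction of the average,
    # so a pair that changes the "wrong way" counts as negative.
    sign = 1 if avg >= 0 else -1
    oriented = [sign * d for d in dpsis]
    passes = abs(avg) >= min_dpsi and min(oriented) >= min_range
    return dpsis, avg, passes

test = [5, 5, 10]      # Rep1..Rep3, Test
control = [10, 7, 7]   # Rep1..Rep3, Control

dpsis, avg, ok = paired_check(test, control, min_dpsi=1, min_range=1)
print(dpsis, round(avg, 3), ok)   # [-5, -2, 3] -1.333 False

_, _, ok = paired_check(test, control, min_dpsi=1, min_range=-3)
print(ok)                         # True
```

Under these assumptions, the third pair moves against the average by 3, so the event only passes once --min_range is lowered to -3, which matches the figures quoted above.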