vastgroup / vast-tools

A toolset for profiling alternative splicing events in RNA-Seq data.
MIT License

Output of vast-tools diff #69

Closed BaluPai closed 6 years ago

BaluPai commented 6 years ago

Hi,

I have been trying out vast-tools recently (thanks for this cool AS package). Running 'diff' did not give me output in the format described in the manual. The run itself was fine, and it did give me a .DIFF_sig.txt and a .PSI_plots.pdf, but the table (.txt) file did not have the dPSI or the MV column. I produced the inclusion levels .tab file in two ways: one by running merge ==> combine ==> diff, the other by running combine (with all samples in the to_combine folder) and then running diff on the resulting Inclusion_levels.tab.

Also, the number of Sig outputs differed between the two runs. I guess this is because the statistics applied after the merge function are different.

./vast-tools diff -a sam1,sam2,sam3,2sam4 -b sam1,sam2,sam3,2sam4 -i INCLUSION_LEVELS_FULL-Hsa12-hg38_all.tab -S 2 -m 0.5 -c 16.

Please let me know if this is okay.

Best regards, Balu

mirimia commented 6 years ago

Hi Balu,

Thanks for your email.

@timbitz : could you please answer the format question? Perhaps it's a documentation problem? Nothing has changed in quite a while in diff.

@BaluPai : I'm not sure I understand how you have used merge, but I'm sure it would have some effect in diff. Also, just to let you know that we are planning to release a newer implementation of a diff-like module sometime soon.

Cheers Manu

BaluPai commented 6 years ago

Hi Manu,

I tried the merge function since I have two replicates for each of two treatments and two cell types. So in the config file for merge I had something like this:

P1_plus P_plus
P2_plus P_plus
M1_plus M_plus
M2_plus M_plus
P1_min P_min
P2_min P_min
M1_min M_min
M2_min M_min

This let me merge the replicates and then use diff. I am also interested in comparing the two cell types, P vs M, irrespective of the treatments (as in the 'diff' command in my previous post).
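For reference, the grouping above can be written out as a two-column file and sanity-checked before merging. This is only a sketch: it assumes the config is a whitespace-separated list of sample and group names, as the post suggests, and merge_groups.txt is a made-up placeholder name.

```shell
# Write the sample-to-group mapping from the post into a two-column file.
# merge_groups.txt is a hypothetical file name chosen for this sketch.
cat > merge_groups.txt <<'EOF'
P1_plus P_plus
P2_plus P_plus
M1_plus M_plus
M2_plus M_plus
P1_min P_min
P2_min P_min
M1_min M_min
M2_min M_min
EOF

# Count how many samples feed each merged group (two replicates are expected here).
awk '{ n[$2]++ } END { for (g in n) print g, n[g] }' merge_groups.txt | sort
```

Each of the four groups should be listed with a count of 2; a different count would point to a typo in a sample or group name.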

Best, Balu

timbitz commented 6 years ago

Hi @mirimia, I don't think this is a documentation problem. I found the line in diff where the text to print is stored:

sprintf("%s\t%s\t%f\t%f\t%f\t%s", tabLine[1], tabLine[2], medOne, medTwo, medOne - medTwo, round(max,2))

It doesn't make sense to me that psiA and psiB would print but not dPsi. Similarly, the max has to be a number or nothing would print at all...
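To make the format string concrete, here is a rough shell equivalent using awk's printf. It is only an illustration, not the code diff runs: the numbers are taken from the MYO9A row in the output posted later in this thread, format_diff_line is a name invented for this sketch, and the awk variable names simply mirror those in the quoted line.

```shell
# Mimic the quoted sprintf format with awk; dPsi is computed as medOne - medTwo.
# format_diff_line is a hypothetical helper, not part of vast-tools.
format_diff_line() {
  awk -v gene="$1" -v event="$2" -v medOne="$3" -v medTwo="$4" -v mv="$5" \
    'BEGIN { printf "%s\t%s\t%f\t%f\t%f\t%s\n", gene, event, medOne, medTwo, medOne - medTwo, mv }'
}

# Prints six tab-separated fields; the fifth is the difference of the third and fourth.
format_diff_line MYO9A HsaEX0041281 0.277704 0.898755 0.46
```

Because all six fields come from a single format string, a line cannot be printed with the two medians but without their difference, which is the point being made above.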

@BaluPai Can you send me the diff output, or the head of the .diff.txt file? Or even the head of the inclusion levels file? Just anything to help me reproduce this? I can't imagine how those columns wouldn't be there, and no error would be thrown?

As for using merge, I have no idea. Was that even around when diff was written? I don't remember.

timbitz commented 6 years ago

@BaluPai Specifically, it would be great if you could come up with real snippets of the input files that produce the erroneous outputs, and the exact command to produce those outputs. Thanks!

BaluPai commented 6 years ago

Hi, this is what the diff output looked like, and the Inclusion_levels.tab looks the same. Also, when I tried the compare function with the --print option, I could add the dPSI column to the output.

GENE EVENT COORD LENGTH FullCO COMPLEX M_min M_min-Q M_plus M_plus-Q P_min P_min-Q P_plus P_plus-Q
MYO9A HsaEX0041281 chr15:71892965-71893177 213 chr15:71893679,71892965-71893177+71893180,71888116 S 25.93 OK,OK,OK,OK,S@8.82,25.18 10.2 OK,OK,OK,Bn,S@2.75,24.25 90.29 OK,OK,OK,OK,S@88.48,9.52 96.26 SOK,SOK,SOK,OK,S@101.07,3.93
NEO1 HsaEX0042588 chr15:73293454-73293548 95 chr15:73289238,73293390+73293454-73293548,73298348 S 92.23 SOK,SOK,SOK,OK,S@250.87,21.13 97.32 SOK,SOK,SOK,B1,S@574.19,15.81 28.09 SOK,SOK,SOK,OK,S@112.08,286.92 33.54 SOK,SOK,SOK,OK,S@144.22,285.78
SORBS1 HsaEX0060966 chr10:95434622-95434717 96 chr10:95437490,95434622-95434717,95432576 S 1.35 OK,OK,OK,OK,S@1.01,73.99 3.89 SOK,SOK,SOK,OK,S@5.72,141.28 79.37 SOK,SOK,SOK,OK,S@89.69,23.31 81.35 SOK,SOK,SOK,OK,S@142.36,32.64
GPR126 HsaEX0028282 chr6:142440908-142440953 46 chr6:142438364,142440908-142440953,142443337+142443341 S 100 LOW,LOW,LOW,OK,S@31.00,0.00 100 N,N,N,B2,S@28.00,0.00 20.16 OK,OK,OK,OK,S@11.89,47.11 21.63 OK,OK,OK,OK,S@11.46,41.54

mirimia commented 6 years ago

The merge config looks OK. As I said, I think the differences in diff with or w/o merge are expected

timbitz commented 6 years ago

@BaluPai , Thanks for sending me the snippet of your inclusion file... I pasted it in a file tmp.tab but it runs perfectly for me so I don't know how to reproduce your erroneous output...

$ ./vast-tools diff -a M_min -b P_min -i tmp.tab -o .
GENE EVENT M_min P_min E[dPsi] MV[dPsi]_at_0.95
MYO9A HsaEX0041281 0.277704 0.898755 -0.621051 0.46
NEO1 HsaEX0042588 0.921640 0.280900 0.640740 0.58
SORBS1 HsaEX0060966 0.022816 0.792419 -0.769603 0.69
GPR126 HsaEX0028282 0.980987 0.211042 0.769945 0.65

$ ./vast-tools diff -a M_min,M_plus -b P_min,P_plus -i tmp.tab -o .
GENE EVENT M_min P_min E[dPsi] MV[dPsi]_at_0.95
MYO9A HsaEX0041281 0.193990 0.932110 -0.738120 0.52
NEO1 HsaEX0042588 0.951511 0.305562 0.645950 0.56
SORBS1 HsaEX0060966 0.031510 0.801005 -0.769495 0.68
GPR126 HsaEX0028282 0.974642 0.214396 0.760246 0.64
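One way to work with a table like this is to filter it on the last column. The sketch below saves the rows from the second run to a file and keeps events whose MV[dPsi]_at_0.95 is at least 0.6. The file name diff_output.tab and the 0.6 cutoff are arbitrary choices for illustration, not recommended settings, and the columns are space-separated here for readability.

```shell
# Save the example output (values copied from the second run above) to a file.
# diff_output.tab is a placeholder name used only in this sketch.
cat > diff_output.tab <<'EOF'
GENE EVENT M_min P_min E[dPsi] MV[dPsi]_at_0.95
MYO9A HsaEX0041281 0.193990 0.932110 -0.738120 0.52
NEO1 HsaEX0042588 0.951511 0.305562 0.645950 0.56
SORBS1 HsaEX0060966 0.031510 0.801005 -0.769495 0.68
GPR126 HsaEX0028282 0.974642 0.214396 0.760246 0.64
EOF

# Keep the header plus rows whose last column is >= 0.6 (illustrative cutoff).
# awk's default field splitting handles both spaces and tabs.
awk 'NR == 1 || $NF >= 0.6' diff_output.tab
```

With these four rows, only SORBS1 and GPR126 pass the cutoff.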

BaluPai commented 6 years ago

@timbitz , I tried again removing options one by one until just running the default

./vast-tools diff -a MS_min,MS_plus -b PN_min,PN_plus -i test.tab -c 16

I have no idea why, but the output format is just like the input, with a filtered set of events and without the M_min, P_min, E[dPsi] and MV[dPsi]_at_0.95 columns...

thanks

kcha commented 6 years ago

Hi @BaluPai,

Diff produces two txt files: _all.txt and _psi.txt. The _psi.txt file is the filtered output of the input as you mention. Try checking the _all.txt file.

Edit: above is an outdated response. See @timbitz response below.

timbitz commented 6 years ago

@BaluPai diff does not send this output to a file unless you PIPE it to a file. Since it is the standard output of the program, diff sends that output to STDOUT. I figured it would be pretty obvious when you ran it... but I'm thinking (based on your -c 16 flag!) that you are perhaps just dispatching this job to a cluster and then not looking back.

Did you try running that command with the snippet you sent me without -c directly from the console??

Yesterday I cloned a new version of vast-tools and ran diff on the input snippet and obtained those results... Is this not what you get when you do the exact same thing?

$ ./vast-tools diff -a M_min,M_plus -b P_min,P_plus -i tmp.tab -o .
GENE EVENT M_min P_min E[dPsi] MV[dPsi]_at_0.95
MYO9A HsaEX0041281 0.193990 0.932110 -0.738120 0.52
NEO1 HsaEX0042588 0.951511 0.305562 0.645950 0.56
SORBS1 HsaEX0060966 0.031510 0.801005 -0.769495 0.68
GPR126 HsaEX0028282 0.974642 0.214396 0.760246 0.64

BaluPai commented 6 years ago

@kcha When I run diff I get one _plots.pdf (plots for the filtered output) and one _sig.txt (filtered output).

@timbitz I tried this now (and earlier with the -o flag but with -c 16):

./vast-tools diff -a M_min,M_plus -b P_min,P_plus -i /home/balupai/tools/vast-tools/vast_out/test.tab -o ./vast_out/

but am afraid it doesn't help.

timbitz commented 6 years ago

@BaluPai I think you misunderstand me. I'm saying that the output is sent to standard output. If you want it in a file, you need to redirect it there using >.

./vast-tools diff -a M_min,M_plus -b P_min,P_plus -i /home/balupai/tools/vast-tools/vast_out/test.tab -o ./vast_out/ > file_with_results.txt

kcha commented 6 years ago

@BaluPai oops, I was looking at some older diff results. You should check your standard output as @timbitz suggests.
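The behaviour described here is ordinary shell redirection rather than anything specific to vast-tools. The sketch below uses a stand-in function (fake_diff is invented for this example and is not part of vast-tools) to show the two destinations: files the program writes itself, and the table it prints to standard output, which is not saved unless redirected.

```shell
# fake_diff is a hypothetical stand-in for a program that writes one file on
# its own and prints its main table to standard output.
fake_diff() {
  outdir="$1"
  mkdir -p "$outdir"
  echo "filtered rows" > "$outdir/sig.txt"   # written by the program itself
  printf 'GENE\tEVENT\tE[dPsi]\n'            # goes to standard output
  printf 'MYO9A\tHsaEX0041281\t-0.621051\n'
}

# Without redirection the table only appears on the terminal (or in a job log).
fake_diff demo_out

# With '>' the same table is captured in a file of your choosing.
fake_diff demo_out > demo_out/diff_output.tab

# 'tee' is an alternative that shows the table and saves it at the same time.
fake_diff demo_out | tee demo_out/diff_output_tee.tab
```

When a job is dispatched to a cluster, standard output usually ends up in the scheduler's log file, so an explicit redirect makes the table much easier to find.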

BaluPai commented 6 years ago

@timbitz Finally....!! Yes, it worked. I did earlier try something like what you suggested, but only by specifying -o file.txt or the folder; I had not tried redirecting the output with '>', and this makes all the difference. And now I have three output files: 1) filtered.txt 2) _file.Diff_Sig.txt 3) _file.Diff_plots.pdf. Thanks for your effort in solving this, and thanks to @mirimia and @kcha for your responses.

timbitz commented 6 years ago

Great! Glad to hear you got it working!

To make things easier next time, you could start with the command supplied in the documentation... in this case it would have saved you considerable time and effort I think:

vast-tools diff -a sampA_r1,sampA_r2,sampA_r3 -b sampB_r1,sampB_r2 -o outputdir > outputdir/diff_output.tab

BaluPai commented 6 years ago

I just started playing around with vast-tools towards the end of last week. I had been trying different options and flags to understand how it works. I am sure I did try the way quoted in the manual, but I probably had some other error in the combination that caused it to fail. I feel sorry to have wasted your time, though. Thanks again.