vastgroup / vast-tools

A toolset for profiling alternative splicing events in RNA-Seq data.
MIT License

to_combine: zero inclusion/exclusion reads error #71

Closed araposo closed 5 years ago

araposo commented 6 years ago

I ran vast-tools 1.3.0 on 15 data sets in Mmu with the previous VastDB files and had no problems. I have now repeated the analysis with 2.0.2 and the updated annotation (vastdb.Mmu.16.02.18.tar.gz), and I can't get it to finish the score table. It appears to be because one of the files has no IR reads (the species is the same for all of the other files; running via sbatch on a SLURM machine):

Merging IR of 15 sample(s)...
[vast combine]: Building quality score table for intron retention (version 2)
[vast combine]: Building quality score table for intron retention (version 2)
[vast combine]: Building quality score table for intron retention (version 2)
sample_1-E15.SE1
[vast combine]: Building Table for intron retention (version 2)
[vast combine]: Building quality score table for intron retention (version 2)
Trying to load required package: optparse
Merging IR of 15 sample(s)...
[vast combine]: Building quality score table for intron retention (version 2)
[vast combine]: Building quality score table for intron retention (version 2)
[vast combine]: Building quality score table for intron retention (version 2)
[vast combine]: Building quality score table for intron retention (version 2)
[vast combine]: Building quality score table for intron retention (version 2)
Error: Looks like an error: all IR events of sample sample_1-E15.SE1 have zero inclusion and zero exclusion reads. Did you use the correct species?
Execution halted
[vast combine error]: /root/software/vast-tools/bin/RI_MakeTablePIR.R --verbose 1 -s /genomes/mm9/VastTools2/Mmu --IR_version 2 -c /vast_out/to_combine -q vast_out/to_combine/Coverage_key_v2-Mmu15.IRQ -o /vast_out/raw_incl Failed in RunDBS_2.pl! at /root/software/vast-tools/bin/RunDBS_2.pl line 53.

mirimia commented 6 years ago

Hello,

I was getting this for some samples as well. @UBrau (and perhaps @aghr), could you please look into it?

Thanks, Manu

UBrau commented 6 years ago

Hi @araposo, Can you make the output files from align that you were using available (e.g., Dropbox link)? Mailto: u.braunschweig@utoronto.ca

mirimia commented 6 years ago

Could you please also paste the output of vast-tools align? I.e. whether it gave any errors in the mapping steps.

araposo commented 6 years ago

Hi, sorry about the delay. i'm attaching the slurm output files from vast-tools align; the problem seems to happen at the gzip step:
slurm-590340_1E15.out.txt
slurm-590341_5P0.out.txt
slurm-590343_1P0.out.txt
slurm-590344_1P5.out.txt

this then produces files for to_combine with 000/ne values, which break the pipe. i'll try to upload these too.

araposo commented 6 years ago

examples:

sortedNsorted_5-P0.PP.SE1.IR.summary_v2.txt sortedNsorted_1-P0.PP.SE1.IR.summary_v2.txt sortedNsorted_5-P0.PP.SE1.IR.summary_v2.txt sortedNsorted_1-E15.PP.SE1.IR.summary_v2.txt sortedNsorted_1-P5.PP.SE1.IR.summary_v2.txt

araposo commented 6 years ago

(sorry)

mirimia commented 6 years ago

Have you tried re-running the samples? Somehow this looks like a problem that is independent of vast-tools.

araposo commented 6 years ago

i've run align 2 times so far. unfortunately, i didn't take note the first time of which files were problematic. but, as i said, the earlier vast-tools version didn't give me any trouble. running it again.

mirimia commented 6 years ago

True. That's important. Although from the log files I see the reads are not strand specific, so v2 should be identical to v1 here. Strange.

When you ran it in v1, you also started from uncompressed reads, right?

araposo commented 6 years ago

they're the same files: uncompressed, paired-end, not strand-specific. testing machine here also.

UBrau commented 6 years ago

@araposo, could you provide us with sample data for which you see the error?

araposo commented 6 years ago

gladly, but the fastq files are 10GB each, x2 (paired-end)... in any case, the results are not consistent: on this re-run, i don't even get the raw_incl/reads folders. there is another difference for one of the files ("second" is the re-run):
first_1-E15.PP.SE1.IR2.txt
second_1-E15.PP.SE1.IR2.txt

slurm-first_1E15.out.txt

slurm-second_1E15.out.txt
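
When two runs on the same input give different output, it can help to confirm whether the files really differ and where they first diverge. The following is a minimal Python sketch, assuming the outputs are plain-text files; the function names are hypothetical and not part of vast-tools.

```python
import hashlib
from itertools import zip_longest


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def first_difference(path_a, path_b):
    """Return (line_number, line_a, line_b) for the first differing line.

    A side is None when that file ended first (e.g. a truncated output).
    Returns None when both files have identical lines.
    """
    with open(path_a) as file_a, open(path_b) as file_b:
        pairs = zip_longest(file_a, file_b)
        for number, (line_a, line_b) in enumerate(pairs, start=1):
            if line_a != line_b:
                return number, line_a, line_b
    return None


if __name__ == "__main__":
    import sys

    first, second = sys.argv[1], sys.argv[2]
    if sha256_of(first) == sha256_of(second):
        print("files are identical")
    else:
        print("first difference:", first_difference(first, second))
```

Called with the two files from the first run and the re-run as arguments, it reports either that they are identical or the first line at which they diverge. A `None` on one side would suggest that file was cut short.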

UBrau commented 6 years ago

I'm not sure what these files are... they seem to be logs from running two different input files (P5 and E15, see 2nd line). The 'unexpected end of file' warnings make me think that there could be an issue on your system that caused writing to these files to be interrupted. Was your disk full? Could a connection have been broken when copying files?
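
One way to test the interrupted-write hypothesis is to decompress each gzip file to the end and to look at the free space on the output disk. The following is a minimal Python sketch using only the standard library; the function names are hypothetical, not part of vast-tools, and the directory to check is whatever location holds the compressed files on your system.

```python
import gzip
import shutil
import zlib
from pathlib import Path


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream decompresses without error.

    A truncated file (for example after a full disk or a broken copy)
    raises EOFError when the reader reaches the premature end of file.
    """
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
        return True
    except (EOFError, OSError, zlib.error):
        return False


def free_gib(directory):
    """Return the free space, in GiB, on the filesystem holding directory."""
    return shutil.disk_usage(directory).free / 2**30


def report(directory):
    """Map file name -> intact (True/False) for each *.gz file in directory."""
    directory = Path(directory)
    return {p.name: gzip_is_intact(p) for p in sorted(directory.glob("*.gz"))}


if __name__ == "__main__":
    import sys

    target = sys.argv[1]
    print(f"free space: {free_gib(target):.1f} GiB")
    for name, intact in report(target).items():
        print(("ok       " if intact else "TRUNCATED"), name)
```

Any file reported as truncated would be consistent with the 'unexpected end of file' warnings. If every file is intact and there is enough free space, the cause is more likely elsewhere.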