gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License
Ballgown stops writing to the tmp file, but it does not crash #275

Open GabyBG opened 4 years ago

GabyBG commented 4 years ago

Hello

I am using StringTie v2.1.3b and I have done the assembly and merge steps. I am interested in creating the tables for Ballgown; here is my command for reference:

stringtie -L -p 32 -eB -G file_stringtieMerge.out.gtf -o filename.out originalAlig.sam

I was able to generate an abundance table following the instructions using the merged GTF, but when I try to create the tables for Ballgown, it keeps running forever. I noticed that it stops writing to the tmp file in less than 2 min; I have run it overnight and still nothing.

Is there any incompatibility with the options I am using?

Thank you so much for your help!!!

gpertea commented 4 years ago

  1. Are you saying that if you do not use -B, then it finishes properly (with only -e instead of -eB)?
  2. Could you please run your command as above (with -eB) but without the -p option and with the -v option added, to see if the program gets stuck at a specific bundle? If you can share the data causing this problem, please try to follow the instructions here: https://github.com/gpertea/stringtie/wiki/Extracting-bundle-data-for-debugging It would be of great help for debugging this issue if you could extract a specific bundle following those instructions and reproduce the exact same problem on just that one bundle, then share just that bundle with me. If the problem cannot be reproduced on a specific bundle, maybe you can compress and share these two whole files: file_stringtieMerge.out.gtf and originalAlig.sam.

GabyBG commented 4 years ago

Hello,

I have tried running the command the way you suggested, but it has not moved since 9:50 (I just checked). It does not crash; it just stays in this state:

Running StringTie 2.1.3b. Command line:
/data/users/gbalderr/stringtie/./stringtie -L -v -eB -G /share/crsp/lab/seyedam/share/TALON_paper_data/revisions_1-20/stringtie/PB125_PB126_FLNC_stringtieMerge.out -o PB125_abundanceMerged_stringtieBallgown.out.gtf /share/crsp/lab/seyedam/share/PACBIO/PB125/Minimap2/C01/mapped_FLNC_XSflag_sorted.bam
[05/22 09:50:43] Loading reference annotation (guides)..
[05/22 09:50:48] 224976 reference transcripts loaded.
Default stack size for threads: 2097152 (increased to 8388608)
[05/22 09:50:48]>bundle chr1:586071-859446 [63 alignments (63 distinct), 42 junctions, 75 guides] begins processing...
[05/22 09:50:48]^bundle chr1:586071-859446 done (8 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:923928-959309 [268 alignments (218 distinct), 53 junctions, 26 guides] begins processing...
[05/22 09:50:48]^bundle chr1:923928-959309 done (0 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:960587-965718 [4 alignments (4 distinct), 12 junctions, 5 guides] begins processing...
[05/22 09:50:48]^bundle chr1:960587-965718 done (0 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:1001138-1014544 [294 alignments (102 distinct), 7 junctions, 6 guides] begins processing...
[05/22 09:50:48]^bundle chr1:1001138-1014544 done (1 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:1059049-1069355 [5 alignments (5 distinct), 8 junctions, 7 guides] begins processing...
[05/22 09:50:48]^bundle chr1:1059049-1069355 done (2 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:1081818-1116361 [7 alignments (7 distinct), 20 junctions, 21 guides] begins processing...
[05/22 09:50:48]^bundle chr1:1081818-1116361 done (1 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:1203504-1206814 [6 alignments (6 distinct), 7 junctions, 5 guides] begins processing...
[05/22 09:50:48]^bundle chr1:1203504-1206814 done (1 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:1216908-1232046 [4 alignments (4 distinct), 7 junctions, 8 guides] begins processing...
[05/22 09:50:48]^bundle chr1:1216908-1232046 done (1 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:1249777-1273885 [25 alignments (25 distinct), 14 junctions, 22 guides] begins processing...
[05/22 09:50:48]^bundle chr1:1249777-1273885 done (1 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:1292376-1324707 [324 alignments (196 distinct), 68 junctions, 74 guides] begins processing...
[05/22 09:50:48]^bundle chr1:1292376-1324707 done (2 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:1373711-1375537 [114 alignments (84 distinct), 5 junctions, 7 guides] begins processing...
[05/22 09:50:48]^bundle chr1:1373711-1375537 done (1 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:1376753-1407313 [329 alignments (256 distinct), 54 junctions, 40 guides] begins processing...
[05/22 09:50:48]^bundle chr1:1376753-1407313 done (10 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:1434861-1442882 [1 alignments (1 distinct), 0 junctions, 4 guides] begins processing...
[05/22 09:50:48]^bundle chr1:1434861-1442882 done (0 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:1449689-1497848 [13 alignments (13 distinct), 20 junctions, 8 guides] begins processing...
[05/22 09:50:48]^bundle chr1:1449689-1497848 done (1 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:1503250-1534687 [12 alignments (12 distinct), 24 junctions, 8 guides] begins processing...
[05/22 09:50:48]^bundle chr1:1503250-1534687 done (2 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:1541673-1577075 [88 alignments (61 distinct), 6 junctions, 7 guides] begins processing...
[05/22 09:50:48]^bundle chr1:1541673-1577075 done (1 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:1632095-1746293 [59 alignments (59 distinct), 57 junctions, 68 guides] begins processing...
[05/22 09:50:48]^bundle chr1:1632095-1746293 done (2 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:1751232-1780527 [72 alignments (70 distinct), 33 junctions, 21 guides] begins processing...
[05/22 09:50:48]^bundle chr1:1751232-1780527 done (4 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:1785285-1891120 [160 alignments (149 distinct), 37 junctions, 14 guides] begins processing...
[05/22 09:50:48]^bundle chr1:1785285-1891120 done (2 processed potential transcripts).
[05/22 09:50:48]>bundle chr1:2321253-2413797 [349 alignments (311 distinct), 37 junctions, 37 guides] begins processing...

I will try to share the files to help with debugging, as well as my scripts. Thank you for your quick response.

gpertea commented 4 years ago

So it seems to get stuck at that bundle, chr1:2321253-2413797. I do not need any other data or scripts, just the alignments from mapped_FLNC_XSflag_sorted.bam and the "guides" (transcripts) from PB125_PB126_FLNC_stringtieMerge.out (which I assume is a GTF) that fall into that region of chr1. If you follow the guide that I linked above, you should be able to get BAM and GFF files just for that region; it should be easier to share just those 2 files.
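Editorial aside: the reasoning above (the last bundle that "begins processing" and never reports "done" is the one that stalls) can be automated. This is a minimal sketch, assuming the verbose log is stored one entry per line with the `>bundle` and `^bundle` line shapes seen in this thread. The function name `unfinished_bundles` is hypothetical and not part of StringTie.

```shell
# Print every bundle that logged "begins processing" but never logged "done".
# Assumed line shapes (taken from the logs in this thread):
#   [time]>bundle REGION [...] begins processing...
#   [time]^bundle REGION done (...)
# Reads the log from the files given as arguments, or from stdin.
unfinished_bundles() {
  awk '
    # A bundle starts: remember its region and the order it appeared in.
    />bundle / {
      line = $0
      sub(/.*>bundle /, "", line)
      split(line, parts, " ")
      order[++n] = parts[1]
      pending[parts[1]] = 1
      next
    }
    # A bundle is done: it is no longer pending.
    /\^bundle / {
      line = $0
      sub(/.*\^bundle /, "", line)
      split(line, parts, " ")
      delete pending[parts[1]]
      next
    }
    # Report what is still pending, in log order.
    END {
      for (i = 1; i <= n; i++)
        if (order[i] in pending) print order[i]
    }
  ' "$@"
}
```

If the verbose output was captured in a file (say `run.log`, a hypothetical name), `unfinished_bundles run.log` lists the candidate regions. With `-p` greater than 1, several bundles can be in flight at once, so more than one region may be printed.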

kokyriakidis commented 4 years ago

I experience the same! It is stuck at

[06/28 15:03:53]^bundle chr2:217730-278283 done (0 processed potential transcripts).
[06/28 15:03:53]>bundle chr2:950868-1367615 [2 alignments (1 distinct), 2 junctions, 15 guides] begins processing...
[06/28 15:03:53]^bundle chr2:950868-1367615 done (0 processed potential transcripts).
[06/28 15:03:53]>bundle chr2:3379675-3485094 [2 alignments (1 distinct), 11 junctions, 26 guides] begins processing...

but it doesn't crash. I let it run for a whole day, but still nothing.

gpertea commented 4 years ago

Maybe I missed an email from @GabyBG but I do not remember any followup with the data to debug that bundle. Maybe you @kokyriakidis can extract the data for that bundle chr2:3379675-3485094, make sure you still get the same behavior when you run just on that smaller bam and gff file and then send those files to me for debugging?
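Editorial aside: a rough sketch of what "extract the data for that bundle" can look like. The wiki page linked earlier in the thread is the reference procedure and may differ from this. The sketch assumes `samtools` is installed and the BAM is coordinate-sorted and indexed. The function names and the file names in the usage note are hypothetical.

```shell
# Usage: extract_bam_region IN.bam REGION OUT.bam
# REGION is a samtools region string such as chr2:3379675-3485094.
# Requires an index for IN.bam (samtools index IN.bam).
extract_bam_region() {
  samtools view -b -o "$3" "$1" "$2"
}

# Usage: extract_gtf_region IN.gtf CHR START END > OUT.gtf
# Keeps GTF/GFF features on CHR that overlap START..END
# (column 1 = sequence name, column 4 = start, column 5 = end).
# A transcript that crosses the window edge keeps only its overlapping features.
extract_gtf_region() {
  awk -F '\t' -v chr="$2" -v s="$3" -v e="$4" \
    '$1 == chr && $4 <= e && $5 >= s' "$1"
}
```

For example, `extract_bam_region sample.bam chr2:3379675-3485094 bundle.bam` followed by `extract_gtf_region merged.gtf chr2 3379675 3485094 > bundle.gff`. Passing a bare chromosome name such as `chr1` as the samtools region selects that whole chromosome, which suits the larger per-chromosome subset suggested later in the thread.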

kokyriakidis commented 4 years ago

@gpertea I have a problem with region chr2:74455023-74460891 in one of my Nanopore samples. I sent you the BAM and the GFF file of these regions.

Archive.zip

The commands for sampleA are:

module purge && \
module load mugqic_dev/stringtie/2.1.3b && \
mkdir -p stringtie2/sampleA && \
stringtie -v -L \
  -G stringtie2/AllSamples/merged.gtf \
  -eB -A stringtie2/sampleA/abundance.tab \
  -p 2 \
  -m 200 \
  -o stringtie2/sampleA/transcripts.gtf \
  alignment/sampleA/sampleA.sorted.bam

gpertea commented 4 years ago

Is it the same problem as described above? What was your stringtie command line?

kokyriakidis commented 4 years ago

I face the same problem with another sample in the region chr1:154155304-154194648. Let's call it sampleB.

SampleB.zip

The problem is that the tool runs for 1-2 seconds and then freezes in these regions. I reran the analysis many times, but I get the same behavior.

The command for sampleB is:

module purge && \
module load mugqic_dev/stringtie/2.1.3b && \
mkdir -p stringtie2/sampleB && \
stringtie -v -L \
  -G stringtie2/AllSamples/merged.gtf \
  -eB -A stringtie2/sampleB/abundance.tab \
  -p 2 \
  -m 200 \
  -o stringtie2/sampleB/transcripts.gtf \
  alignment/sampleB/sampleB.sorted.bam

gpertea commented 4 years ago

I cannot reproduce the problem with the data from the zip files you sent. Maybe it's a larger issue and the problem only appears when those bundles are in a specific context. I had suggested that you first make sure you can reproduce the problem on the bundles you extracted, and only send them to me for debugging if the problem also shows up when running on just those bundles.

E.g. using the data from the Archive.zip file, for me stringtie ran like this and finished quickly:

../stringtie -v -L -G bundle_c2.gff -eB -A b2/abundance.tab -m 200 -o b2/out.gtf bundle_c2.bam 
Running StringTie 2.1.3b. Command line:
../stringtie -v -L -G bundle_c2.gff -eB -A b2/abundance.tab -m 200 -o b2/out.gtf bundle_c2.bam
[06/29 11:32:17] Loading reference annotation (guides)..
[06/29 11:32:17] 3 reference transcripts loaded.
Default stack size for threads: 8388608
[06/29 11:32:17]>bundle chr2:74455421-74457793 [3 alignments (3 distinct), 4 junctions, 1 guides] begins processing...
[06/29 11:32:17]^bundle chr2:74455421-74457793 done (1 processed potential transcripts).
[06/29 11:32:17]>bundle chr2:74458469-74460750 [7 alignments (7 distinct), 9 junctions, 2 guides] begins processing...
[06/29 11:32:17]^bundle chr2:74458469-74460750 done (2 processed potential transcripts).
[06/29 11:32:17] 10 aligned fragments found.
[06/29 11:32:17] All threads finished.
Total count of aligned fragments: 10
Fragment coverage length: 916.4

So what happens on your side when you run stringtie like this with just the bundle_c2 data files you sent me? Or just the bundle_c1 data? For me in both cases it runs without problems.

Also, just making sure, when you determined the bundles to extract, you did not run stringtie with -p 2 option as you showed above, right?
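Editorial aside: one way to check whether an extracted bundle still reproduces the stall without watching the terminal is to run the command under a time limit. This is a sketch assuming GNU coreutils `timeout`, which exits with status 124 when it has to stop the command. The helper name `stall_check` and the time limit in the usage note are arbitrary choices for illustration.

```shell
# Usage: stall_check SECONDS COMMAND [ARGS...]
# Prints "stalled" if the command had to be stopped after SECONDS,
# otherwise "finished (exit N)" with the exit status of the command.
# The command's own stdout is sent to stderr so the verdict is the only stdout line.
stall_check() {
  stall_limit=$1
  shift
  timeout "$stall_limit" "$@" 1>&2
  stall_status=$?
  if [ "$stall_status" -eq 124 ]; then
    echo "stalled"
  else
    echo "finished (exit $stall_status)"
  fi
}
```

For instance, `stall_check 600 stringtie -v -L -G bundle_c2.gff -eB -A b2/abundance.tab -m 200 -o b2/out.gtf bundle_c2.bam` reuses the command shown above. A bundle that finishes in under a second on a healthy run and still reports "stalled" after ten minutes is a good candidate to share.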

gpertea commented 4 years ago

If the problem only manifests when stringtie is run in the larger context of other bundles etc. could you please share (upload on Google Drive perhaps) a larger part of your data file?

For example, for sampleB, maybe you can extract all the chr1 data from both sampleB.sorted.bam and merged.gtf and if you can reproduce the issue on that chr1 subset, then hopefully you can share those chr1 data with me so I can start debugging this issue.

kokyriakidis commented 4 years ago

@gpertea Thanks for your help!

After several reruns, I found that the problem occurs when the guide GTF files derive from the merged file. When they derive from the sampleA_transcript.gtf file, everything works fine. The GTF files I sent you above were from the transcripts file and not from the merged file, and that's why they work fine.

I am sending you:

  1. The 2 test BAM files (specific regions that cause the freeze) of my 2 samples (sampleAtestregions.bam and sampleBtestregions.bam)
  2. The 2 test GTF files derived from the merged.gtf file (sampleA_from_merged.gtf and sampleB_from_merged.gtf)
  3. The 2 test GTF files derived from the transcripts.gtf files of each sample (sampleA_from_transcripts.gtf and sampleB_from_transcripts.gtf)
  4. The merged GTF file of the 2 samples (merged.gtf)
  5. The original BAM files of my samples (sampleA.sorted.bam and sampleB.sorted.bam)
  6. The original transcript files of my samples (sampleA_transcripts.gtf and sampleB_transcripts.gtf)

The affected regions are:

sampleA ==> chr2:74455023-74460891
sampleB ==> chr1:154155304-154194648

Please rerun the analysis with the 2 GTF files that derived from the merged.gtf file and you should get a freeze.

The command I used is:

module purge && \
module load mugqic_dev/stringtie/2.1.3b && \
mkdir -p stringtie2/sampleB && \
stringtie -v -L \
  -G stringtie2/AllSamples/sampleB_from_merged.gtf \
  -eB -A stringtie2/sampleB/abundance.tab \
  -m 200 \
  -o stringtie2/sampleB/transcripts.gtf \
  alignment/sampleB/sampleBtestregions.bam

Thanks again for your help!

https://drive.google.com/drive/folders/1ExHtBLrfwjYq9jZ1APNGaexZyyXleiNg?usp=sharing

gpertea commented 4 years ago

Yes, thank you for providing the data. I am now able to reproduce the stalling bug here. Looking into it.

kokyriakidis commented 4 years ago

@gpertea

When I run the command with "-p 2", it gets stuck in a certain region. When I increased -p to 24, one sample finished successfully and the other got stuck again. When I removed -p, both samples got stuck in the regions I listed above.

I don't know if this helps somehow. It was really weird.

gpertea commented 4 years ago

v2.1.4 has been released and it includes the fix for this issue - thank you again for providing the data exposing that bug (it was an infinite loop that happened under certain conditions).

Please download the latest version from here: https://github.com/gpertea/stringtie/releases/tag/v2.1.4
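Editorial aside: a small sketch for confirming that the installed StringTie is at least the fixed release. It assumes `sort -V` (version sort, available in GNU coreutils) and that `stringtie --version` prints only the version string. The helper name `version_at_least` is hypothetical.

```shell
# Usage: version_at_least INSTALLED REQUIRED
# Succeeds when INSTALLED sorts at or after REQUIRED in version order.
version_at_least() {
  # The smaller of the two versions sorts first; if that is REQUIRED,
  # then INSTALLED is equal to it or newer.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

A possible check: `version_at_least "$(stringtie --version)" 2.1.4 && echo "has the fix" || echo "please upgrade"`.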

kokyriakidis commented 4 years ago

@gpertea Thank you very much for your support! I can confirm that it now works!