gpertea / stringtie

Transcript assembly and quantification for RNA-Seq
MIT License

Segmentation fault 11 #389

Open theokirkland opened 1 year ago

theokirkland commented 1 year ago

I am trying to assemble four relatively small BAM files and am getting a "Segmentation fault: 11" error. I can assemble two of them, but the other two give this error. The last few lines of the -v output before the fault are:

[02/07 15:54:55]>bundle GG704912:61240-61389 [6 alignments (4 distinct), 0 junctions, 0 guides] begins processing...
[02/07 15:54:55]^bundle GG704912:61240-61389 done (0 processed potential transcripts).
[02/07 15:55:10]>bundle GG704912:61822-4648378 [8800860 alignments (1931765 distinct), 9640 junctions, 0 guides] begins processing...
Segmentation fault: 11

I am using an M1 Mac with 64 GB of RAM running macOS Ventura 13.1. I have run stringtie before and not had this problem.

I see from others that they could find the problem in the bam file and eliminate it but I don't know how to do that.

Thanks

Theo Kirkland UC San Diego tkirkland@ucsd.edu

gpertea commented 1 year ago

That bundle does seem quite large and junction rich, though Stringtie should be able to handle it in 64GB I think.. Anyway, I'm now quite curious to see what's going on there.

Is it possible to extract that bundle and send it to me for debugging ? See instructions here: https://github.com/gpertea/stringtie/wiki/Extracting-bundle-data-for-debugging

It would be useful to also check if you can reproduce the crash on your M1 by running stringtie just on that bundle separately after you extracted it.

theokirkland commented 1 year ago

Attachments available until Mar 10, 2023.

I really appreciate your quick response.

This is the fault line:

[02/08 09:51:49]>bundle GG704912:61822-4648378 [8800860 alignments (1931765 distinct), 9640 junctions, 0 guides] begins processing...
Segmentation fault: 11

I’ll show you the commands that I ran:

~/Desktop/RNAseq_protocol/RS_output_2023 $ samtools index Sph_48_A.bam
~/Desktop/RNAseq_protocol/RS_output_2023 $ samtools index Sph_48_B.bam
~/Desktop/RNAseq_protocol/RS_output_2023 $ samtools view -b Sph_48_A.bam GG704912:61822-4648378 > bundle_sph_A.bam
~/Desktop/RNAseq_protocol/RS_output_2023 $ wc -l bundle_sph_A.bam
436736 bundle_sph_A.bam
~/Desktop/RNAseq_protocol/RS_output_2023 $ samtools view -b Sph_48_B.bam GG704912:61822-4648378 > bundle_sph_B.bam
~/Desktop/RNAseq_protocol/RS_output_2023 $ wc -l bundle_sph_B.bam
568195 bundle_sph_B.bam
~/Desktop/RNAseq_protocol/RS_output_2023 $ stringtie -v -o trial_bundle_RS.gtf bundle_sph_A.bam bundle_sph_B.bam
Running StringTie 2.1.6. Command line:
stringtie -v -o trial_bundle_RS.gtf bundle_sph_A.bam bundle_sph_B.bam
Default stack size for threads: 524288 (increased to 8388608)
[02/08 10:00:44]>bundle GG704912:61822-4648378 [8800860 alignments (1931765 distinct), 9640 junctions, 0 guides] begins processing...
Segmentation fault: 11

So the same bundle failed again. I don’t know exactly what a bundle is, though. Can I just eliminate a bundle from the BAM files? Do I need to eliminate it from both A and B?

Thanks again,

Theo

Attachments (iCloud download links): bundle_sph_A.bam (119 MB), bundle_sph_B.bam (153.4 MB)
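A note on the `wc -l` checks in the commands above: BAM is a compressed binary format, so a line count of the file does not correspond to the number of alignments in it. A minimal sketch of one way to count records instead, assuming `samtools` is on the PATH; the helper name `count_alignments` is hypothetical and only for illustration:

```shell
#!/bin/sh
# count_alignments FILE [REGION]
# Prints the number of alignment records in FILE (optionally restricted to
# REGION) using `samtools view -c`. The function name is a hypothetical
# helper, not part of samtools or StringTie.
count_alignments() {
    if ! command -v samtools >/dev/null 2>&1; then
        echo "samtools not found on PATH" >&2
        return 127
    fi
    # -c makes samtools print only the count of matching records.
    samtools view -c "$@"
}
```

For example, `count_alignments bundle_sph_A.bam` counts every record in the extracted file, and `count_alignments Sph_48_A.bam GG704912:61822-4648378` counts records overlapping the bundle region in the original file (a region query needs the index created by `samtools index`).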

gpertea commented 1 year ago

Thank you, I got the files and was able to reproduce the problem (it crashes the same way even on Linux). I noticed that assembling the two bundles separately works fine; only assembling them both at once triggers the crash. I'm going to look into that.

A "bundle" is a cluster of overlapping read alignments in a genomic locus. I would not recommend discarding bundles unless there is a good reason to ignore the reads aligned there, which can reflect potentially valuable transcript information.

Instead, I would suggest assembling each BAM file separately and merging the results. What is the reason you are feeding two BAM files to StringTie at the same time? Are those two separate sequencing samples?

I also noticed you are using version 2.1.6 instead of the latest v2.2.x version. Is there a particular technical reason for using that older version? I am planning to apply the fix just to the current version.

theokirkland commented 1 year ago

To be sure that I understand you: you would run stringtie merge on multiple files, not stringtie on multiple bam files. Doesn’t stringtie merge require gtf files? Do you run stringtie on a single bam file to get a gtf file? My ultimate goal is to get a gtf file that has all the alignments from a total of 6 independent sequencing data sets. What is the best way to do that?

Sorry for the naive questions. I’m clearly at the edge of my competence and appreciate your help.

I thought that stringtie 2.1.6 was the most updated version. I’ll update it.

Thanks again,

Theo

gpertea commented 1 year ago

Yes, see the protocol outlined at https://ccb.jhu.edu/software/stringtie/index.shtml?t=manual#de

Yes, stringtie --merge takes as input the .gtf files which are the output of the assembly of individual sample (.bam) files. Alternatively, one can also use gffcompare for a much more conservative merging of those gtfs.
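The per-sample assembly followed by a merge described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the exact protocol from the manual: the sample names are taken from the two BAM files mentioned earlier in the thread, while the output names (`merged.gtf`, `mergelist.txt`) and the `SAMPLES`/`DRY_RUN` variables and `run` helper are hypothetical. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: assemble each sample on its own, then merge the per-sample GTFs.
# Assumptions (not from the thread): output file names, the SAMPLES and
# DRY_RUN variables, and the run() helper are illustrative only.
set -eu

SAMPLES="${SAMPLES:-Sph_48_A Sph_48_B}"   # one name per BAM file, without .bam
DRY_RUN="${DRY_RUN:-1}"                   # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: one assembly per sample; record each output GTF in the merge list.
: > mergelist.txt
for s in $SAMPLES; do
    run stringtie -o "$s.gtf" "$s.bam"
    printf '%s\n' "$s.gtf" >> mergelist.txt
done

# Step 2: merge the per-sample assemblies into a single set of transcripts.
run stringtie --merge -o merged.gtf mergelist.txt
```

To run the commands for real, list all six samples in `SAMPLES` and set `DRY_RUN=0`, for example `SAMPLES="Sph_48_A Sph_48_B" DRY_RUN=0 sh assemble_and_merge.sh` (the script file name is also hypothetical).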

theokirkland commented 1 year ago

I really appreciate your help and detailed step by step instructions, which are exceptional. I have used your workflow previously to analyze NGS data, including the publications, but temporarily forgot the lessons that I learned. Thanks for the reminder.

Thanks again,

Theo
