wdecoster / NanoPlot

Plotting scripts for long read sequencing data
http://nanoplot.bioinf.be
MIT License

NanoPlot crash in reading a pacbio bam file #228

Closed mavino closed 2 years ago

mavino commented 3 years ago
NanoPlot -t 1 --color yellow --bam /home/mariano/Documents/Wellinger/Telomerase/TEL06R/Sequel.RunS140_S2.004.BC1299_Forward.TEL06R.ccs.bam --downsample 10000 -o /home/mariano/Documents/Wellinger/Telomerase/TEL06R/Sequel.RunS140_S2.004.BC1299_Forward.TEL06R.ccs.bam.bamplots_downsampled

If you read this then NanoPlot 1.34.0 has crashed :-(
Please try updating NanoPlot and see if that helps...

If not, please report this issue at https://github.com/wdecoster/NanoPlot/issues
If you could include the log file that would be really helpful.
Thanks!

concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/mariano/miniconda3/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/mariano/miniconda3/lib/python3.8/concurrent/futures/process.py", line 198, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/home/mariano/miniconda3/lib/python3.8/concurrent/futures/process.py", line 198, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/home/mariano/miniconda3/lib/python3.8/site-packages/nanoget/extraction_functions.py", line 155, in process_bam
    samfile = check_bam(bam)
  File "/home/mariano/miniconda3/lib/python3.8/site-packages/nanoget/extraction_functions.py", line 102, in check_bam
    samfile = pysam.AlignmentFile(bam, "rb")
  File "pysam/libcalignmentfile.pyx", line 742, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 991, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mariano/miniconda3/bin/NanoPlot", line 8, in <module>
    sys.exit(main())
  File "/home/mariano/miniconda3/lib/python3.8/site-packages/nanoplot/NanoPlot.py", line 59, in main
    datadf = get_input(
  File "/home/mariano/miniconda3/lib/python3.8/site-packages/nanoget/nanoget.py", line 92, in get_input
    dfs=[out for out in executor.map(extraction_function, files)],
  File "/home/mariano/miniconda3/lib/python3.8/site-packages/nanoget/nanoget.py", line 92, in <listcomp>
    dfs=[out for out in executor.map(extraction_function, files)],
  File "/home/mariano/miniconda3/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/home/mariano/miniconda3/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/home/mariano/miniconda3/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/home/mariano/miniconda3/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False

wdecoster commented 3 years ago

Hi,

Thanks for reporting this. Does the BAM file contain aligned reads, or is it an unmapped BAM? If it is unmapped, please try again with --ubam. I realized I haven't yet documented that option for unmapped BAMs on GitHub.

If that doesn't help, would it be possible to share an example file that results in this error?

Best, Wouter
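
The first traceback itself suggests opening the file with check_sq=False. A minimal sketch of how one might check up front whether a BAM has any reference sequences (@SQ lines) in its header, and so guess which NanoPlot input option to try, is shown below. The helper names (has_reference_sequences, read_bam_header, suggest_flag) are hypothetical and not part of NanoPlot or nanoget. The rule "no @SQ lines means unmapped" is an assumption for illustration, not something confirmed in this thread.

```python
def has_reference_sequences(header: dict) -> bool:
    """Return True if a SAM/BAM header dict lists at least one @SQ record."""
    return len(header.get("SQ", [])) > 0


def read_bam_header(path: str) -> dict:
    """Read a BAM header as a dict, tolerating files without @SQ lines."""
    import pysam  # imported lazily so the pure helpers work without pysam

    # check_sq=False is the option the pysam error message itself suggests;
    # without it, pysam raises ValueError for headers with no sequences.
    with pysam.AlignmentFile(path, "rb", check_sq=False) as bam:
        return bam.header.to_dict()


def suggest_flag(header: dict) -> str:
    """Guess the NanoPlot input option (assumption: no @SQ means unmapped)."""
    return "--bam" if has_reference_sequences(header) else "--ubam"
```

For example, suggest_flag(read_bam_header("reads.ccs.bam")) would return "--ubam" for a file whose header has no @SQ lines; the file name here is a placeholder.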

mavino commented 3 years ago

It was a uBAM, so that worked, thanks. However, I might open other issues because I still have the problem with aligned BAMs. Thank you so much.

mavino commented 3 years ago

Actually, even the aligned BAM files (produced by a blasr alignment) ran fine with the --ubam option...

wdecoster commented 3 years ago

Hmm, I could try blasr to replicate your issue. When looking at unaligned BAMs, you will miss certain features that can be extracted from an aligned BAM.

mavino commented 3 years ago

A slightly different error this time, though; again, the BAM comes from a blasr alignment:

NanoPlot -t 1 --color yellow --bam /home/mariano/Documents/Wellinger/Telomerase/TEL06R/Sequel.RunS140_S2.004.BC1299_Forward.TEL06R.ccs.TEL11Raligned.bam --downsample 10000 -o /home/mariano/Documents/Wellinger/Telomerase/TEL06R/Sequel.RunS140_S2.004.BC1299_Forward.TEL06R.ccs.TEL11Raligned.bam.bamplots_downsampled
[E::idx_find_and_load] Could not retrieve index file for '/home/mariano/Documents/Wellinger/Telomerase/TEL06R/Sequel.RunS140_S2.004.BC1299_Forward.TEL06R.ccs.TEL11Raligned.bam'
[E::hts_idx_push] Unsorted positions on sequence #1: 471 followed by 1
[E::sam_index] Read 'm64128_201203_104401/1325/ccs' with ref_name='TEL11R_SubtelomericRegion', ref_length=915, flags=16, pos=1 cannot be indexed

If you read this then NanoPlot 1.34.0 has crashed :-(
Please try updating NanoPlot and see if that helps...

If not, please report this issue at https://github.com/wdecoster/NanoPlot/issues
If you could include the log file that would be really helpful.
Thanks!

concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/mariano/miniconda3/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/mariano/miniconda3/lib/python3.8/concurrent/futures/process.py", line 198, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/home/mariano/miniconda3/lib/python3.8/concurrent/futures/process.py", line 198, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/home/mariano/miniconda3/lib/python3.8/site-packages/nanoget/extraction_functions.py", line 155, in process_bam
    samfile = check_bam(bam)
  File "/home/mariano/miniconda3/lib/python3.8/site-packages/nanoget/extraction_functions.py", line 104, in check_bam
    pysam.index(bam)
  File "/home/mariano/miniconda3/lib/python3.8/site-packages/pysam/utils.py", line 69, in __call__
    raise SamtoolsError(
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools index: failed to create index for "/home/mariano/Documents/Wellinger/Telomerase/TEL06R/Sequel.RunS140_S2.004.BC1299_Forward.TEL06R.ccs.TEL11Raligned.bam"\n'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/mariano/miniconda3/bin/NanoPlot", line 8, in <module>
    sys.exit(main())
  File "/home/mariano/miniconda3/lib/python3.8/site-packages/nanoplot/NanoPlot.py", line 59, in main
    datadf = get_input(
  File "/home/mariano/miniconda3/lib/python3.8/site-packages/nanoget/nanoget.py", line 92, in get_input
    dfs=[out for out in executor.map(extraction_function, files)],
  File "/home/mariano/miniconda3/lib/python3.8/site-packages/nanoget/nanoget.py", line 92, in <listcomp>
    dfs=[out for out in executor.map(extraction_function, files)],
  File "/home/mariano/miniconda3/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/home/mariano/miniconda3/lib/python3.8/concurrent/futures/_base.py", line 611, in result_iterator
    yield fs.pop().result()
  File "/home/mariano/miniconda3/lib/python3.8/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/home/mariano/miniconda3/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools index: failed to create index for "/home/mariano/Documents/Wellinger/Telomerase/TEL06R/Sequel.RunS140_S2.004.BC1299_Forward.TEL06R.ccs.TEL11Raligned.bam"\n'

The first lines of its corresponding SAM:

@HD VN:1.5  SO:UNKNOWN  pb:3.0.1
@SQ SN:TEL06R_SubtelomericRegion    LN:976  M5:c4d6e9d8d1e39ebdbe8f93e4f2a6f132
@RG ID:af5f3ee9 PL:PACBIO   DS:READTYPE=CCS;BINDINGKIT=101-820-500;SEQUENCINGKIT=101-826-100;BASECALLERVERSION=5.0.0;FRAMERATEHZ=100.000000 PU:m64128_201203_104401 PM:SEQUEL   CM:S/P4.1-C2/5.0-8M
@PG ID:2    PN:BLASR    VN:5.3.3-SL-release-8.0.0+1 CL:/opt/pacbio/smrtlink/install/smrtlink-release_9.0.0.92188/bundles/smrttools/install/smrttools-release_9.0.0.92188/private/pacbio/blasr/binwrap/../../../../private/pacbio/blasr/bin/blasr /home/mariano/Documents/Wellinger/Telomerase/TEL06R/Sequel.RunS140_S2.004.BC1299_Forward.TEL06R.ccs.bam /home/mariano/Documents/Wellinger/Telomerase/TEL06R/TEL06R_SubtelomericRegion.fa --out /home/mariano/Documents/Wellinger/Telomerase/TEL06R/Sequel.RunS140_S2.004.BC1299_Forward.TEL06R.ccs.TEL06Raligned.bam --nproc 8 --placeGapConsistently --useQuality --unaligned /home/mariano/Documents/Wellinger/Telomerase/TEL06R/Sequel.RunS140_S2.004.BC1299_Forward.TEL06R.ccs.TEL06Runaligned.bam --sam 
@PG ID:ccs-4.0.0    PN:ccs  VN:4.0.0    DS:Generate circular consensus sequences (ccs) from subreads.   CL:ccs /cvmfs/soft.mugqic/CentOS6/software/SMRTLink/SMRTLink-8.0.0/install/smrtlink-release_8.0.0.80529/bundles/smrttools/install/smrttools-release_8.0.0.80502/private/pacbio/unanimity/binwrap/../../../../private/pacbio/unanimity/bin/ccs --min-passes 3 --num-threads 16 /lustre03/project/6033481/4nanuq2/sequelRuns/r64128_20201201_182904/4_D01/m64128_201203_104401.subreads.bam /lustre03/project/6033481/4nanuq2/sequelNanuq/r64128_20201201_182904/4_D01/m64128_201203_104401.ccs.bam
m64128_201203_104401/102/ccs    0   TEL06R_SubtelomericRegion   1   254 40S298=1X278=1X258=1X38=1X92=315S   *   0   0   ATCGCATCGCAGAGACGTATCATTAAAGACACCGCCAAGCTTCCAATATCACGAGTAAGGATCAAAGTTATGTTAGAGATAACTGTGAGTTTTTTATTTTTTGATCGATTTCCAAGATCATTCCTCAATCATAATCTATATGATTCAATATGTCCTTTCTTTGCGTGGCAATATACCTCATATTATCTTTCTATTTACAGGCAGTCCTTTCTATTTCATTTCTTACAAAAGGATTTTAGCAACGACTTCGTCTCAGAAGAGTTAATATATGCACTAGTTGCACTAGGCGCAAAAAATTCCTTTGACAATAGCCTTTCAAAGCATACATATGAATATTAAAACCACTCAAAGAGAAATTTACTGGAAGATTCGACAAATAAAAATTCAGCTTTTTCAAGTGCAAGCGTAACAAAGCCATAATGCCTCCTATATTTAGCCTTTTTTGATATAACTGTCGGAGAGTTAACAAGCGGCTGGACTACTTTCTGGAATAGCGTTCGGAATGTGTTTTACTTAAGGATTCGAACGTGATCCTAACGAGTGGATGCACAGTTCAGAGTTATCTAACAATATTCGTGAAGGATATGTCAAAATTGGATACGCTTATGTTTATGATACATCATTTATATTAATATATAGTATGCTCACATTTTCTTATTGCTGAATAGTTCTTTTTTACGTTTAGCTGAGTTTAACGGTGATTATTAGGTGGATTTTATATTAGTCTACATAAAAATAAGTGGTGGATATCTACATAAAATTGTCATAACGCGTAAACTAAAAATTATTTTTATGATCATTGAGGATCTATAATCAACTATAGACATTAATGTATGGATAATCATGAGGATTATAGGTAAATGGCAAGGGTAAAAATCAGTGAGGCCATTTCCGTGTGTAGTGATCCGAACTCAGCTACTATTGATGGAAATGAGGACTGGGTCATGGGGCGCAATGGAGTGAAGTAATATATACTTTAGCATACGTGTGCGTACGCCATATCAATATACTAGTGAGGTGGTGTGGGTGTGGTGTGTGGGTGTGGTGTGTGGGTGTGGTGTGGGTGTGTGGGTGTGGTGTGTGTGTGTGGGTGTGGGTGTGGGTGTGGGTGTGGGTGTGGTGTGGTGTGGGTGTGGTGTGTGTGTGGGTGTGGTGTGTGGGTGTGGGTGTGGTGTGGTGTGTGTGGGTGTGGGTGTGTGGGTGTGGTGTGGGTGTGGTGTGTGGTGTGTGTGTGGGTGTGTGGGTGTGGTGTGTGTGGGTGTGGTGTGGTGTGTGGGTGTGGGTGTCCCCCCCCCCCCCCCCCCCCCCAGTGAGAGCGCGATA 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~0~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~0~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ RG:Z:af5f3ee9   np:i:68 rq:f:0.999952   sn:B:f,12.0527,18.1189,3.75322,7.18479  zm:i:102    AS:i:-4796  NM:i:4
m64128_201203_104401/90/ccs 0   TEL06R_SubtelomericRegion   1   254 40S298=1X677=281S   *   0   0   ATCGCATCGCAGAGACGTATCATTAAAGACACCGCCAAGCTTCCAATATCACGAGTAAGGATCAAAGTTATGTTAGAGATAACTGTGAGTTTTTTATTTTTTGATCGATTTCCAAGATCATTCCTCAATCATAATCTATATGATTCAATATGTCCTTTCTTTGCGTGGCAATATACCTCATATTATCTTTCTATTTACAGGCAGTCCTTTCTATTTCATTTCTTACAAAAGGATTTTAGCAACGACTTCGTCTCAGAAGAGTTAATATATGCACTAGTTGCACTAGGCGCAAAAAATTCCTTTGACAATAGCCTTTCAAAGCATACATATGAATATTAAAACCACTCAAAGAGAAATTTACTGGAAGATTCGACAAATAAAAATTCAGCTTTTTCAAGTGCAAGCGTAACAAAGCCATAATGCCTCCTATATTTAGCCTTTTTTGATATAACTGTCGGAGAGTTAACAAGCGGCTGGACTACTTTCTGGAATAGCGTTCGGAATGTGTTTTACTTAAGGATTCGAACGTGATCCTAACGAGTGGATGCACAGTTCAGAGTTATCTAACAATATTCGTGAAGGATATGTCAAAATTGGATACGCTTATGTTTATGATATATCATTTATATTAATATATAGTATGCTCACATTTTCTTATTGCTGAATAGTTCTTTTTTACGTTTAGCTGAGTTTAACGGTGATTATTAGGTGGATTTTATATTAGTCTACATAAAAATAAGTGGTGGATATCTACATAAAATTGTCATAACGCGTAAACTAAAAATTATTTTTATGATCATTGAGGATCTATAATCAACTATAGACATTAATGTATGGATAATCATGAGGATTATAGGTAAATGGCAAGGGTAAAAACCAGTGAGGCCATTTCCGTGTGTAGTGATCCGAACTCAGTTACTATTGATGGAAATGAGGACTGGGTCATGGGGCGCAATGGAGTGAAGTAATATATACTTTAGCATACGTGTGCGTACGCCATATCAATATGCTAGTGAGGTGGTGTGGGTGTGGTGTGTGGGTGTGGTGTGTGGGTGTGGTGTGGGTGTGTGGGTGTGGTGTGTGTGTGTGGGTGTGGGTGTGGGTGTGGGTGTGGGTGTGGTGTGGTGTGGGTGTGGTGTGTGTGTGGGTGTGGTGTGTGGTGTGGGTGTGGTGTGGTGTGTGGGTGTGGGTGTGTGGGTGTGGTGTGGGTGTGGTGTGTGGGTGTGGTGTGTGTGTGGGTGTGGTGTGTGTGGGTGTGTGCCCCCCCCCCCCCCCCCCCCCAGTGAGAGCGCGATAA   
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~z~~~~~~~i~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~u~~~~~~~~~~\~~~~b~Fg~~~4\~~~~}~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'~~~~~~~~~~*~l~~~@c~~~~~~~~~~~~~~~~~~~~~~=~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~b   RG:Z:af5f3ee9   np:i:45 rq:f:0.999697   sn:B:f,13.1277,19.9534,4.00496,8.07566  zm:i:90 AS:i:-4869  NM:i:1

wdecoster commented 3 years ago

Hmm, there seems to be something about these BAMs that makes pysam unhappy. The last error suggests there was no index and that creating one failed. I'll try to look into it, but can't promise that will be done this week. Out of curiosity, I thought PacBio had moved on to minimap2/pbmm2 by default?
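
One possible explanation, not confirmed in this thread: the htslib message "Unsorted positions on sequence #1: 471 followed by 1" and the SO:UNKNOWN header field suggest the blasr output is not coordinate-sorted, and samtools index only works on coordinate-sorted BAMs. Under that assumption, a minimal sketch of sorting before indexing with pysam could look like this. The helper names (needs_coordinate_sort, sort_and_index) are hypothetical; pysam.sort and pysam.index are pysam's wrappers around the samtools commands.

```python
def needs_coordinate_sort(header: dict) -> bool:
    """Return True unless the @HD line declares SO:coordinate."""
    sort_order = header.get("HD", {}).get("SO", "unknown")
    return sort_order.lower() != "coordinate"


def sort_and_index(in_bam: str, out_bam: str) -> str:
    """Write a coordinate-sorted copy of in_bam to out_bam and index it."""
    import pysam  # imported lazily so the pure helper works without pysam

    # Equivalent to: samtools sort -o out_bam in_bam
    pysam.sort("-o", out_bam, in_bam)
    # Equivalent to: samtools index out_bam (writes out_bam + ".bai")
    pysam.index(out_bam)
    return out_bam
```

The sorted and indexed copy could then be passed to NanoPlot with --bam in place of the original file. Whether that resolves the crash reported here is untested.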

mavino commented 3 years ago

I am new to PacBio; I was just exploring SMRT Tools and came across blasr first...

wdecoster commented 3 years ago

Aha, that makes sense. Maybe blasr is still in use, not sure. I am going to reopen this issue as a reminder to look at blasr-bam files...