streetslab / dimelo

python package for analysis of dimelo-seq & nanopore modified base data
MIT License
3 stars 5 forks source link

Check for out-of-bounds bases specified in MM/ML tags #32

Open thekugelmeister opened 1 year ago

thekugelmeister commented 1 year ago

For malformed BAM files, the number of bases specified in the MM/ML tags may be greater than the number of bases in a read. When this happens, pysam appears content to allow normal operation until attempting to access one of these out-of-bounds bases, then throws an index error, causing a traceback like the following:

"""

Traceback (most recent call last):

  File "/home/sbombin/miniconda3/envs/dimelo/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 431, in _process_worker

    r = call_item()

  File "/home/sbombin/miniconda3/envs/dimelo/lib/python3.7/site-packages/joblib/externals/loky/process_executor.py", line 285, in __call__

    return self.fn(*self.args, **self.kwargs)

  File "/home/sbombin/miniconda3/envs/dimelo/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 595, in __call__

    return self.func(*args, **kwargs)

  File "/home/sbombin/miniconda3/envs/dimelo/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__

    for func, args, kwargs in self.items]

  File "/home/sbombin/miniconda3/envs/dimelo/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>

    for func, args, kwargs in self.items]

  File "/home/sbombin/miniconda3/envs/dimelo/lib/python3.7/site-packages/dimelo/parse_bam.py", line 460, in parse_reads_window

    extractAllBases,

  File "/home/sbombin/miniconda3/envs/dimelo/lib/python3.7/site-packages/dimelo/parse_bam.py", line 554, in get_modified_reference_positions

    extractAllBases,

  File "/home/sbombin/miniconda3/envs/dimelo/lib/python3.7/site-packages/dimelo/parse_bam.py", line 644, in get_mod_reference_positions_by_mod

    modified_bases = base_index[index_adj]

IndexError: index 1586 is out of bounds for axis 0 with size 1586

"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "/home/sbombin/miniconda3/envs/dimelo/bin/dimelo-plot-enrichment", line 6, in <module>

    main()

  File "/home/sbombin/miniconda3/envs/dimelo/lib/python3.7/site-packages/dimelo/plot_enrichment.py", line 320, in main

    plot_enrichment(**vars(args))

  File "/home/sbombin/miniconda3/envs/dimelo/lib/python3.7/site-packages/dimelo/plot_enrichment.py", line 160, in plot_enrichment

    num_cores,

  File "/home/sbombin/miniconda3/envs/dimelo/lib/python3.7/site-packages/dimelo/plot_enrichment.py", line 210, in get_counts

    cores=num_cores,

  File "/home/sbombin/miniconda3/envs/dimelo/lib/python3.7/site-packages/dimelo/parse_bam.py", line 332, in parse_bam

    for window in windows

  File "/home/sbombin/miniconda3/envs/dimelo/lib/python3.7/site-packages/joblib/parallel.py", line 1054, in __call__

    self.retrieve()

  File "/home/sbombin/miniconda3/envs/dimelo/lib/python3.7/site-packages/joblib/parallel.py", line 933, in retrieve

    self._output.extend(job.get(timeout=self.timeout))

  File "/home/sbombin/miniconda3/envs/dimelo/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 542, in wrap_future_result

    return future.result(timeout=timeout)

  File "/home/sbombin/miniconda3/envs/dimelo/lib/python3.7/concurrent/futures/_base.py", line 435, in result

    return self.__get_result()

  File "/home/sbombin/miniconda3/envs/dimelo/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result

    raise self._exception

IndexError: index 1586 is out of bounds for axis 0 with size 1586

There should be a check added to prevent such an error from occurring / improve error readability.

thekugelmeister commented 1 year ago

In investigating this error, I managed to get pysam to throw the following warning:

[E::bam_parse_basemod] MM tag refers to bases beyond sequence length

Thus far, I have only been able to get this to occur when attempting to TAB auto-complete methods for a pysam.AlignedSegment object that has this sort of malformed MM/ML tags. I have been unable to trigger this warning in other ways, despite my best efforts.

Search results for this warning in the pysam github repo: https://github.com/pysam-developers/pysam/search?q=MM+tag+refers+to+bases+beyond+sequence+length

OberonDixon commented 6 months ago

Tracking here: https://streets-lab.atlassian.net/browse/DM-67