streetslab / dimelo

python package for analysis of dimelo-seq & nanopore modified base data
MIT License
3 stars 5 forks source link

(parse_bam) UnboundLocalError: local variable 'windows' referenced before assignment #11

Closed thekugelmeister closed 2 years ago

thekugelmeister commented 2 years ago

When testing parse_bam as follows:

from pathlib import Path

import dimelo as dm

working_dir = Path('.')
mod_basecall_file = working_dir / 'prod_free_Hia5_mod_mappings.sorted.bam'
sample_name = 'free_Hia5'

dm.parse_bam(fileName=str(mod_basecall_file),
             sampleName=sample_name,
             outDir=str(working_dir / sample_name),
             basemod="A+CG",
             center=False,
             # windowSize=500,
             extractAllBases=True,
)

I ran into the following error:

Traceback (most recent call last):
  File "dimelo_package_test.py", line 18, in <module>
    extractAllBases=True,
  File "/home/thekugelmeister/devspace/streets_rotation/dimelo/dimelo/parse_bam.py", line 247, in parse_bam
    for window in windows
UnboundLocalError: local variable 'windows' referenced before assignment

Uncommenting the windowSize argument doesn't fix things. As far as I can tell from the documentation, I have not specified an invalid set of arguments.

If this is an invalid set of arguments

This behavior should be documented, and the method should raise a more informative error that describes why things aren't working.

If this is a valid set of arguments

This behavior should be diagnosed and fixed.

thekugelmeister commented 2 years ago

After perusing the calls to parse_bam throughout the package, I think it is clear that one of bedFile or region currently needs to be specified in order for the call to go through.

Added documentation

If this is intended functionality, it needs to be documented. Why do I need to specify a region? What does that region represent? What is the difference between these two arguments? They should also be more clearly associated with each other in the documentation (maybe changing the method signature to have them next to each other would be nice...).

Mutually exclusive options?

Should these be mutually exclusive? What happens / should happen when I call the method with both arguments?

Default region?

If no region is specified, is it reasonable to assume the user wants the full sequence? What are the pros and cons of this choice?

amaslan commented 2 years ago

Good point. As is, bedFile and region are mutually exclusive options. For plotting, we extract base mod calls just in regions that will be plotted. An option to extract base mods for the entire bam file is a good idea. I think it may be pretty slow and need to consider how best to parallelize because current parallelization is across windows. Let's discuss.

amaslan commented 2 years ago

Addressed on cleanup branch. Added print statement that bedFile and region parameters are mutually exclusive. Also print statement and return if neither is specified. Also made this clearer in documentation. Will close when merged.