sanger-pathogens / seroba

k-mer based Pipeline to identify the Serotype from Illumina NGS reads
https://sanger-pathogens.github.io/seroba/
Other
19 stars 16 forks source link

TypeError: argument of type 'NoneType' is not iterable #73

Closed HarryHung closed 1 year ago

HarryHung commented 1 year ago

On a sample, SeroBA encounters a fatal Python error

cluster detected 1 threads available to it
cluster reported completion
cluster_3 detected 1 threads available to it
cluster_3 reported completion
cluster_4 detected 1 threads available to it
cluster_4 reported completion
cluster_6 detected 1 threads available to it
cluster_6 reported completion

0.013121071707115657
/seroba-1.0.2/build/kmc_tools simple /home/ubuntu/local-repo/gps-unified-pipeline/work/64/bd269cc67f479cb03ad3532be5311d/temp.kmcl69cg5cc/NP-0087-IDRL-AKU_S92_trimmed seroba/database/kmer_db/11F/11F intersect /home/ubuntu/local-repo/gps-unified-pipeline/work/64/bd269cc67f479cb03ad3532be5311d/temp.kmcl69cg5cc/inter
0.03264454465908326
/seroba-1.0.2/build/kmc_tools simple /home/ubuntu/local-repo/gps-unified-pipeline/work/64/bd269cc67f479cb03ad3532be5311d/temp.kmcl69cg5cc/NP-0087-IDRL-AKU_S92_trimmed seroba/database/kmer_db/06C/06C intersect /home/ubuntu/local-repo/gps-unified-pipeline/work/64/bd269cc67f479cb03ad3532be5311d/temp.kmcl69cg5cc/inter
0.034005116366132154
/seroba-1.0.2/build/kmc_tools simple /home/ubuntu/local-repo/gps-unified-pipeline/work/64/bd269cc67f479cb03ad3532be5311d/temp.kmcl69cg5cc/NP-0087-IDRL-AKU_S92_trimmed seroba/database/kmer_db/10A/10A intersect /home/ubuntu/local-repo/gps-unified-pipeline/work/64/bd269cc67f479cb03ad3532be5311d/temp.kmcl69cg5cc/inter
0.018366189193022266
15C
{'15A': 0, '15B': 0, '15C': 0, '15F': 16}
15A
{'genes': [], 'pseudo': [], 'allele': [], 'snps': []}
15B
{'genes': [], 'pseudo': [], 'allele': [], 'snps': []}
15C
{'genes': [], 'pseudo': [], 'allele': [], 'snps': []}
15C
15F
{'genes': [], 'pseudo': [], 'allele': [], 'snps': []}
{'15A': -1, '15B': 0, '15C': -2.5, '15F': 15}
{'15A': {'genes': [], 'pseudo': [], 'allele': [], 'snps': []}, '15B': {'genes': [], 'pseudo': [], 'allele': [], 'snps': []}, '15C': {'genes': [], 'pseudo': ['wciZ'], 'allele': [], 'snps': []}, '15F': {'genes': [], 'pseudo': [], 'allele': [], 'snps': []}}
['15A', '15C']
15A
15B/15C
15C
None
Traceback (most recent call last):
  File "/usr/local/bin/seroba", line 4, in <module>
    __import__('pkg_resources').run_script('seroba==1.0.2', 'seroba')
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 658, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 1445, in run_script
    exec(script_code, namespace, namespace)
  File "/usr/local/lib/python3.6/dist-packages/seroba-1.0.2-py3.6.egg/EGG-INFO/scripts/seroba", line 86, in <module>
  File "/usr/local/lib/python3.6/dist-packages/seroba-1.0.2-py3.6.egg/seroba/tasks/sero_run.py", line 19, in run
  File "/usr/local/lib/python3.6/dist-packages/seroba-1.0.2-py3.6.egg/seroba/serotyping.py", line 481, in run
  File "/usr/local/lib/python3.6/dist-packages/seroba-1.0.2-py3.6.egg/seroba/serotyping.py", line 453, in _prediction
  File "/usr/local/lib/python3.6/dist-packages/seroba-1.0.2-py3.6.egg/seroba/serotyping.py", line 397, in _find_serotype
TypeError: argument of type 'NoneType' is not iterable

The relevant part in the code is

https://github.com/sanger-pathogens/seroba/blob/8138dc8713e4dec1a6a4379b91e20e4dc958160c/seroba/serotyping.py#L392-L397

I think this piece of code has a logic flaw.

While iterating through min_keys in line 393:

Fixing this seems to be trivial, but I am not sure which one is the right approach:

HarryHung commented 1 year ago

Only when all in min_keys are not in mixed_serotype, then mixed_serotype should be set to None:

if mixed_serotype != None:
    for key in min_keys:
        print(key)
        print(mixed_serotype)
        if key in mixed_serotype:
            break
    else:
        mixed_serotype = None

When any in min_keys is not in mixed_serotype, mixed_serotype should be set to None

if mixed_serotype != None:
    for key in min_keys:
        print(key)
        print(mixed_serotype)
        if key not in mixed_serotype:
           mixed_serotype = None
           break
flass commented 1 year ago

hi Harry, thanks for pointing this out, it is indeed a flaw in the logic and for proposing fixes. in my view, mixed_serotype should be just an empty string when not populated, but I think we should keep it as a None to avoid introducing new bugs. I think of your two proposed fixes, the first one makes more sense as in the second one, a serotype not being mentioned in the mixed_serotype string generated by the previous _detect_mixed_samples() call will lead to erase the rest of the information, which sounds wrong, even though I'm not too sure exactly what is the rationale behind this bit of code.

If you (and others in the team) agree, we can implement the fix.

Best wishes,

Florent

HarryHung commented 1 year ago

Agreed with your view that mixed_serotype should never be None to begin with, but also agreed that better keep it as-is at this point.

I also think the first fix makes more sense at a glance, but I got to be honest that I have not deep dived into the codes to have a complete understanding.

Thank you!

flass commented 1 year ago

n worries. We'll review this and introduce a fix. if you do't mind then, we'll ask you to test the fixed branch. is that OK?

HarryHung commented 1 year ago

Sure, thanks for your help!