phac-nml / biohansel

Rapidly subtype microbial genomes using single-nucleotide variant (SNV) subtyping schemes
Apache License 2.0

Error in QC join function #25

Closed (mgopez closed this issue 6 years ago)

mgopez commented 6 years ago

In the current version of bio_hansel (1.1.0, per the traceback below), running a paired-end reads analysis on Galaxy fails with the following error:

Fatal error: Exit code 1 ()
2018-01-10 10:53:56,700 DEBUG: Namespace(files=[], force=False, input_directory=None, input_fasta_genome_name=None, keep_tmp=False, low_cov_depth_freq=20, max_intermediate_tiles=0.05, max_kmer_freq=1000, max_missing_tiles=0.05, min_ambiguous_tiles=3, min_kmer_freq=8, output_simple_summary='tech_results.tab', output_summary='results.tab', output_tile_results='match_results.tab', paired_reads=[['CE-R-09-0025_EC20081043_S12_L001_001_1', 'CE-R-09-0025_EC20081043_S12_L001_001_2']], scheme='heidelberg', scheme_name=None, slow=False, threads=1, tmp_dir='/tmp', verbose=3) [in /Warehouse/galaxy/deps/_conda/envs/mulled-v1-ad86f404540f17af24d34154fed6fcb9f38d2ffb41a623e050ebe4e15ee2ad90/lib/python3.6/site-packages/bio_hansel/main.py:209]
2018-01-10 10:53:56,712 INFO: Serial single threaded run mode on 1 input genomes [in /Warehouse/galaxy/deps/_conda/envs/mulled-v1-ad86f404540f17af24d34154fed6fcb9f38d2ffb41a623e050ebe4e15ee2ad90/lib/python3.6/site-packages/bio_hansel/subtyper.py:493]
2018-01-10 10:53:56,713 INFO: genome_name CE-R-09-0025_EC20081043_S12_L001_001 [in /Warehouse/galaxy/deps/_conda/envs/mulled-v1-ad86f404540f17af24d34154fed6fcb9f38d2ffb41a623e050ebe4e15ee2ad90/lib/python3.6/site-packages/bio_hansel/subtyper.py:407]
2018-01-10 10:54:32,736 DEBUG: max substype str len: 7 [in /Warehouse/galaxy/deps/_conda/envs/mulled-v1-ad86f404540f17af24d34154fed6fcb9f38d2ffb41a623e050ebe4e15ee2ad90/lib/python3.6/site-packages/bio_hansel/subtyper.py:446]
2018-01-10 10:54:32,740 DEBUG: pos_subtypes: [[2], [2, 2], [2, 2, 2], [2, 2, 2, 2]] [in /Warehouse/galaxy/deps/_conda/envs/mulled-v1-ad86f404540f17af24d34154fed6fcb9f38d2ffb41a623e050ebe4e15ee2ad90/lib/python3.6/site-packages/bio_hansel/subtyper.py:450]
2018-01-10 10:54:32,741 DEBUG: inconsistent_subtypes: [] [in /Warehouse/galaxy/deps/_conda/envs/mulled-v1-ad86f404540f17af24d34154fed6fcb9f38d2ffb41a623e050ebe4e15ee2ad90/lib/python3.6/site-packages/bio_hansel/subtyper.py:452]
Traceback (most recent call last):
  File "/Warehouse/galaxy/deps/_conda/envs/mulled-v1-ad86f404540f17af24d34154fed6fcb9f38d2ffb41a623e050ebe4e15ee2ad90/bin/hansel", line 11, in <module>
    load_entry_point('bio-hansel==1.1.0', 'console_scripts', 'hansel')()
  File "/Warehouse/galaxy/deps/_conda/envs/mulled-v1-ad86f404540f17af24d34154fed6fcb9f38d2ffb41a623e050ebe4e15ee2ad90/lib/python3.6/site-packages/bio_hansel/main.py", line 259, in main
    n_threads=n_threads)
  File "/Warehouse/galaxy/deps/_conda/envs/mulled-v1-ad86f404540f17af24d34154fed6fcb9f38d2ffb41a623e050ebe4e15ee2ad90/lib/python3.6/site-packages/bio_hansel/subtyper.py", line 500, in query_reads_ac
    for fastq_files, genome_name in reads]
  File "/Warehouse/galaxy/deps/_conda/envs/mulled-v1-ad86f404540f17af24d34154fed6fcb9f38d2ffb41a623e050ebe4e15ee2ad90/lib/python3.6/site-packages/bio_hansel/subtyper.py", line 500, in <listcomp>
    for fastq_files, genome_name in reads]
  File "/Warehouse/galaxy/deps/_conda/envs/mulled-v1-ad86f404540f17af24d34154fed6fcb9f38d2ffb41a623e050ebe4e15ee2ad90/lib/python3.6/site-packages/bio_hansel/subtyper.py", line 474, in subtype_reads_ac
    st.qc_status, st.qc_message = perform_quality_check(st, df, subtyping_params)
  File "/Warehouse/galaxy/deps/_conda/envs/mulled-v1-ad86f404540f17af24d34154fed6fcb9f38d2ffb41a623e050ebe4e15ee2ad90/lib/python3.6/site-packages/bio_hansel/qc/__init__.py", line 46, in perform_quality_check
    status, message = func(st, df, subtyping_params)
  File "/Warehouse/galaxy/deps/_conda/envs/mulled-v1-ad86f404540f17af24d34154fed6fcb9f38d2ffb41a623e050ebe4e15ee2ad90/lib/python3.6/site-packages/bio_hansel/qc/checks.py", line 92, in is_mixed_subtype
    '; '.join(conflicting_tiles['refposition'].tolist()),
TypeError: sequence item 0: expected str instance, int found

The cause is in qc/checks.py at line 92: '; '.join(conflicting_tiles['refposition'].tolist()) tries to join the refposition of a single conflicting tile, but that value is an integer, and str.join only accepts strings.

We should convert the values to strings before joining: '; '.join(conflicting_tiles['refposition'].astype(str).tolist())
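
For anyone who wants to reproduce this outside of Galaxy, here is a minimal sketch in plain Python. It does not use bio_hansel or pandas: the list `refpositions` is a hypothetical stand-in for what `conflicting_tiles['refposition'].tolist()` returns, the position value is made up, and `join_positions` is an illustrative helper that is not part of bio_hansel. Converting each item with `str()` has the same effect here as calling `.astype(str)` on the column before `.tolist()`.

```python
# Hypothetical stand-in for conflicting_tiles['refposition'].tolist();
# the position value is made up for illustration.
refpositions = [3305400]


def join_positions(positions):
    """Join positions with '; ', converting each item to str first."""
    return '; '.join(str(p) for p in positions)


# Reproduce the failure: str.join rejects non-str items.
try:
    '; '.join(refpositions)
except TypeError as err:
    error_message = str(err)

# Apply the fix: every item is a str by the time it reaches join.
joined = join_positions(refpositions)

print(error_message)
print(joined)
```

The first call fails in the same way as the traceback above, while the second succeeds because the items are strings when they are joined.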

peterk87 commented 6 years ago

Thanks for catching this, @hellothisisMatt! I'll cut a new release with this bug fixed. I've also added a test to ensure it doesn't occur again.