Influenza B specimen causes `parse_influenza_blast_results.py` to crash

fanninpm commented 2 years ago

I know that this workflow is advertised to run on Influenza A specimens, but IRMA can also run on Influenza B specimens. However, when I try to run the workflow on an Influenza B control, `parse_influenza_blast_results.py` crashes with an `IndexError`.

.command.err

```text
WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
2021-12-03 18:50:38,273 INFO: Parsing Influenza metadata file "genomeset.dat.gz" [in parse_influenza_blast_results.py:354]
2021-12-03 18:50:40,004 INFO: Parsed Influenza metadata file into DataFrame with n=536691 rows and n=11 columns. There are 169 unique subtypes. [in parse_influenza_blast_results.py:376]
2021-12-03 18:50:40,005 INFO: Parsing BLAST results from BPC.blastn.txt [in parse_influenza_blast_results.py:183]
2021-12-03 18:50:40,463 INFO: Parsed 112085 BLAST results from BPC.blastn.txt [in parse_influenza_blast_results.py:197]
2021-12-03 18:50:40,463 INFO: BPC | n=112085 | Filtering for hits above 0.85% identity. [in parse_influenza_blast_results.py:198]
2021-12-03 18:50:40,477 INFO: BPC | n=111369 | Filtered for hits above 0.85% identity. [in parse_influenza_blast_results.py:204]
/sample/tmp-tf-ton01/workspace/nf-iav-illumina/bin/parse_influenza_blast_results.py:207: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_filtered["accession"] = df_filtered.saccver.str.extract(
/sample/tmp-tf-ton01/workspace/nf-iav-illumina/bin/parse_influenza_blast_results.py:210: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_filtered["sample"] = sample_name
/sample/tmp-tf-ton01/workspace/nf-iav-illumina/bin/parse_influenza_blast_results.py:211: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_filtered["sample"] = pd.Categorical(df_filtered["sample"])
/sample/tmp-tf-ton01/workspace/nf-iav-illumina/bin/parse_influenza_blast_results.py:212: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_filtered["sample_segment"] = df_filtered.qaccver.str.extract(r".+_(\d)$").astype(
/sample/tmp-tf-ton01/workspace/nf-iav-illumina/bin/parse_influenza_blast_results.py:215: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_filtered["sample_segment"] = pd.Categorical(df_filtered["sample_segment"])
/sample/tmp-tf-ton01/workspace/nf-iav-illumina/bin/parse_influenza_blast_results.py:217: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df_filtered["subtype_from_match_title"] = (
/sample/tmp-tf-ton01/workspace/nf-iav-illumina/bin/parse_influenza_blast_results.py:220: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy df_filtered["subtype_from_match_title"] = df_filtered["subtype_from_match_title"] 2021-12-03 18:50:41,288 INFO: BPC | Merging NCBI Influenza DB genome metadata with BLAST results on accession. [in parse_influenza_blast_results.py:221] ╭───────────────────── Traceback (most recent call last) ──────────────────────╮ │ /sample/tmp-tf-ton01/workspace/nf-iav-illumina/bin/parse_influenza_blast_res │ │ ults.py:495 in │ │ │ │ 492 │ │ 493 │ │ 494 if __name__ == "__main__": │ │ ❱ 495 │ report() │ │ 496 │ │ │ │ ╭───────────────────────────────── locals ─────────────────────────────────╮ │ │ │ __annotations__ = {} │ │ │ │ __builtins__ = │ │ │ │ __cached__ = None │ │ │ │ __doc__ = '\nGenerate an Influenza H/N │ │ │ │ subtyping report from nucleotide │ │ │ │ BLAST results for on'+256 │ │ │ │ __file__ = '/sample/tmp-tf-ton01/workspace/nf-… │ │ │ │ __loader__ = <_frozen_importlib_external.SourceF… │ │ │ │ object at 0x7f37eae999d0> │ │ │ │ __name__ = '__main__' │ │ │ │ __package__ = None │ │ │ │ __spec__ = None │ │ │ │ __warningregistry__ = { │ │ │ │ │ 'version': 10, │ │ │ │ │ ('\nA value is trying to be set │ │ │ │ on a copy of a slice from a │ │ │ │ DataFrame.\nTry using │ │ │ │ .loc[row_indexer,col_indexer] = │ │ │ │ value instead\n\nSee the caveats in │ │ │ │ the documentation: │ │ │ │ https://pandas.pydata.org/pandas-do… │ │ │ │ ), │ │ │ │ │ ('pident', ), │ │ │ │ │ ('length', 'uint16'), │ │ │ │ │ ('mismatch', 'uint16'), │ │ │ │ │ ('gapopen', 'uint16'), │ │ │ │ │ ('qstart', 'uint16'), │ │ │ │ │ ('qend', 'uint16'), │ │ │ │ │ ('sstart', 'uint16'), │ │ │ │ │ ('send', 'uint16'), │ │ │ │ │ ... 
+6 │ │ │ │ ] │ │ │ │ blast_results_report_columns = [ │ │ │ │ │ ('sample', 'Sample'), │ │ │ │ │ ( │ │ │ │ │ │ 'sample_segment', │ │ │ │ │ │ 'Sample Genome Segment │ │ │ │ Number' │ │ │ │ │ ), │ │ │ │ │ ( │ │ │ │ │ │ 'accession', │ │ │ │ │ │ 'Reference NCBI Accession' │ │ │ │ │ ), │ │ │ │ │ ( │ │ │ │ │ │ 'subtype', │ │ │ │ │ │ 'Reference Subtype' │ │ │ │ │ ), │ │ │ │ │ ( │ │ │ │ │ │ 'pident', │ │ │ │ │ │ 'BLASTN Percent Identity' │ │ │ │ │ ), │ │ │ │ │ ( │ │ │ │ │ │ 'length', │ │ │ │ │ │ 'BLASTN Alignment Length' │ │ │ │ │ ), │ │ │ │ │ ( │ │ │ │ │ │ 'mismatch', │ │ │ │ │ │ 'BLASTN Mismatches' │ │ │ │ │ ), │ │ │ │ │ ('gapopen', 'BLASTN Gaps'), │ │ │ │ │ ( │ │ │ │ │ │ 'qstart', │ │ │ │ │ │ 'BLASTN Sample Start Index' │ │ │ │ │ ), │ │ │ │ │ ( │ │ │ │ │ │ 'qend', │ │ │ │ │ │ 'BLASTN Sample End Index' │ │ │ │ │ ), │ │ │ │ │ ... +16 │ │ │ │ ] │ │ │ │ click = │ │ │ │ defaultdict = │ │ │ │ Dict = typing.Dict │ │ │ │ find_h_or_n_type = │ │ │ │ get_col_widths = │ │ │ │ get_subtype_value = │ │ │ │ List = typing.List │ │ │ │ LOG_FORMAT = '%(asctime)s %(levelname)s: │ │ │ │ %(message)s [in │ │ │ │ %(filename)s:%(lineno)d]' │ │ │ │ logging = │ │ │ │ pd = > │ │ │ │ re = │ │ │ │ REGEX_UNALLOWED_EXCEL_WS_CHARS = re.compile('[\\\\:/?*\\[\\]]+') │ │ │ │ report = │ │ │ │ RichHandler = │ │ │ │ subtype_results_summary_columns = [ │ │ │ │ │ 'sample', │ │ │ │ │ 'subtype', │ │ │ │ │ 'H_top_accession', │ │ │ │ │ 'H_type', │ │ │ │ │ 'H_virus_name', │ │ │ │ │ │ │ │ │ 'H_NCBI_Influenza_DB_proportion_mat… │ │ │ │ │ 'N_top_accession', │ │ │ │ │ 'N_type', │ │ │ │ │ 'N_virus_name', │ │ │ │ │ │ │ │ │ 'N_NCBI_Influenza_DB_proportion_mat… │ │ │ │ ] │ │ │ │ subtype_results_summary_final_names { │ │ │ │ = │ 'sample': 'Sample', │ │ │ │ │ 'subtype': 'Subtype Prediction', │ │ │ │ │ 'N_type': 'N: type prediction', │ │ │ │ │ 'N_top_accession': 'N: top match │ │ │ │ accession', │ │ │ │ │ 'N_virus_name': 'N: top match │ │ │ │ virus name', │ │ │ │ │ 'N_top_host': 'N: top match │ │ │ │ host', │ │ │ │ │ 
'N_top_date': 'N: top match │ │ │ │ collection date', │ │ │ │ │ 'N_top_country': 'N: top match │ │ │ │ country', │ │ │ │ │ 'N_top_pident': 'N: top match │ │ │ │ BLASTN % identity', │ │ │ │ │ 'N_top_align_length': 'N: top │ │ │ │ match BLASTN alignment length', │ │ │ │ │ ... +24 │ │ │ │ } │ │ │ │ Tuple = typing.Tuple │ │ │ │ write_excel = │ │ │ ╰──────────────────────────────────────────────────────────────────────────╯ │ │ │ │ /usr/local/lib/python3.9/site-packages/click/core.py:829 in __call__ │ │ │ │ 826 │ │ │ 827 │ def __call__(self, *args, **kwargs): │ │ 828 │ │ """Alias for :meth:`main`.""" │ │ ❱ 829 │ │ return self.main(*args, **kwargs) │ │ 830 │ │ 831 │ │ 832 class Command(BaseCommand): │ │ │ │ ╭───────── locals ──────────╮ │ │ │ args = () │ │ │ │ kwargs = {} │ │ │ │ self = │ │ │ ╰───────────────────────────╯ │ │ │ │ /usr/local/lib/python3.9/site-packages/click/core.py:782 in main │ │ │ │ 779 │ │ try: │ │ 780 │ │ │ try: │ │ 781 │ │ │ │ with self.make_context(prog_name, args, **extra) as c │ │ ❱ 782 │ │ │ │ │ rv = self.invoke(ctx) │ │ 783 │ │ │ │ │ if not standalone_mode: │ │ 784 │ │ │ │ │ │ return rv │ │ 785 │ │ │ │ │ # it's not safe to `ctx.exit(rv)` here! 
│ │ │ │ ╭──────────────────────────── locals ─────────────────────────────╮ │ │ │ args = [] │ │ │ │ complete_var = None │ │ │ │ ctx = │ │ │ │ extra = {} │ │ │ │ prog_name = 'parse_influenza_blast_results.py' │ │ │ │ self = │ │ │ │ standalone_mode = True │ │ │ ╰─────────────────────────────────────────────────────────────────╯ │ │ │ │ /usr/local/lib/python3.9/site-packages/click/core.py:1066 in invoke │ │ │ │ 1063 │ │ """ │ │ 1064 │ │ _maybe_show_deprecated_notice(self) │ │ 1065 │ │ if self.callback is not None: │ │ ❱ 1066 │ │ │ return ctx.invoke(self.callback, **ctx.params) │ │ 1067 │ │ 1068 │ │ 1069 class MultiCommand(Command): │ │ │ │ ╭─────────────────────── locals ───────────────────────╮ │ │ │ ctx = │ │ │ │ self = │ │ │ ╰──────────────────────────────────────────────────────╯ │ │ │ │ /usr/local/lib/python3.9/site-packages/click/core.py:610 in invoke │ │ │ │ 607 │ │ args = args[2:] │ │ 608 │ │ with augment_usage_errors(self): │ │ 609 │ │ │ with ctx: │ │ ❱ 610 │ │ │ │ return callback(*args, **kwargs) │ │ 611 │ │ │ 612 │ def forward(*args, **kwargs): # noqa: B902 │ │ 613 │ │ """Similar to :meth:`invoke` but fills in default keyword │ │ │ │ ╭────────────────────────── locals ───────────────────────────╮ │ │ │ args = () │ │ │ │ callback = │ │ │ │ ctx = │ │ │ │ kwargs = { │ │ │ │ │ 'threads': 1, │ │ │ │ │ 'flu_metadata': 'genomeset.dat.gz', │ │ │ │ │ 'excel_report': 'iav-subtyping-report.xlsx', │ │ │ │ │ 'pident_threshold': 0.85, │ │ │ │ │ 'blast_results': ( │ │ │ │ │ │ 'BPC.blastn.txt', │ │ │ │ │ │ 'APC1.blastn.txt', │ │ │ │ │ │ 'APC2.blastn.txt', │ │ │ │ │ │ 'L1852.blastn.txt' │ │ │ │ │ ), │ │ │ │ │ 'top': 3, │ │ │ │ │ 'min_aln_length': 50 │ │ │ │ } │ │ │ │ self = │ │ │ ╰─────────────────────────────────────────────────────────────╯ │ │ │ │ /sample/tmp-tf-ton01/workspace/nf-iav-illumina/bin/parse_influenza_blast_res │ │ ults.py:400 in report │ │ │ │ 397 │ │ │ f'Got {len(results)} async parsing results. Merging into r │ │ "{excel_report}".' 
│ │ 398 │ │ ) │ │ 399 │ else: │ │ ❱ 400 │ │ results = [parse_blast_result(blast_result, df_md, regex_subty │ │ top=top, pident_threshold=pident_threshold, min_aln_length=min_aln_len │ │ blast_result in blast_results] │ │ 401 │ dfs_blast = [] │ │ 402 │ all_subtype_results = {} │ │ 403 │ for parsed_result in results: │ │ │ │ ╭───────────────────────────────── locals ─────────────────────────────────╮ │ │ │ blast_results = ( │ │ │ │ │ 'BPC.blastn.txt', │ │ │ │ │ 'APC1.blastn.txt', │ │ │ │ │ 'APC2.blastn.txt', │ │ │ │ │ 'L1852.blastn.txt' │ │ │ │ ) │ │ │ │ df_md = │ accession host segment ... age gender │ │ │ │ group_id │ │ │ │ 0 M14880 Human 1 ... NaN NaN │ │ │ │ 14656 │ │ │ │ 1 AF101982 Human 2 ... NaN NaN │ │ │ │ 14656 │ │ │ │ 2 AF102017 Human 3 ... NaN NaN │ │ │ │ 14656 │ │ │ │ 3 K00423 Human 4 ... NaN NaN │ │ │ │ 14656 │ │ │ │ 4 K01395 Human 5 ... NaN NaN │ │ │ │ 14656 │ │ │ │ ... ... ... ... ... ... ... │ │ │ │ ... │ │ │ │ 536686 MT421224 Avian 4 ... NaN NaN │ │ │ │ 1068967 │ │ │ │ 536687 MT421225 Avian 5 ... NaN NaN │ │ │ │ 1068967 │ │ │ │ 536688 MT421226 Avian 6 ... NaN NaN │ │ │ │ 1068967 │ │ │ │ 536689 MT421227 Avian 7 ... NaN NaN │ │ │ │ 1068967 │ │ │ │ 536690 MT421228 Avian 8 ... NaN NaN │ │ │ │ 1068967 │ │ │ │ │ │ │ │ [536691 rows x 11 columns] │ │ │ │ excel_report = 'iav-subtyping-report.xlsx' │ │ │ │ flu_metadata = 'genomeset.dat.gz' │ │ │ │ install = │ │ │ │ md_cols = [ │ │ │ │ │ ('accession', ), │ │ │ │ │ ('host', 'category'), │ │ │ │ │ ('segment', 'category'), │ │ │ │ │ ('subtype', 'str'), │ │ │ │ │ ('country', 'category'), │ │ │ │ │ ('date', 'category'), │ │ │ │ │ ('seq_length', 'uint16'), │ │ │ │ │ ('virus_name', 'category'), │ │ │ │ │ ('age', 'category'), │ │ │ │ │ ('gender', 'category'), │ │ │ │ │ ... 
+1 │ │ │ │ ] │ │ │ │ min_aln_length = 50 │ │ │ │ pident_threshold = 0.85 │ │ │ │ regex_subtype_pattern = '\\((H\\d+N\\d+|H9N2|H1N1|H5N1|H2N2|H3N2|H5N6|H… │ │ │ │ threads = 1 │ │ │ │ top = 3 │ │ │ │ unique_subtypes = array(['H9N2', 'H1N1', 'H5N1', 'H2N2', 'H3N2', │ │ │ │ 'H5N6', 'H7N3', 'H4N6', │ │ │ │ │ 'H12N5', 'H1N2', 'H2N3', 'H2N9', 'H2N5', │ │ │ │ 'H2N7', 'H13N2', 'H2N8', │ │ │ │ │ 'H2N1', 'H2N4', 'H6N2', 'H6N5', 'H6N8', │ │ │ │ 'H3N8', 'H3N6', 'H6N9', │ │ │ │ │ 'H6N6', 'H6N4', 'H6N3', 'H6N1', 'H5N2', │ │ │ │ 'H9N1', 'H10N7', 'H11N9', │ │ │ │ │ 'H5N3', 'H5N7', 'H9N5', 'H13N9', 'H1N6', │ │ │ │ 'H16N3', 'H9N6', 'H3N3', │ │ │ │ │ 'H3N1', 'H3N5', 'H4N3', 'H4N8', 'H4N2', │ │ │ │ 'H4N1', 'H8N4', 'H7N5', │ │ │ │ │ 'H7N7', 'H7N1', 'H10N3', 'H10N6', │ │ │ │ 'H10N1', 'H11N3', 'H11N2', │ │ │ │ │ 'H12N1', 'H12N4', 'mixed', 'mixed,H2', │ │ │ │ 'mixed,N7', 'H15N9', │ │ │ │ │ 'H10N9', 'H12N9', 'H10N8', 'H13N6', │ │ │ │ 'H11N6', 'H7N9', 'H4N9', │ │ │ │ │ 'H1N5', 'H5N8', 'H1N9', 'H11N1', 'H11N8', │ │ │ │ 'H7N2', 'mixed,H3', │ │ │ │ │ 'Mixed,N1', 'Mixed,H6', 'Mixed,N2', │ │ │ │ 'Mixed,H4', 'Mixed,N6', 'H7N8', │ │ │ │ │ 'H7N4', 'mixed,N1', 'mixed,N2', │ │ │ │ 'mixed,H1', 'mixed,H4', 'mixed,N6', │ │ │ │ │ 'mixed,H8', 'mixed,N4', 'mixed,H6', │ │ │ │ 'mixed,N8', 'mixed,H10', │ │ │ │ │ 'mixed,H12', 'mixed,N5', 'mixed,H5', │ │ │ │ 'mixed,N3', 'mixed,H11', │ │ │ │ │ 'mixed,H7', 'mixed, H3', 'mixed, H2', │ │ │ │ 'mixed, N3', 'mixed, H1', │ │ │ │ │ 'mixed, N1', 'mixed, N2', 'mixed,N9', │ │ │ │ 'mixed,H9', 'mixed,H16', │ │ │ │ │ 'H9N7', 'H1', 'H13N8', 'H7N6', │ │ │ │ 'mixed.H3', 'H3N7', 'H3N4', 'H1N2v', │ │ │ │ │ 'H3N2v', 'H14N3', 'H1N3', 'H14N5', │ │ │ │ 'mixed.N3', 'H12N3', 'H12N7', │ │ │ │ │ 'H3N9', 'H1N8', 'H5N9', 'H4N5', 'H4N7', │ │ │ │ 'H4N4', 'H5N5', 'Mixed,N8', │ │ │ │ │ 'H10N4', 'H10N2', 'H1N4', 'H12N8', │ │ │ │ 'H12N6', 'H10N5', 'Mixed,H3', │ │ │ │ │ 'H5N4', 'H8N8', 'H15N4', 'H9N9', │ │ │ │ 'H17N10', 'H2N6', 'mixed,H13', │ │ │ │ │ 'H11N7', 'H14N6', 'H14N8', 'H8N2', │ │ │ │ 
'H1N7', 'H6N7', 'H6N1,H6', │ │ │ │ │ 'H9N8', 'H11N5', 'H14N2', 'H18N11', │ │ │ │ 'H12N2', 'H8N1', 'mixed,H14', │ │ │ │ │ 'H9N3', 'H14N7', 'H13N3', 'H14N4', │ │ │ │ 'H3N6,H3', 'H15N5', 'H9N4', │ │ │ │ │ 'H01N2', 'h5n1', 'H11N9/N2', 'mixed.N2'], │ │ │ │ dtype=object) │ │ │ ╰──────────────────────────────────────────────────────────────────────────╯ │ │ │ │ /sample/tmp-tf-ton01/workspace/nf-iav-illumina/bin/parse_influenza_blast_res │ │ ults.py:400 in │ │ │ │ 397 │ │ │ f'Got {len(results)} async parsing results. Merging into r │ │ "{excel_report}".' │ │ 398 │ │ ) │ │ 399 │ else: │ │ ❱ 400 │ │ results = [parse_blast_result(blast_result, df_md, regex_subty │ │ top=top, pident_threshold=pident_threshold, min_aln_length=min_aln_len │ │ blast_result in blast_results] │ │ 401 │ dfs_blast = [] │ │ 402 │ all_subtype_results = {} │ │ 403 │ for parsed_result in results: │ │ │ │ ╭───────────────────────────────── locals ─────────────────────────────────╮ │ │ │ .0 = │ │ │ │ blast_result = 'BPC.blastn.txt' │ │ │ │ df_md = │ accession host segment ... age gender │ │ │ │ group_id │ │ │ │ 0 M14880 Human 1 ... NaN NaN │ │ │ │ 14656 │ │ │ │ 1 AF101982 Human 2 ... NaN NaN │ │ │ │ 14656 │ │ │ │ 2 AF102017 Human 3 ... NaN NaN │ │ │ │ 14656 │ │ │ │ 3 K00423 Human 4 ... NaN NaN │ │ │ │ 14656 │ │ │ │ 4 K01395 Human 5 ... NaN NaN │ │ │ │ 14656 │ │ │ │ ... ... ... ... ... ... ... │ │ │ │ ... │ │ │ │ 536686 MT421224 Avian 4 ... NaN NaN │ │ │ │ 1068967 │ │ │ │ 536687 MT421225 Avian 5 ... NaN NaN │ │ │ │ 1068967 │ │ │ │ 536688 MT421226 Avian 6 ... NaN NaN │ │ │ │ 1068967 │ │ │ │ 536689 MT421227 Avian 7 ... NaN NaN │ │ │ │ 1068967 │ │ │ │ 536690 MT421228 Avian 8 ... 
NaN NaN │ │ │ │ 1068967 │ │ │ │ │ │ │ │ [536691 rows x 11 columns] │ │ │ │ min_aln_length = 50 │ │ │ │ pident_threshold = 0.85 │ │ │ │ regex_subtype_pattern = '\\((H\\d+N\\d+|H9N2|H1N1|H5N1|H2N2|H3N2|H5N6|H… │ │ │ │ top = 3 │ │ │ ╰──────────────────────────────────────────────────────────────────────────╯ │ │ │ │ /sample/tmp-tf-ton01/workspace/nf-iav-illumina/bin/parse_influenza_blast_res │ │ ults.py:235 in parse_blast_result │ │ │ │ 232 │ H_results = None │ │ 233 │ N_results = None │ │ 234 │ if "4" in df_merge.index: │ │ ❱ 235 │ │ H_results = find_h_or_n_type(df_merge, "4") │ │ 236 │ │ subtype_results_summary.update(H_results) │ │ 237 │ if "6" in df_merge.index: │ │ 238 │ │ N_results = find_h_or_n_type(df_merge, "6") │ │ │ │ ╭───────────────────────────────── locals ─────────────────────────────────╮ │ │ │ blast_result = 'BPC.blastn.txt' │ │ │ │ df = │ qaccver ... │ │ │ │ stitle │ │ │ │ 0 BPC_1 ... │ │ │ │ gi|1000790953|gb|CY201759|Influenza B virus │ │ │ │ (B... │ │ │ │ 1 BPC_1 ... │ │ │ │ gi|1000787179|gb|CY200183|Influenza B virus │ │ │ │ (B... │ │ │ │ 2 BPC_1 ... │ │ │ │ gi|1036637226|gb|KX269976|Influenza B virus │ │ │ │ (B... │ │ │ │ 3 BPC_1 ... │ │ │ │ gi|1320959216|gb|CY264039|Influenza B virus │ │ │ │ (B... │ │ │ │ 4 BPC_1 ... │ │ │ │ gi|1000790089|gb|CY201399|Influenza B virus │ │ │ │ (B... │ │ │ │ ... ... ... │ │ │ │ ... │ │ │ │ 112080 BPC_8 ... │ │ │ │ gi|1669281089|gb|MK969248|Influenza B virus │ │ │ │ (B... │ │ │ │ 112081 BPC_8 ... │ │ │ │ gi|1241162262|gb|CY243283|Influenza B virus │ │ │ │ (B... │ │ │ │ 112082 BPC_8 ... │ │ │ │ gi|262357481|gb|GU135839|Influenza B virus │ │ │ │ (B/... │ │ │ │ 112083 BPC_8 ... │ │ │ │ gi|1662274797|gb|MK961391|Influenza B virus │ │ │ │ (B... │ │ │ │ 112084 BPC_8 ... │ │ │ │ gi|7141162|gb|AF217214|Influenza B virus │ │ │ │ isola... │ │ │ │ │ │ │ │ [112085 rows x 16 columns] │ │ │ │ df_merge = │ │ │ qaccver │ │ │ │ saccver ... gender group_id │ │ │ │ sample_segment │ │ │ │ ... 
│ │ │ │ 1 BPC_1 │ │ │ │ gi|1000790953|gb|CY201759|Influenza ... │ │ │ │ NaN 1033250 │ │ │ │ 1 BPC_1 │ │ │ │ gi|1000787179|gb|CY200183|Influenza ... │ │ │ │ NaN 1033053 │ │ │ │ 1 BPC_1 │ │ │ │ gi|1036637226|gb|KX269976|Influenza ... │ │ │ │ NaN 1041500 │ │ │ │ 1 BPC_1 │ │ │ │ gi|1320959216|gb|CY264039|Influenza ... │ │ │ │ NaN 1049508 │ │ │ │ 1 BPC_1 │ │ │ │ gi|1000790089|gb|CY201399|Influenza ... │ │ │ │ NaN 1033205 │ │ │ │ ... ... │ │ │ │ ... ... ... ... │ │ │ │ 8 BPC_8 │ │ │ │ gi|1669281089|gb|MK969248|Influenza ... │ │ │ │ NaN NaN │ │ │ │ 8 BPC_8 │ │ │ │ gi|1241162262|gb|CY243283|Influenza ... │ │ │ │ NaN NaN │ │ │ │ 8 BPC_8 │ │ │ │ gi|262357481|gb|GU135839|Influenza ... │ │ │ │ NaN NaN │ │ │ │ 8 BPC_8 │ │ │ │ gi|1662274797|gb|MK961391|Influenza ... │ │ │ │ NaN NaN │ │ │ │ 8 BPC_8 │ │ │ │ gi|7141162|gb|AF217214|Influenza ... NaN │ │ │ │ NaN │ │ │ │ │ │ │ │ [111369 rows x 29 columns] │ │ │ │ H_results = None │ │ │ │ min_aln_length = 50 │ │ │ │ N_results = None │ │ │ │ pident_threshold = 0.85 │ │ │ │ regex_subtype_pattern = '\\((H\\d+N\\d+|H9N2|H1N1|H5N1|H2N2|H3N2|H5N6… │ │ │ │ sample_name = 'BPC' │ │ │ │ segments = ['1', '2', '3', '4', '5', '6', '7', '8'] │ │ │ │ Categories (8, object): ['1', '2', '3', '4', │ │ │ │ '5', '6', '7', '8'] │ │ │ │ subtype_results_summary = {} │ │ │ │ top = 3 │ │ │ ╰──────────────────────────────────────────────────────────────────────────╯ │ │ │ │ /sample/tmp-tf-ton01/workspace/nf-iav-illumina/bin/parse_influenza_blast_res │ │ ults.py:300 in find_h_or_n_type │ │ │ │ 297 │ │ type_to_count[x[type_name]] += x["count"] │ │ 298 │ type_to_count = [(h, c) for h, c in type_to_count.items()] │ │ 299 │ type_to_count.sort(key=lambda x: x[1], reverse=True) │ │ ❱ 300 │ top_type, top_type_count = type_to_count[0] │ │ 301 │ total_count = type_counts.sum() │ │ 302 │ logging.info( │ │ 303 │ │ f"{h_or_n}{top_type} n={top_type_count}/{total_count} ({top_ty │ │ total_count:.1%})" │ │ │ │ ╭───────────────────────────────── locals 
─────────────────────────────────╮ │ │ │ df_merge = │ │ │ qaccver │ │ │ │ saccver ... gender group_id │ │ │ │ sample_segment │ │ │ │ ... │ │ │ │ 1 BPC_1 │ │ │ │ gi|1000790953|gb|CY201759|Influenza ... NaN │ │ │ │ 1033250 │ │ │ │ 1 BPC_1 │ │ │ │ gi|1000787179|gb|CY200183|Influenza ... NaN │ │ │ │ 1033053 │ │ │ │ 1 BPC_1 │ │ │ │ gi|1036637226|gb|KX269976|Influenza ... NaN │ │ │ │ 1041500 │ │ │ │ 1 BPC_1 │ │ │ │ gi|1320959216|gb|CY264039|Influenza ... NaN │ │ │ │ 1049508 │ │ │ │ 1 BPC_1 │ │ │ │ gi|1000790089|gb|CY201399|Influenza ... NaN │ │ │ │ 1033205 │ │ │ │ ... ... │ │ │ │ ... ... ... ... │ │ │ │ 8 BPC_8 │ │ │ │ gi|1669281089|gb|MK969248|Influenza ... NaN │ │ │ │ NaN │ │ │ │ 8 BPC_8 │ │ │ │ gi|1241162262|gb|CY243283|Influenza ... NaN │ │ │ │ NaN │ │ │ │ 8 BPC_8 │ │ │ │ gi|262357481|gb|GU135839|Influenza ... NaN │ │ │ │ NaN │ │ │ │ 8 BPC_8 │ │ │ │ gi|1662274797|gb|MK961391|Influenza ... NaN │ │ │ │ NaN │ │ │ │ 8 BPC_8 │ │ │ │ gi|7141162|gb|AF217214|Influenza ... NaN NaN │ │ │ │ │ │ │ │ [111369 rows x 29 columns] │ │ │ │ df_segment = │ │ │ qaccver │ │ │ │ saccver ... gender group_id │ │ │ │ sample_segment │ │ │ │ ... │ │ │ │ 4 BPC_4 │ │ │ │ gi|910286899|gb|CY191483|Influenza ... NaN │ │ │ │ 1029169 │ │ │ │ 4 BPC_4 │ │ │ │ gi|777862460|gb|KP864253|Influenza ... NaN │ │ │ │ NaN │ │ │ │ 4 BPC_4 │ │ │ │ gi|590123197|gb|KJ532255|Influenza ... NaN │ │ │ │ NaN │ │ │ │ 4 BPC_4 │ │ │ │ gi|590123155|gb|KJ532238|Influenza ... NaN │ │ │ │ NaN │ │ │ │ 4 BPC_4 │ │ │ │ gi|590123148|gb|KJ532235|Influenza ... NaN │ │ │ │ NaN │ │ │ │ ... ... │ │ │ │ ... ... ... ... │ │ │ │ 4 BPC_4 │ │ │ │ gi|1273499846|gb|MG203968|Influenza ... NaN │ │ │ │ NaN │ │ │ │ 4 BPC_4 │ │ │ │ gi|1273499984|gb|MG204024|Influenza ... NaN │ │ │ │ NaN │ │ │ │ 4 BPC_4 │ │ │ │ gi|1815499672|gb|MT123912|Influenza ... NaN │ │ │ │ NaN │ │ │ │ 4 BPC_4 │ │ │ │ gi|301033206|gb|HM747050|Influenza ... NaN │ │ │ │ NaN │ │ │ │ 4 BPC_4 │ │ │ │ gi|1435110668|gb|MH684325|Influenza ... 
NaN │ │ │ │ NaN │ │ │ │ │ │ │ │ [24251 rows x 29 columns]
df_type_counts = Empty DataFrame
                 Columns: [H_type, count, subtype]
                 Index: []
h_or_n = 'H'
seg = '4'
type_counts = Series([], Name: subtype, dtype: int64)
type_name = 'H_type'
type_to_count = []

IndexError: list index out of range
```

BPC.blastn.txt

I hope the attached information is helpful.
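For anyone reading the traceback: the final frame shows `type_counts` and `type_to_count` both empty for segment 4, so `type_to_count[0]` has nothing to index. One plausible cause, which is an inference and not something the log confirms, is that Influenza B hit titles carry no `HxNy` subtype for the regex to extract. A minimal stdlib-only sketch of that failure mode follows; the regex is a simplified stand-in for the script's pattern, and the titles and the `count_subtypes` helper are hypothetical.

```python
import re
from collections import Counter

# Simplified stand-in for the script's subtype regex (assumption: the real
# pattern is longer, but it also targets HxNy-style subtypes in parentheses).
SUBTYPE_REGEX = re.compile(r"\((H\d+N\d+)\)")

# Hypothetical hit titles, loosely modelled on the truncated ones in the log.
flu_a_titles = ["Influenza A virus (A/example/1/2009(H1N1)) segment 4"]
flu_b_titles = ["Influenza B virus (B/example/1/2015) segment 4"]


def count_subtypes(titles):
    """Count HxNy subtypes found in hit titles, most common first."""
    matches = (SUBTYPE_REGEX.search(title) for title in titles)
    counts = Counter(m.group(1) for m in matches if m)
    return counts.most_common()


print(count_subtypes(flu_a_titles))  # one H1N1 hit
print(count_subtypes(flu_b_titles))  # nothing matches, so the list is empty

try:
    # Same shape as the failing line: take the first (type, count) pair.
    top_type, top_count = count_subtypes(flu_b_titles)[0]
except IndexError as err:
    print(f"IndexError: {err}")
```

The last statement reproduces the same `IndexError: list index out of range` that ends the log above.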

kdl480 commented 1 year ago

Hi! I also wanted to use this pipeline for Influenza B and got the same error. Was anyone able to get it working?

peterk87 commented 1 year ago

Hi @kdl480, we currently don't do any flu B sequencing or analysis in our lab, only influenza A. But I don't think it would be that difficult to add better support for flu B to the workflow.
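As a rough idea of what a first step could look like, here is a sketch of a guard around the "pick the top type" step that fails in the traceback. It is not the workflow's actual code: `pick_top_type` is a hypothetical helper, and returning `None` for an empty input is an assumed convention that the caller would still need to handle (for example by skipping the H/N summary for that sample).

```python
from typing import Dict, Optional, Tuple


def pick_top_type(type_to_count: Dict[str, int]) -> Optional[Tuple[str, int]]:
    """Return the (type, count) pair with the highest count, or None if empty.

    Hypothetical helper: it stands in for sorting the counts in descending
    order and taking element 0, which raises IndexError on an empty input.
    """
    if not type_to_count:
        # No H or N type could be determined (e.g. an Influenza B sample).
        return None
    return max(type_to_count.items(), key=lambda item: item[1])


print(pick_top_type({"1": 120, "5": 3}))  # the most frequent type wins
print(pick_top_type({}))                  # empty input no longer raises
```

This only avoids the crash; proper flu B support would presumably also need lineage-aware reporting in place of H/N subtyping.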

peterk87/nf-flu, issue #7