IndexError: list index out of range

pedres commented 10 months ago

Hi,

I installed CCMetagen in a conda environment this week, with all its dependencies, and ran it with:

kma -ipe ss1_R1.fastq.gz ss1_R2.fastq.gz -o ss1_R1.fastq.gz -t_db refseq_bf/refseq_bf -t 64 -1t1 -mem_mode -and -apm f

CCMetagen.py -i $LUSTRE/01*/ss1_R1.fastq.gz.res -o $LUSTRE/ss1 -r RefSeq

KMA worked without problems but CCMetagen gives an error:

    /mnt/netapp2/Store_uni/home/uvi/ba/jdm/conda/envs/ccmetagen/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py:243: UserWarning: taxid 85643 was translated into 164330
      warnings.warn("taxid %s was translated into %s" %(taxid, merged_conversion[taxid]))
    /mnt/netapp2/Store_uni/home/uvi/ba/jdm/conda/envs/ccmetagen/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py:243: UserWarning: taxid 46170 was translated into 1280
      warnings.warn("taxid %s was translated into %s" %(taxid, merged_conversion[taxid]))
    Traceback (most recent call last):
      File "/mnt/netapp2/Store_uni/home/uvi/ba/jdm/conda/envs/ccmetagen/bin/CCMetagen.py", line 274, in <module>
        df = fParseKMA.populate_w_tax(df, ref_database, st, gt, ft, ot, ct, pt)
      File "/mnt/netapp2/Store_uni/home/uvi/ba/jdm/conda/envs/ccmetagen/lib/python3.6/site-packages/ccmetagen/fParseKMA.py", line 81, in populate_w_tax
        species = split_match[6] + " " + split_match[8]
    IndexError: list index out of range

ss1_R1.fastq.gz.txt

Thanks a lot for your help.

vrmarcelino commented 10 months ago

Hi!

Thanks for reporting this – the problem seems to be one match to a plasmid sequence (NZ_CP026617.1|taxid|2079596 plasmid1).

If you replace "plasmid1" with "Acinetobacter sp." (the species in which this plasmid was found, according to the NCBI record), it should work.

Are you having this issue with other files as well?

We'll need to write a patch for CCMetagen to detect these cases, but that will probably only be released next year. If this is just happening with this one file, manually adding the species will do the trick and won't affect the results.
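For anyone wondering why that one header breaks the parser, here is a minimal sketch. It reuses the `re.split` pattern from the `fParseKMA.py` line in the traceback; the helper names `split_header` and `species_or_none` are hypothetical and are not part of CCMetagen. The second function only illustrates what a guard could look like, it is not the planned patch.

```python
import re

def split_header(header):
    # Same pattern as in fParseKMA.py; the capturing group keeps the
    # "|" and " " delimiters as separate items in the result.
    return re.split(r'(\|| )', header)

def species_or_none(header):
    # Hypothetical guard: return None instead of raising IndexError
    # when the header has no two-word species name after the taxid.
    parts = split_header(header)
    if len(parts) < 9:
        return None
    return parts[6] + " " + parts[8]

bad = "NZ_CP026617.1|taxid|2079596 plasmid1"
fixed = "NZ_CP026617.1|taxid|2079596 Acinetobacter sp."

print(len(split_header(bad)))    # too short, so index 8 does not exist
print(species_or_none(bad))
print(species_or_none(fixed))
```

The plasmid header has only one word after the taxid, so the split list stops at index 6 and `split_match[8]` raises the IndexError. Once the header is edited to contain two words, the same lookup succeeds.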

pedres commented 10 months ago

Thanks for the fast answer. I understand that I have to make that change in the .res file. I have more files, so I would like to know how to find which taxon is causing the error. I ran CCMetagen on a small subset of my samples with the provided RefSeq_BF database. I am also preparing the NCBI nt database to run all 22 of my samples. Do I need to wait for the next release of CCMetagen?

vrmarcelino commented 10 months ago

Hi!

This shouldn't be a problem when using the NCBI nt database. To detect the problematic taxon, you need to edit the file "fParseKMA.py" and add a 'print (index)' after line 80. It should look like this:

            elif ref_database == "RefSeq":
                split_match = re.split (r'(\|| )', index)
                qiden = row['Query_Identity']
                match_info.TaxId = int(split_match[4])
                # debug: show the template name being parsed, so the last
                # one printed before the crash is the problematic entry
                print(index)
                species = split_match[6] + " " + split_match[8]
                match_info.Lineage = species
                # include info from NCBI:
                match_info = fNCBItax.lineage_extractor(match_info.TaxId, match_info, taxfile)

If you prefer I can send you the updated file via email.
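If you would rather not edit the installed package, the same check can be sketched as a small standalone script that scans a KMA result file. This is only a sketch under some assumptions: that the .res file is tab-separated, that header lines start with "#", and that the template name is the first column. The function names `find_short_headers` and `scan_res_file` are hypothetical and not part of CCMetagen or KMA.

```python
import re

def find_short_headers(lines, min_tokens=9):
    """Yield template names that would make split_match[8] fail.

    Assumes tab-separated lines with the template name in column 1
    and header/comment lines starting with '#'.
    """
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        template = line.rstrip("\n").split("\t")[0]
        # Same split pattern as the RefSeq branch of fParseKMA.py.
        if len(re.split(r'(\|| )', template)) < min_tokens:
            yield template

def scan_res_file(path):
    # Print every template name in the file that is too short to parse.
    with open(path) as handle:
        for name in find_short_headers(handle):
            print(name)
```

Calling `scan_res_file("ss1_R1.fastq.gz.res")` on each sample should list the entries that need a manual species name, such as the plasmid header above.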

pedres commented 10 months ago

Hi,

I will do it. Thanks for the explanation.

Manuel

From: VR Marcelino. Sent: Tuesday, 12 December 2023, 4:32. To: vrmarcelino/CCMetagen. Cc: Manuel Aira Vieira; Author. Subject: Re: [vrmarcelino/CCMetagen] IndexError: list index out of range (Issue #55)

