Closed: pedres closed this issue 8 months ago
Hi!
Thanks for reporting this – the problem seems to be one match to a plasmid sequence (NZ_CP026617.1|taxid|2079596 plasmid1).
If you replace "plasmid1" with "Acinetobacter sp." (the species in which this plasmid was found, according to the NCBI record), it should work.
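To see why this one header fails, here is a minimal sketch that applies the same split pattern fParseKMA.py uses for RefSeq headers (quoted elsewhere in this thread) to the header as reported and to the manually corrected one. The helper name `split_header` is made up for illustration and is not part of CCMetagen.

```python
import re

def split_header(index):
    # Same pattern as the RefSeq branch of fParseKMA.py quoted in this thread.
    # The capturing group keeps the separators ('|' and ' ') in the result,
    # which is why genus and species are expected at positions 6 and 8.
    return re.split(r'(\|| )', index)

# Header as reported: nothing follows "plasmid1", so position 8 does not exist.
bad = split_header("NZ_CP026617.1|taxid|2079596 plasmid1")

# Header after the suggested manual edit.
good = split_header("NZ_CP026617.1|taxid|2079596 Acinetobacter sp.")

print(len(bad), bad[4], bad[6])
print(len(good), good[4], good[6] + " " + good[8])
```

The plasmid header yields 7 tokens and the corrected header yields 9, so only the second one can satisfy `split_match[6] + " " + split_match[8]`.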
Are you having this issue with other files as well?
We'll need to write a patch for CCMetagen to detect these cases, but that will probably only be released next year. If this is just happening with this one file, manually adding the species will do the trick and won't affect the results.
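If editing the file by hand is inconvenient, the manual fix could be scripted along these lines. This is only a sketch: it assumes the KMA .res file is tab-separated, that the template name is the first column, and that the header line starts with "#". The function name `patch_template_names` and the numeric values in the example rows are hypothetical.

```python
def patch_template_names(lines, old, new):
    """Replace `old` with `new` in the first tab-separated column of each data line."""
    patched = []
    for line in lines:
        if line.startswith("#"):
            # Assumed header line: keep it untouched.
            patched.append(line)
            continue
        # Only the template name (first column) is edited; the rest is kept verbatim.
        template, sep, rest = line.partition("\t")
        patched.append(template.replace(old, new) + sep + rest)
    return patched

# Made-up rows standing in for a real .res file.
example = [
    "#Template\tScore\n",
    "NZ_CP026617.1|taxid|2079596 plasmid1\t100\n",
]
fixed = patch_template_names(example, "plasmid1", "Acinetobacter sp.")
print(fixed[1], end="")
```

With a real file, the lines would be read with `open(path).readlines()` and the result written to a new path, so the original KMA output stays intact.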
Thanks for the fast answer. I understand that I have to make that change in the .res file. I have more files, so I would like to know how to find which taxon is causing the error. I ran CCMetagen on a small set of my samples with the provided RefSeq_BF database. I am in fact preparing the NCBI nt database to run all my samples (22 samples). Do I have to wait for the next release of CCMetagen?
Hi!
This shouldn't be a problem when using the NCBI nt database. To detect the problematic taxon, you need to edit the file "fParseKMA.py" and add a 'print (index)' after line 80. It should look like this:
    elif ref_database == "RefSeq":
        split_match = re.split(r'(\|| )', index)
        qiden = row['Query_Identity']
        match_info.TaxId = int(split_match[4])
        print(index)
        species = split_match[6] + " " + split_match[8]
        match_info.Lineage = species
        # include info from NCBI:
        match_info = fNCBItax.lineage_extractor(match_info.TaxId, match_info, taxfile)
If you prefer I can send you the updated file via email.
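As an alternative to editing the installed fParseKMA.py, the .res files can be scanned directly for template names that would trip the same indexing. The sketch below assumes the .res file is tab-separated with the template name in the first column and a header line starting with "#". The function name `find_short_headers` and the second example row (accession and score) are hypothetical.

```python
import re

def find_short_headers(lines, min_tokens=9):
    """Return template names that split into fewer than `min_tokens` pieces.

    The RefSeq branch of fParseKMA.py reads split_match[8], so a header
    needs at least 9 tokens after splitting on '|' and ' '.
    """
    problems = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip the assumed header line and blank lines
        template = line.rstrip("\n").split("\t", 1)[0]
        if len(re.split(r'(\|| )', template)) < min_tokens:
            problems.append(template)
    return problems

# Made-up rows standing in for a real .res file.
res_lines = [
    "#Template\tScore\n",
    "NZ_CP026617.1|taxid|2079596 plasmid1\t100\n",
    "NC_000000.1|taxid|562 Escherichia coli strain X\t200\n",
]
for name in find_short_headers(res_lines):
    print(name)
```

Running this over each sample's .res file (for example with `open(path)` as the `lines` argument) lists the headers that need a species name added by hand.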
Hi,
I will do it. Thanks for the explanation.
Manuel
From: VR Marcelino. Sent: Tuesday, 12 December 2023, 4:32. To: vrmarcelino/CCMetagen. Cc: Manuel Aira Vieira; Author. Subject: Re: [vrmarcelino/CCMetagen] IndexError: list index out of range (Issue #55)
Hi,
I installed CCMetagen in a conda environment this week, with all its dependencies, and ran it with:
kma -ipe ss1_R1.fastq.gz ss1_R2.fastq.gz -o ss1_R1.fastq.gz -t_db refseq_bf/refseq_bf -t 64 -1t1 -mem_mode -and -apm f
CCMetagen.py -i $LUSTRE/01*/ss1_R1.fastq.gz.res -o $LUSTRE/ss1 -r RefSeq
KMA worked without problems but CCMetagen gives an error:
/mnt/netapp2/Store_uni/home/uvi/ba/jdm/conda/envs/ccmetagen/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py:243: UserWarning: taxid 85643 was translated into 164330
  warnings.warn("taxid %s was translated into %s" %(taxid, merged_conversion[taxid]))
/mnt/netapp2/Store_uni/home/uvi/ba/jdm/conda/envs/ccmetagen/lib/python3.6/site-packages/ete3/ncbi_taxonomy/ncbiquery.py:243: UserWarning: taxid 46170 was translated into 1280
  warnings.warn("taxid %s was translated into %s" %(taxid, merged_conversion[taxid]))
Traceback (most recent call last):
  File "/mnt/netapp2/Store_uni/home/uvi/ba/jdm/conda/envs/ccmetagen/bin/CCMetagen.py", line 274, in <module>
    df = fParseKMA.populate_w_tax(df, ref_database, st, gt, ft, ot, ct, pt)
  File "/mnt/netapp2/Store_uni/home/uvi/ba/jdm/conda/envs/ccmetagen/lib/python3.6/site-packages/ccmetagen/fParseKMA.py", line 81, in populate_w_tax
    species = split_match[6] + " " + split_match[8]
IndexError: list index out of range
Attached file: ss1_R1.fastq.gz.txt
Thanks a lot for your help.