katholt / srst2

Short Read Sequence Typing for Bacterial Pathogens

Using new ARG-ANNOT v3 database #94

Closed hjb60 closed 6 years ago

hjb60 commented 6 years ago

I would like to use the newest version of the ARG-ANNOT database but on running the script I am not getting the full genes output file. The error file contains the following:

09/20/2017 20:47:42 Processing SAMtools pileup...
Traceback (most recent call last):
  File "/software/pathogen/external/apps/usr/local/Python-2.7.13/bin/srst2", line 11, in <module>
    load_entry_point('srst2==0.2.0', 'console_scripts', 'srst2')()
  File "/software/pathogen/external/apps/usr/local/Python-2.7.13/lib/python2.7/site-packages/srst2/srst2.py", line 1729, in main
    db_reports, db_results = run_srst2(args,fileSets,args.gene_db,"genes")
  File "/software/pathogen/external/apps/usr/local/Python-2.7.13/lib/python2.7/site-packages/srst2/srst2.py", line 1264, in run_srst2
    db_results_list, fasta)
  File "/software/pathogen/external/apps/usr/local/Python-2.7.13/lib/python2.7/site-packages/srst2/srst2.py", line 1327, in process_fasta_db
    results,gene_list, db_report, cluster_symbols, max_mismatch)
  File "/software/pathogen/external/apps/usr/local/Python-2.7.13/lib/python2.7/site-packages/srst2/srst2.py", line 1422, in map_fileSet_to_db
    read_pileup_data(pileup_file, size, args.prob_err)
  File "/software/pathogen/external/apps/usr/local/Python-2.7.13/lib/python2.7/site-packages/srst2/srst2.py", line 337, in read_pileup_data
    allele_size = size[allele]
KeyError: 'AGly-Aac3-IIa:X51534:91-951:861'

Is there some additional formatting I should be doing before trying to use it? Sorry if this is obvious - this area is not my forte!

katholt commented 6 years ago

What command are you attempting to run?

hjb60 commented 6 years ago

Hi,

Resistance gene detection so:

srst2 --input_pe strainA_1.fastq.gz strainA_2.fastq.gz --output strainA_test --log --gene_db resistance.fasta

Thanks

Hayley



wanyuac commented 6 years ago

@hjb60 May I ask where you got resistance.fasta from? The sequence header "AGly-Aac3-IIa:X51534:91-951:861" does not match the one in the ARG-ANNOT v3 database, "(AGly)Aac3-IIa:X51534:91-951:861", or the one in our curated version of the same database, "203__Aac3-IIa_AGly__Aac3-IIa__882 no;no;Aac3-IIa;AGly;X51534;91-951;861".

SRST2 extracts information from sequence headers, which must follow a specific format, as mentioned above (see "Generating SRST2-compatible clustered database from raw sequences", https://github.com/katholt/srst2#generating-srst2-compatible-clustered-database-from-raw-sequences, for more details). An unknown-key error such as the KeyError in your traceback arises when this requirement is not fulfilled.

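You may want to compare your resistance database with the formal release of the ARG-ANNOT v3 database (http://en.mediterranee-infection.com/arkotheque/client/ihumed/_depot_arko/articles/1424/arg-annot-nt-v3-march2017_doc.fasta), or try our curated version (https://github.com/katholt/srst2/blob/master/data/ARGannot_r2.fasta), which has already been tested with SRST2.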
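As a rough way to run the comparison suggested above, here is a minimal Python sketch that lists FASTA headers whose sequence ID does not look SRST2-style. It assumes the ID (the text before the first space) is four non-empty fields separated by double underscores, as in the curated header quoted above; that reading is an assumption, and it is not a check SRST2 itself performs. The function name `nonconforming_headers` and the dummy `ATGC` sequences are hypothetical, for illustration only.

```python
def nonconforming_headers(fasta_lines):
    """Return FASTA header lines whose ID is not four '__'-separated fields.

    Assumption: a usable header ID looks like
    clusterID__clusterSymbol__alleleSymbol__alleleID, optionally followed
    by a space and free-text annotation.
    """
    bad = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        tokens = line[1:].split()
        seq_id = tokens[0] if tokens else ""
        fields = seq_id.split("__")
        if len(fields) != 4 or not all(fields):
            bad.append(line.rstrip("\n"))
    return bad


# The three headers discussed in this thread, each with a dummy sequence.
example = [
    ">203__Aac3-IIa_AGly__Aac3-IIa__882 no;no;Aac3-IIa;AGly;X51534;91-951;861\n",
    "ATGC\n",
    ">(AGly)Aac3-IIa:X51534:91-951:861\n",
    "ATGC\n",
    ">AGly-Aac3-IIa:X51534:91-951:861\n",
    "ATGC\n",
]

for header in nonconforming_headers(example):
    print(header)
```

To check a real file, pass an open file handle instead of the list, for example `nonconforming_headers(open("resistance.fasta"))`. An empty result only means the headers have the assumed shape, not that the database has been clustered correctly.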

hjb60 commented 6 years ago

Thanks for the advice. I had downloaded the FASTA file of the resistance database from here: http://en.mediterranee-infection.com/arkotheque/client/ihumed/_depot_arko/articles/1425/argannot-aa-v3-march2017_doc.fasta
When it initially didn't work, I tried removing the parentheses from the headers. I will try using the link you have provided and hopefully that will work.

Many thanks for the response,
Hayley



katholt commented 6 years ago

Unless you have a specific reason to do otherwise, I would suggest using our pre-formatted version of this resistance database (ARGannot_r2.fasta), which is in the /data directory of the repository.