katholt / srst2

Short Read Sequence Typing for Bacterial Pathogens

Different naming from Resfinder CGE db #127

Closed deearahman closed 4 years ago

deearahman commented 4 years ago

Hi,

I noticed that the names of some genes from the ResFinder db have been changed. For example, I'm working on a sample that carries the mecC gene, but the output from SRST2 reports mecA2. Is there a quick way to check whether this mecA2 corresponds to mecC? Alternatively, are there instructions for building an updated ResFinder database (with the same naming as ResFinder) for use with SRST2? I've been using SRST2 with the ARGannot db for most of our work, but need to change things a little for our clinical analysis.

Thanks in advance!

Dee

katholt commented 4 years ago

Hi Dee,

We don't use ResFinder in our lab, so don't keep this updated in SRST2. We provided a formatted version of the ResFinder DB with the initial release of SRST2 for user convenience but don't maintain this. If you would like to format the current ResFinder sequence DB for use with SRST2, please follow these instructions: https://github.com/katholt/srst2#generating-srst2-compatible-clustered-database-from-raw-sequences

As far as checking whether the 'mecA2' sequence in your current SRST2 output is the same as mecC in a different database, I suggest you just pull out the sequence labelled 'mecA2' from the fasta file you ran SRST2 with (which I take it is the version of ResFinder we provide in the SRST2 repository, i.e. https://github.com/katholt/srst2/blob/master/data/ResFinder.fasta) and BLAST this mecA2 sequence against your mecC sequence or indeed the current version of the whole ResFinder database (or CARD or whatever) to see whether its name might have been changed over the years.
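The first step suggested above, pulling the 'mecA2' record out of the fasta file, can be sketched in a few lines of Python. This is only an illustration: it assumes the headers use the SRST2 style of four fields joined by double underscores (cluster ID, gene, allele, sequence ID), and the records in `example` are made-up toy sequences, not real ResFinder entries. The function names are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_alleles(lines, allele):
    """Return records whose first header token has `allele` as a '__' field."""
    hits = []
    for header, seq in read_fasta(lines):
        parts = header.split()
        # Assumed header layout: clusterID__gene__allele__seqID
        fields = parts[0].split("__") if parts else []
        if allele in fields:
            hits.append((header, seq))
    return hits


# Toy records with invented sequences, only to show the header handling.
example = [
    ">1__mecA__mecA1__10",
    "ATGAAA",
    "GGTTAA",
    ">1__mecA__mecA2__11 some description",
    "ATGCCC",
    ">2__blaZ__blaZ1__12",
    "ATGTTT",
]

hits = find_alleles(example, "mecA2")
for header, seq in hits:
    print(">" + header)
    print(seq)
```

To use it on a real file, pass an open file handle (for example the ResFinder.fasta used for the SRST2 run) instead of `example`, save the printed record to its own fasta file, and use that file as the BLAST query.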

Kat

katholt commented 4 years ago

Looks like a python version issue - the script was written in python2, but you are running it in python3. Kat
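As a small illustration of this diagnosis: in Python 2 `print` is a statement, while in Python 3 it is a function, so a line written as `print DoError (...)` fails to parse under python3 before the script runs at all. The sketch below uses shortened stand-ins for the failing line (the message text is abbreviated, and `compiles` is a hypothetical helper) and only checks whether each form parses.

```python
import sys

# Shortened stand-ins for the failing line of the script.
py2_line = 'print DoError ("message")'   # Python 2 print statement
py3_line = 'print(DoError("message"))'   # the same call written for Python 3


def compiles(source):
    """Return True if `source` parses under the running interpreter."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print(sys.version_info[:2], compiles(py2_line), compiles(py3_line))
```

Under Python 3 the first form is rejected and the second is accepted, so the practical options are to run the script with a Python 2 interpreter or to port its print statements.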

On 7 July 2020 at 2:34:36 am, deearahman (notifications@github.com) wrote:

Hi Kat,

Thanks for the suggestion.

I'm trying to build the database, but I am getting an error from csv_to_gene_db.py:

python3 csv_to_gene_db.py -h
  File "csv_to_gene_db.py", line 58
    print DoError ("Where are the sequences? If they are in the table, specify which column using -s. Otherwise provide a fasta file of sequence using -f and specify which column contains sequence identifiers that match the fasta headers, using -h")
    ^
SyntaxError: invalid syntax

Not too sure what went wrong.


deearahman commented 4 years ago

Hi Kat,

I tried replying via GitHub but the issue is closed. I've solved the Python issue, but I'm now getting a list of Non-unique warnings in the .log file. Is this something I need to worry about, or should I edit the clustered.csv file?

07/13/2020 14:35:52 Non-unique:564, dfrB5
07/13/2020 14:35:52 Non-unique:564, dfrB6
07/13/2020 14:35:52 Non-unique:564, dfrB8
07/13/2020 14:35:52 Non-unique:565, dfrB3
07/13/2020 14:35:52 Non-unique:565, dfrB3
07/13/2020 14:35:52 Non-unique:565, dfrB7
07/13/2020 14:35:52 Non-unique:229, blaIMI
07/13/2020 14:35:52 Non-unique:229, blaIMI
07/13/2020 14:35:52 Non-unique:229, blaIMI
07/13/2020 14:35:52 Non-unique:229, blaIMI
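Before deciding whether to edit the clustered.csv file, it may help to see how the warnings group together. The sketch below only parses the log lines quoted above and counts the names reported under each number. Reading that number as a cluster ID is an assumption based on the shape of the log lines, not something taken from the script, and the function name is hypothetical.

```python
import re
from collections import Counter, defaultdict

# Matches the "Non-unique:<number>, <name>" part of each log line.
LOG_PATTERN = re.compile(r"Non-unique:(\d+),\s*(\S+)")


def summarise_non_unique(log_lines):
    """Group reported names by the number that precedes them in the log."""
    clusters = defaultdict(Counter)
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if match:
            cluster_id, name = match.groups()
            clusters[cluster_id][name] += 1
    return clusters


log_lines = [
    "07/13/2020 14:35:52 Non-unique:564, dfrB5",
    "07/13/2020 14:35:52 Non-unique:564, dfrB6",
    "07/13/2020 14:35:52 Non-unique:564, dfrB8",
    "07/13/2020 14:35:52 Non-unique:565, dfrB3",
    "07/13/2020 14:35:52 Non-unique:565, dfrB3",
    "07/13/2020 14:35:52 Non-unique:565, dfrB7",
    "07/13/2020 14:35:52 Non-unique:229, blaIMI",
    "07/13/2020 14:35:52 Non-unique:229, blaIMI",
    "07/13/2020 14:35:52 Non-unique:229, blaIMI",
    "07/13/2020 14:35:52 Non-unique:229, blaIMI",
]

summary = summarise_non_unique(log_lines)
for cluster_id, names in summary.items():
    print(cluster_id, dict(names))
```

On these lines the summary shows two patterns: several different names under one number (564 and 565) and the same name repeated under one number (229), which narrows down which rows of the table to inspect.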

Secondly, I tried formatting the file like the default ResFinder database, but the results are not exactly the same. For example, with the default ResFinder database, the results show only the allele, without the seqid:

Sample    aac(3)-Ik    fusA     mecA     tet(38)
MDB163    aac(3)-Ik    fusA5    mecA2    tet(38)_3*

But with the newly formatted ResFinder database, the seqid is printed after the allele (see below). Is there a way to print only the allele, without the seqid?

Sample    blaZ          mecC2
MDB163    blaZ10_496    mecC2_1599

Thanks and regards, Dyana
