tseemann / mlst

Scan contig files against PubMLST typing schemes
GNU General Public License v2.0

Brucella MLST 21 #66

Closed (keyburn closed 6 years ago)

keyburn commented 6 years ago

Hi Torsten,

I have tried to add the Brucella 21-locus MLST scheme to your script using the instructions in the README. I downloaded the 21-locus txt file from PubMLST and the associated sequence files in .tfa format. I ran mlst-make_blast_db and it successfully added the sequences to the mlst.fa file. However, when I check whether the scheme is in the list (mlst --longlist | grep brucella_2) it is not listed, and when I try to run with the brucella_2 scheme it is rejected as an invalid --scheme. Do you know what else I need to do?

Cheers

tseemann commented 6 years ago

Does it look like this, i.e. with a brucella_2.txt?

ls -l mlst/db/pubmlst/brucella_2/
total 148
-rw-r--r--. 1 tseemann domain^users 15759 May 11 15:04 aroA.tfa
-rw-r--r--. 1 tseemann domain^users  1844 May 11 15:01 brucella_2.txt
-rw-r--r--. 1 tseemann domain^users 11406 May 11 15:04 cobQ.tfa
-rw-r--r--. 1 tseemann domain^users 11192 May 11 15:04 dnaK.tfa
-rw-r--r--. 1 tseemann domain^users 17594 May 11 15:04 gap.tfa
-rw-r--r--. 1 tseemann domain^users 21111 May 11 15:04 glk.tfa
-rw-r--r--. 1 tseemann domain^users 12141 May 11 15:04 gyrB.tfa
-rw-r--r--. 1 tseemann domain^users 11233 May 11 15:04 int_hyp.tfa
-rw-r--r--. 1 tseemann domain^users 18824 May 11 15:04 omp25.tfa
-rw-r--r--. 1 tseemann domain^users 13095 May 11 15:04 trpE.tfa

Is the first line of the .txt file like this?

head -n 1 mlst/db/brucella/brucella_2.txt
ST      gap     aroA    glk     dnaK    gyrB    trpE    cobQ    int_hyp omp25

What does this say?

blastdbcmd  -db mlst/db/blast/mlst.fa -entry all | grep -m 1 brucella_2

Did you run mlst-make_blast_db FROM the scripts/ folder? (You must.)

keyburn commented 6 years ago

This is what it looks like:

total 292
-rw-r--r-- 1 key026 domain users 11167 Aug 3 16:43 acnA.tfa
-rw-r--r-- 1 key026 domain users 15759 Aug 3 16:43 aroA.tfa
-rw-r--r-- 1 key026 domain users  6676 Aug 3 16:43 brucella_2.txt
-rw-r--r-- 1 key026 domain users  7913 Aug 3 16:43 caiA.tfa
-rw-r--r-- 1 key026 domain users 11406 Aug 3 16:43 cobQ.tfa
-rw-r--r-- 1 key026 domain users 11101 Aug 3 16:43 csdB.tfa
-rw-r--r-- 1 key026 domain users 14207 Aug 3 16:43 ddlA.tfa
-rw-r--r-- 1 key026 domain users 11192 Aug 3 16:43 dnaK.tfa
-rw-r--r-- 1 key026 domain users  7591 Aug 3 16:43 fbaA.tfa
-rw-r--r-- 1 key026 domain users  8433 Aug 3 16:43 fumC.tfa
-rw-r--r-- 1 key026 domain users 18201 Aug 3 16:43 gap.tfa
-rw-r--r-- 1 key026 domain users 21111 Aug 3 16:43 glk.tfa
-rw-r--r-- 1 key026 domain users 12141 Aug 3 16:43 gyrB.tfa
-rw-r--r-- 1 key026 domain users 11233 Aug 3 16:43 int_hyp.tfa
-rw-r--r-- 1 key026 domain users  9491 Aug 3 16:43 leuA.tfa
-rw-r--r-- 1 key026 domain users  8511 Aug 3 16:43 mutL.tfa
-rw-r--r-- 1 key026 domain users  7413 Aug 3 16:43 mviM.tfa
-rw-r--r-- 1 key026 domain users 18824 Aug 3 16:43 omp25.tfa
-rw-r--r-- 1 key026 domain users  9692 Aug 3 16:43 prpE.tfa
-rw-r--r-- 1 key026 domain users 12526 Aug 3 16:43 putA.tfa
-rw-r--r-- 1 key026 domain users 10071 Aug 3 16:43 soxA.tfa
-rw-r--r-- 1 key026 domain users 13095 Aug 3 16:43 trpE.tfa

head -n 1 ~/mlst/db/pubmlst/brucella_2/brucella_2.txt
ST      gap     aroA    glk     dnaK    gyrB    trpE    cobQ    int_hyp omp25   prpE    caiA    csdB    soxA    leuA    mviM    fumC    fbaA    ddlA    putA    mutL    acnA
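
Editor's note: the folder listing and the header line can be cross-checked mechanically. The following is a minimal Python sketch, not part of mlst itself. It assumes the layout shown in this thread (a folder named after the scheme, containing SCHEME.txt plus one LOCUS.tfa per locus) and that every header column after ST is a locus name; real PubMLST profiles may carry extra trailing columns, which would show up here as "missing" files. The function name check_scheme_folder is hypothetical.

```python
from pathlib import Path


def check_scheme_folder(scheme_dir):
    """Compare loci named in the profile header with the .tfa files present.

    Returns (missing_files, unlisted_files): loci named in the header that
    have no .tfa file, and .tfa files whose locus is not in the header.
    """
    scheme_dir = Path(scheme_dir)
    # Assumption: the profile file is named after the folder, e.g. brucella_2.txt
    profile = scheme_dir / f"{scheme_dir.name}.txt"
    with open(profile, newline="") as fh:
        header = fh.readline().rstrip("\r\n").split("\t")
    # Assumption: first column is ST, all remaining columns are locus names
    loci = set(header[1:])
    tfa = {p.stem for p in scheme_dir.glob("*.tfa")}
    return sorted(loci - tfa), sorted(tfa - loci)
```

Calling check_scheme_folder on the brucella_2 folder would be expected to return two empty lists when the header and the files agree.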

tseemann commented 6 years ago

What does this say?

blastdbcmd  -db mlst/db/blast/mlst.fa -entry all | grep -m 1 brucella_2

Did you run mlst-make_blast_db FROM the scripts/ folder? (You must.)

Are all the rows using TAB characters between the alleles? (spaces are not allowed)
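
Editor's note: the TAB question can be answered without eyeballing the file. Below is a rough Python sketch under the assumption that a valid profile has the same number of tab-separated columns on every row and contains no space characters at all. The function name find_bad_rows is made up for illustration and is not an mlst utility.

```python
def find_bad_rows(path):
    """Return (line_number, reason) for rows that are not cleanly tab-delimited."""
    problems = []
    expected = None  # column count taken from the first non-empty line
    # newline="" keeps line endings untranslated so stray \r can be stripped here
    with open(path, newline="") as fh:
        for n, line in enumerate(fh, start=1):
            row = line.rstrip("\r\n")
            if not row:
                continue
            fields = row.split("\t")
            if expected is None:
                expected = len(fields)
            if " " in row:
                problems.append((n, "contains a space"))
            elif len(fields) != expected:
                problems.append((n, f"{len(fields)} columns, expected {expected}"))
    return problems
```

An empty list means every row passed both checks; it does not prove the file is otherwise acceptable to mlst.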

keyburn commented 6 years ago

key026@aahl-10-mel:~$ blastdbcmd -db mlst/db/blast/mlst.fa -entry all | grep -m 1 brucella_2

brucella_2.acnA_1

Not sure whether I ran it from the scripts folder. I will try now.

Cheers

keyburn commented 6 years ago

I just ran it from the scripts folder.

Still not working.

tseemann commented 6 years ago

Ok, the aahl computer name confirms you are Anthony :)

It seems the alleles got into the blast database. --longlist parses the pubmlst folder, not sure what is going wrong.

Did you copy these files originally from a Mac or a Windows computer? What are you on now? The line endings might be wrong.
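
Editor's note: a quick way to test the line-ending suspicion is to count carriage returns in each file. This is a small Python sketch, assuming the scheme files are small enough to read into memory; line_ending_report is a hypothetical helper name.

```python
from pathlib import Path


def line_ending_report(folder):
    """Map file name -> counts of Windows (CRLF) and bare CR line endings.

    Files that contain only Unix (LF) endings are left out of the report.
    """
    report = {}
    for p in sorted(Path(folder).iterdir()):
        if not p.is_file():
            continue
        data = p.read_bytes()
        crlf = data.count(b"\r\n")
        cr_only = data.count(b"\r") - crlf  # carriage returns not followed by LF
        if crlf or cr_only:
            report[p.name] = {"crlf": crlf, "cr_only": cr_only}
    return report
```

An empty dictionary suggests the folder is already in Unix format, which is the state dos2unix is meant to produce.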

keyburn commented 6 years ago

I downloaded on Windows. Renamed the original .fas files to .tfa using Total Commander. I'll have a look at the line endings.

keyburn commented 6 years ago

I processed all the files through dos2unix.

Still not working

tseemann commented 6 years ago

Can you zip all the files and email them to me? I'll check it out.

Also, how did you install mlst? Which version?

keyburn commented 6 years ago

Yep.

I installed using conda.

Just one stupid question: when I type in where mlst, it points to the Anaconda location, not to your mlst location. Is this correct?
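
Editor's note: the question of which mlst executable is actually being run can be checked from Python as well. The sketch below only reports whether the executable found on the search path lives inside a given folder, for example the clone whose db/ folder was edited. Whether a mismatch explains the problem in this thread is not established here. The function name is_inside and its search_path parameter are illustrative.

```python
import shutil
from pathlib import Path


def is_inside(executable, folder, search_path=None):
    """Locate `executable` and report whether it lives under `folder`.

    Returns (resolved_path, inside). resolved_path is None when the
    executable is not found. search_path defaults to the PATH variable.
    """
    found = shutil.which(executable, path=search_path)
    if found is None:
        return None, False
    resolved = Path(found).resolve()
    root = Path(folder).expanduser().resolve()
    return resolved, root in resolved.parents
```

For instance, is_inside("mlst", "~/mlst") would return the full path of the mlst that the shell would run, together with a flag saying whether that path is under ~/mlst.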

keyburn commented 6 years ago

version 2.10.

I have sent the files

keyburn commented 6 years ago

Here are the files.

Thank you

Torsten

From: Torsten Seemann [mailto:notifications@github.com] Sent: Monday, August 6, 2018 3:05 PM To: tseemann/mlst mlst@noreply.github.com Cc: Keyburn, Anthony (AAHL, Geelong AAHL) Anthony.Keyburn@csiro.au; Author author@noreply.github.com Subject: Re: [tseemann/mlst] Brucella MLST 21 (#66)


tseemann commented 6 years ago

@keyburn I don't see anything attached. You might need to email my normal account at unimelb edu au.

HBrendy commented 6 years ago

Dear Torsten,

How would you cope with a bad MLST scheme? In both Brucella schemes (9 loci and 21 loci), mlst finds two matches, alleles 1 and 23 of the gyrB locus. mlst is aware of that and shows a warning:

[20:10:27] Found exact allele match brucella.gyrB-23
[20:10:27] WARNING: found addtional exact allele match brucella.gyrB-1

In this situation, mlst never comes to an ST conclusion. BTW, gyrB is a fixed-length locus and the culprit, gyrB-23, is shifted downstream by 18 nt. The non-overhanging sequence has no mutations compared to gyrB-1.
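
Editor's note: the kind of shifted allele described above can be looked for programmatically. The following Python sketch checks whether one sequence is the other shifted downstream, with an identical overlap. It is an illustration under simple assumptions (plain string comparison, no reverse complement, no mismatches allowed) and is not how mlst or PubMLST validate alleles. The name find_shift and its parameters are hypothetical.

```python
def find_shift(a, b, max_shift=30, min_overlap=10):
    """Return the smallest k (1..max_shift) such that `b` starts k bases
    downstream of the start of `a` and the overlapping part is identical.

    Returns None if no such shift is found or the overlap would be
    shorter than min_overlap bases.
    """
    a, b = a.upper(), b.upper()
    for k in range(1, max_shift + 1):
        overlap = a[k:]  # the part of `a` that `b` would have to begin with
        if len(overlap) < min_overlap:
            break
        if b[:len(overlap)] == overlap:
            return k
    return None
```

Applied to the gyrB-1 and gyrB-23 sequences from gyrB.tfa, a result of 18 would be consistent with the description above; the sequences themselves are not reproduced here.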

Cheers, Chris

HBrendy commented 6 years ago

Mmhhh, thinking about this, maybe I'll just delete gyrB-23 from gyrB.tfa for now, since it is not used in the ST profiles anyway.

tseemann commented 6 years ago

@HBrendy I think that is the simplest solution. If I find dupe alleles, I refuse to call an MLST.
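
Editor's note: removing one allele from a .tfa file by hand is error-prone, so here is a minimal Python sketch of the idea. It assumes standard FASTA with the allele ID as the first word after ">" and an ID of the form gyrB_23; the exact header format in a given download should be checked first. drop_alleles is a hypothetical helper, not part of mlst.

```python
def drop_alleles(fasta_text, unwanted):
    """Return FASTA text without the records whose ID is in `unwanted`.

    The ID is taken to be the first whitespace-delimited word after '>'.
    """
    kept = []
    keep = True  # lines before the first header are kept unchanged
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            keep = not fields or fields[0] not in unwanted
        if keep:
            kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")
```

After writing the filtered text back to gyrB.tfa, the BLAST database would presumably need to be rebuilt with mlst-make_blast_db, as in the steps discussed earlier in this thread.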