katholt / srst2

Short Read Sequence Typing for Bacterial Pathogens
Other

Clustering CARD database #129

Closed: arooney13 closed this issue 3 years ago

arooney13 commented 3 years ago

Hi,

I'm currently working with a specific version of the CARD database (3.0.7) and would like to cluster it for use with srst2. I went to the CARD website and downloaded the FASTA files (e.g. the nucleotide FASTA for the protein homolog model), then used cd-hit as described to cluster the sequences at 90% similarity.

I then moved on to step 2 to parse the files and tabulate the results; I've attached the resulting txt file. I'm not sure where to go from here, as there seem to be some inconsistencies, and I was hoping you could guide me through what I should do next.

I really appreciate the help,

Ashley

Attachment: nucleotide_fasta_protein_homolog_rawseqs_clustered.txt
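For anyone stuck at the same "parse and tabulate" step, here is a minimal Python 3 sketch of what that step involves, assuming the standard CD-HIT `.clstr` layout (a `>Cluster N` line followed by one line per member, with `*` marking the representative). The names `parse_clstr`, `ClusterMember` and `write_table` are hypothetical; this is not srst2's own step-2 script, only an illustration of reading cluster assignments into a table you can inspect.

```python
import csv
import re
from typing import Iterable, List, NamedTuple, Optional, TextIO


class ClusterMember(NamedTuple):
    cluster_id: int
    seq_name: str
    length: int
    is_representative: bool


# A member line in a .clstr file looks like
#   "0<TAB>1200nt, >seqA... *"            (cluster representative)
#   "1<TAB>1150nt, >seqB... at +/95.00%"  (member, with identity to the representative)
# Protein runs use "aa" instead of "nt".
MEMBER_RE = re.compile(r"^\d+\s+(\d+)(?:nt|aa),\s+>(.+?)\.\.\.\s+(\*|at\s.*)$")


def parse_clstr(lines: Iterable[str]) -> List[ClusterMember]:
    """Turn the lines of a CD-HIT .clstr file into one record per sequence."""
    members: List[ClusterMember] = []
    cluster_id: Optional[int] = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(">Cluster"):
            # Cluster header, e.g. ">Cluster 12"
            cluster_id = int(line.split()[1])
            continue
        match = MEMBER_RE.match(line)
        if match is None or cluster_id is None:
            raise ValueError(f"unrecognised .clstr line: {line!r}")
        length, name, tail = match.groups()
        members.append(ClusterMember(cluster_id, name, int(length), tail == "*"))
    return members


def write_table(members: Iterable[ClusterMember], handle: TextIO) -> None:
    """Write one CSV row per sequence so cluster assignments can be checked by eye."""
    writer = csv.writer(handle)
    writer.writerow(["cluster_id", "seq_name", "length", "representative"])
    for m in members:
        writer.writerow([m.cluster_id, m.seq_name, m.length, int(m.is_representative)])
```

One thing worth checking when names look inconsistent: by default CD-HIT shortens sequence names written to the `.clstr` file (its `-d` option, default 20), whereas `-d 0` keeps the FASTA defline up to the first space, so names parsed this way may not match the original FASTA headers unless `-d 0` was used.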

aslangabriel commented 3 years ago

Did you manage to get it working? I ran into the same problem.

arooney13 commented 3 years ago


No luck; I decided to move on to other projects while I wait for a reply.

aslangabriel commented 3 years ago

Thanks for your time. I looked into the references and found that the problem arises from the incompatibility between Python 2 and Python 3: most of the Python scripts don't run correctly under Python 3, while Biopython no longer supports Python 2. @katholt, would you mind updating the Python scripts used for database clustering? I'd appreciate it very much.

On Thursday, 1 October 2020 at 10:23 PM, arooney13 <notifications@github.com> wrote:

> Have you made it? I found the same problem

> No luck, I decided to move onto other projects while I wait for a reply.

--
Dr. Qi Feng
College of Life Sciences, Nanjing Normal University
1 Wenyuan Road, 210046 Nanjing, P.R. China

katholt commented 3 years ago

Sorry for the delay in replying. We are not maintaining the clustering scripts, as we don't have the staff to do so and most people use the provided resistance database. However, we have updated the provided resistance database to one based on CARD v3.0.8, available in data/CARD_v3.0.8_SRST2.fasta.
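If you go with the bundled database instead of re-clustering, a quick sanity check is to list its clusters and alleles. The sketch below assumes the headers follow the SRST2 gene-database convention described in the srst2 README (four fields separated by double underscores: cluster ID, cluster symbol, allele symbol, allele ID, optionally followed by free-text notes). The function names `parse_header` and `alleles_per_cluster` are hypothetical and not part of srst2.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class GeneDbHeader(NamedTuple):
    cluster_id: str
    cluster_symbol: str
    allele_symbol: str
    allele_id: str


def parse_header(line: str) -> GeneDbHeader:
    """Split an assumed SRST2-style header: >clusterID__clusterSymbol__alleleSymbol__alleleID [notes]."""
    if not line.startswith(">"):
        raise ValueError(f"not a FASTA header: {line!r}")
    parts = line[1:].split()  # drop ">" and any notes after the first whitespace
    fields = parts[0].split("__", 3) if parts else []
    if len(fields) != 4:
        raise ValueError(f"expected 4 '__'-separated fields: {line!r}")
    return GeneDbHeader(*fields)


def alleles_per_cluster(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each cluster ID to the allele symbols listed under it, in file order."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        if line.startswith(">"):  # sequence lines are ignored
            header = parse_header(line.strip())
            clusters[header.cluster_id].append(header.allele_symbol)
    return dict(clusters)
```

For example, calling `alleles_per_cluster` on an open handle to data/CARD_v3.0.8_SRST2.fasta would return a dict from cluster ID to allele names, assuming the file sits at that path in your srst2 checkout and uses the header convention above; a `ValueError` would point to any header that does not.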