toddknutson closed this issue 8 years ago
Todd,
Thanks for spotting that one. The VFDB changed their FASTA file format, and my new parsing approach was grabbing the wrong allele name for that one. The allele name shouldn't be map_ATCC 25904 but rather map_VF0016.
I've fixed VFDB_cdhit_to_csv.py again to deal with all gene/allele names properly (I hope!), so if you pull from the SRST2 master branch, it should work properly now! There's no need to rerun your analysis (your space-removal trick is fine), but be aware that the allele name will change if you generate the SRST2 database again.
Ryan
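For anyone checking their own clustered database for the same problem, here is a minimal sketch in Python. It is not part of SRST2. It assumes that everything before the first whitespace in a FASTA header is the sequence ID, and that the ID should contain exactly four non-empty fields joined by double underscores, as in the [cluster]__[gene]__[allele]__[VFDB] naming discussed in this thread. The function name find_bad_headers and the trailing ID 1 in the sample headers are made up for illustration.

```python
def find_bad_headers(fasta_lines):
    """Return (line_number, header) pairs whose ID is not four '__'-joined fields.

    Assumption: the sequence ID is the header text up to the first whitespace,
    so a space inside the allele name cuts the ID short.
    """
    bad = []
    for number, line in enumerate(fasta_lines, start=1):
        if not line.startswith(">"):
            continue  # skip sequence lines
        header = line[1:].rstrip("\n")
        parts = header.split(None, 1)  # split once on any whitespace
        seq_id = parts[0] if parts else ""
        fields = seq_id.split("__")
        if len(fields) != 4 or any(field == "" for field in fields):
            bad.append((number, header))
    return bad


if __name__ == "__main__":
    # Hypothetical headers: the trailing "1" stands in for the real VFDB ID.
    sample = [
        ">51__map__map_ATCC 25904__1\n",
        "ACGT\n",
        ">51__map__map_VF0016__1 some annotation\n",
        "ACGT\n",
    ]
    for number, header in find_bad_headers(sample):
        print(f"line {number}: suspicious header: {header}")
```

To check a real file, pass an open file handle (for example, open("VF_clustered.fasta")) in place of the sample list. A header flagged this way is only a candidate for the kind of truncated key seen in the KeyError above. It still needs to be inspected by hand.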
Hi Ryan,
Great, thanks for the update. And thanks for SRST2, it has allowed us to make a very interesting discovery that would not have been possible without your software!
Todd
Hi,
I cloned your GitHub code on 6/20/16, which contained the latest fixes for creating SRST2-compatible files using the updated VFDB downloads. However, when following the directions on your database_clustering/readme.md page, my final VF_clustered.fasta file contained an incorrectly formatted [cluster]__[gene]__[allele]__[VFDB] name. If this file is used with SRST2, the program breaks with a Python dictionary error, KeyError: '51__map__map_ATCC', indicating that this key does not exist.
I found that my final VF_clustered.fasta file contained a sequence with header:
However, this is formatted incorrectly: there should not be a space between ATCC and 25904. To fix the problem, I manually deleted the space and re-ran SRST2, which corrected the problem. I think one of the scripts used to parse the headers made an error in this case, but I have not investigated the scripts or the code to find the bug. I wanted to let you know. Thanks for a great tool!! The full error traceback is below: