sanger-pathogens / ariba

Antimicrobial Resistance Identification By Assembly
http://sanger-pathogens.github.io/ariba/

Enhancements in vfdb_parser.py for VFDB full dataset support #320

Open lknegendorf opened 2 years ago

lknegendorf commented 2 years ago

Currently, after downloading the full VFDB dataset with the getref vfdb_full (...) command, it is not possible to proceed with ariba prepareref (...) on the resulting reference data without manually changing both the .fa and the .tsv files. This is because the reference dataset contains several pitfalls that are not yet addressed:

The modifications proposed here address all of the shortcomings mentioned above. In addition, the XLS-derived metadata from VFDB that describe the function and mechanism of each virulence gene (VFs.xls.gz, see the VFs description file on the VFDB download page) are added to the metadata.tsv produced by vfdb_parser, giving a more comprehensive view of the ariba variant calling results when working with VFDB.
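As a rough illustration of the metadata enrichment described above (not the actual code in this pull request), the sketch below appends function and mechanism text to the free-text column of a metadata row. It assumes that the last field of each row still carries the VFDB header text with a virulence factor ID in parentheses, such as (VF0470), and that the contents of VFs.xls.gz have already been loaded into a dictionary keyed by that ID. The function name annotate_metadata_row, the dictionary layout, and the " | " separator are all hypothetical.

```python
import re

# Assumption: VF IDs appear in the description as "(VF" + digits + ")".
VF_ID_PATTERN = re.compile(r"\((VF\d+)\)")


def annotate_metadata_row(row, vf_descriptions):
    """Return a copy of a metadata row with VF annotations appended.

    row: list of fields; the last field is the free-text description.
    vf_descriptions: hypothetical dict mapping a VF ID to a dict of
        annotation name -> text (e.g. "function", "mechanism").
    """
    fields = list(row)
    match = VF_ID_PATTERN.search(fields[-1])
    if match is None:
        # No VF ID in the description: leave the row unchanged.
        return fields

    info = vf_descriptions.get(match.group(1))
    if info is None:
        # VF ID not present in the description table.
        return fields

    # Keep only non-empty annotations, in the order they were supplied.
    extras = [f"{key}: {value}" for key, value in info.items() if value]
    if extras:
        fields[-1] = fields[-1] + " | " + " | ".join(extras)
    return fields
```

Rows without a recognisable VF ID, or whose ID is missing from the description table, pass through unchanged, so the enrichment never blocks the rest of the parsing.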

Thank you for considering this for merging into a future release.