meglab-metagenomics / amrplusplus_v2

MEGARes and AmrPlusPlus - A comprehensive database of antimicrobial resistance genes and user-friendly pipeline for analysis of high-throughput sequencing data
http://megares.meglab.org/
MIT License

Inconsistencies between output files #15

Open ZoeHansen opened 3 years ago

ZoeHansen commented 3 years ago

Hello! Thank you very much for curating AmrPlusPlus and for managing this GitHub so closely -- your documentation and your responses to the existing issues have helped me a lot! I have been able to get AmrPlusPlus to successfully run on my samples, which is great. However, as I was compiling the output data for my analysis, I noticed that there are a couple of discrepancies between the different output files (group.tsv, type.tsv, class.tsv, in particular).

For example, the first few lines of the group.tsv for one of my samples look like this:

Sample  Group   Hits
ER0331.amr.alignment.dedup  Aminocoumarins,Aminocoumarin-resistant DNA topoisomerases,PARE  723
ER0331.amr.alignment.dedup  Aminoglycosides,16S rRNA methyltransferases,RMTF    204
ER0331.amr.alignment.dedup  Aminoglycosides,Aminoglycoside N-acetyltransferases,AAC3    21

I noticed that the "Group" is actually a comma-separated list of the Type, Class, and Group. This was easy enough to address, but when I consolidated the read counts for all "Aminocoumarins" or "Aminoglycosides" and their respective classes in this example, those totals do not match the numbers output in the type.tsv and the class.tsv for this sample.

I was wondering if this issue has been raised by anyone else and, if so, whether this is normal. Which values would you recommend using in this case: the class and type values consolidated from the group.tsv, or those from the separate type.tsv and class.tsv files?
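For reference, the consolidation described above can be sketched roughly as follows. This is only an illustration, not part of AmrPlusPlus: it assumes the real group.tsv is tab-separated with a single header row, that the "Group" column always has the form Type,Class,Group, and that the hit counts are integers. The function name `rollup_group_counts` is made up for this example.

```python
import csv
import io
from collections import defaultdict


def rollup_group_counts(handle):
    """Sum group-level hits up to type level and class level.

    Assumes tab-separated rows of (sample, "Type,Class,Group", hits).
    """
    type_totals = defaultdict(int)
    class_totals = defaultdict(int)
    reader = csv.reader(handle, delimiter="\t")
    next(reader)  # skip the "Sample / Group / Hits" header
    for row in reader:
        if len(row) != 3:
            continue  # ignore blank or malformed lines
        sample, label, hits = row
        parts = label.split(",")
        amr_type = parts[0]
        # Everything between the first and last comma is treated as the class.
        amr_class = ",".join(parts[1:-1])
        type_totals[(sample, amr_type)] += int(hits)
        class_totals[(sample, amr_type, amr_class)] += int(hits)
    return dict(type_totals), dict(class_totals)


# The three rows quoted in this issue, with tabs between columns (assumed).
SAMPLE = (
    "Sample\tGroup\tHits\n"
    "ER0331.amr.alignment.dedup\tAminocoumarins,Aminocoumarin-resistant DNA topoisomerases,PARE\t723\n"
    "ER0331.amr.alignment.dedup\tAminoglycosides,16S rRNA methyltransferases,RMTF\t204\n"
    "ER0331.amr.alignment.dedup\tAminoglycosides,Aminoglycoside N-acetyltransferases,AAC3\t21\n"
)

type_totals, class_totals = rollup_group_counts(io.StringIO(SAMPLE))
for key, total in sorted(type_totals.items()):
    print(key, total)
```

The totals produced this way are the ones that can be compared line by line against type.tsv and class.tsv for the same sample.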

Relatedly, I noticed that my mechanism.tsv files are formatted oddly:

Sample  Mechanism   Hits
ER0307.amr.alignment.dedup  ACRD
Sul|sul1_16_EF667294|Sulfonamides|Sulfonamide-resistant_dihydropteroate_synthases|SULI  56
ER0307.amr.alignment.dedup  AMPH
MLS|VatD|L12033|162-791|630|MLS|Streptogramin_A_O-acetyltransferase|VATD    12
ER0307.amr.alignment.dedup  ANT6
gi|315456453|emb|FN645444.1|betalactams|Class_C_betalactamases|CMY  40

The sample name and group are listed on one line while the entire gene and read count value are on the second -- is this how the mechanism.tsv is supposed to be written? I noticed this issue when I was trying to merge my sample files into a comprehensive sheet and was met with some grumpy error code.
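As a stopgap for merging, rows that are split across two lines like this can be stitched back together before parsing. The sketch below is a workaround only and says nothing about why the file was written this way. It assumes the real mechanism.tsv is tab-separated and that every complete record has three fields; the function name `rejoin_split_rows` and the choice to glue the two halves of the middle column with a space are made up for this example.

```python
def rejoin_split_rows(lines, n_fields=3, sep="\t", glue=" "):
    """Merge physical lines until each record has at least n_fields columns."""
    rows = []
    pending = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip empty lines
        # Append this line to any partial record carried over from before.
        pending = line if pending is None else pending + glue + line
        fields = pending.split(sep)
        if len(fields) >= n_fields:
            rows.append(fields)
            pending = None
    if pending is not None:
        raise ValueError(f"incomplete trailing record: {pending!r}")
    return rows


# First lines of the mechanism.tsv quoted above, with tabs between columns (assumed).
BROKEN = [
    "Sample\tMechanism\tHits\n",
    "ER0307.amr.alignment.dedup\tACRD\n",
    "Sul|sul1_16_EF667294|Sulfonamides|Sulfonamide-resistant_dihydropteroate_synthases|SULI\t56\n",
    "ER0307.amr.alignment.dedup\tAMPH\n",
    "MLS|VatD|L12033|162-791|630|MLS|Streptogramin_A_O-acetyltransferase|VATD\t12\n",
]

rows = rejoin_split_rows(BROKEN)
for row in rows:
    print(row)
```

Each repaired row keeps both pieces of the middle column, so the odd pairing of a short name with a full gene header stays visible for inspection rather than being silently dropped.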

I'm not sure if these discrepancies are something to worry about or if this is an indication that my install/run of AmrPlusPlus was faulty/corrupted somehow, but I thought I would reach out to see if you had any advice for moving forward. Thank you very much for your time!