lmu-bioinformatics / xmlpipedb

XMLPipeDB is a suite of tools for building relational databases from XML sources with minimal manual processing of the data. While the applicability is general, our motivation was to facilitate the management of biological data from different sources.
http://xmlpipedb.cs.lmu.edu
GNU Lesser General Public License v3.0

TallyEngine errors for Vibrio profile #13

Closed: kdahlquist closed this issue 9 years ago

kdahlquist commented 9 years ago

When vetting the Vibrio export from builds 3 and 4 from issue #3, I compared the TallyEngine results with the OriginalRowCounts table and found some discrepancies:

- UniProt matches OK
- GeneID matches OK
- RefSeq has 6550 in XML, Database, but 6549 in OriginalRowCounts
- OrderedLocusNames has 3831 in XML, Database, which would be 7662 when doubled, but the OriginalRowCounts has 7664
- GO Terms XML and Database match each other in TallyEngine, but have no correct comparison group in the gdb

I will do some sleuthing to figure out the lost IDs, but I think this will have to be later since I need to get my syllabus ready for Monday.

dondi commented 9 years ago

For RefSeq, the ID WP_001201520 (to be precise, WP_001201520.1; GenMAPP Builder drops the .n at the end) appears twice in both the XML file and the database. Likely due to a uniqueness check, the ID appears only once in the GDB. This explains the discrepancy of 1 between the TallyEngine count (6550) and OriginalRowCounts (6549).
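A minimal sketch of how a repeated ID produces this kind of off-by-one, assuming the export keeps one row per distinct ID after the version suffix is dropped. The function names are hypothetical and this is not GenMAPP Builder's actual code; `WP_000000000.1` is a made-up placeholder ID used only to pad the sample.

```python
import re
from collections import Counter


def strip_version(refseq_id):
    """Drop a trailing .n version suffix (WP_001201520.1 -> WP_001201520)."""
    return re.sub(r"\.\d+$", "", refseq_id)


def tally(raw_ids):
    """Return (total rows, distinct IDs, IDs seen more than once)."""
    stripped = [strip_version(i) for i in raw_ids]
    counts = Counter(stripped)
    repeated = sorted(i for i, n in counts.items() if n > 1)
    return len(stripped), len(counts), repeated


# WP_000000000.1 is a made-up placeholder, not a real Vibrio record.
sample = ["WP_001201520.1", "WP_000000000.1", "WP_001201520.1"]
total, distinct, repeated = tally(sample)
print(total, distinct, repeated)  # 3 2 ['WP_001201520']
```

The total plays the role of the XML/database count and the distinct count plays the role of OriginalRowCounts; with one ID repeated once, they differ by exactly 1.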

dondi commented 9 years ago

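For OrderedLocusNames, the ID VC_1738/VC_1739 appears as a single entry in both the XML file and the database. GenMAPP Builder splits this into two records, VC_1738 and VC_1739, and each of those is stored twice (one with the underscore, another without). That yields four GDB records where the doubled TallyEngine count expects two, which explains the additional two records in the GDB (7664 instead of 7662).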
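The arithmetic can be checked with a short sketch. It assumes that every ordered locus name is stored in two forms (with and without the underscore), that only this one entry contains a slash, and that the split happens on "/". The function name is hypothetical and is not taken from GenMAPP Builder.

```python
def expand_locus_name(name):
    """Split a combined name on '/' and emit each part in both forms."""
    records = []
    for part in name.split("/"):
        records.append(part)                   # form with the underscore
        records.append(part.replace("_", ""))  # form without the underscore
    return records


combined = expand_locus_name("VC_1738/VC_1739")
print(combined)  # ['VC_1738', 'VC1738', 'VC_1739', 'VC1739']

xml_count = 3831                  # OrderedLocusNames count from TallyEngine
naive_expected = xml_count * 2    # simple doubling of the XML count
# Assumption: the other 3830 entries each expand to exactly two records.
gdb_expected = (xml_count - 1) * 2 + len(combined)
print(naive_expected, gdb_expected)  # 7662 7664
```

Under those assumptions the one combined entry contributes four records instead of two, which accounts for the difference of 2.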

dondi commented 9 years ago

Thus, as far as I can tell, these discrepancies are particular to the data, and not indicative of a bug in GenMAPP Builder. Please review and let me know what you think.

kdahlquist commented 9 years ago

I agree. I've been able to check those specifically and see no problems with the adjustments made by GenMAPP Builder. Can we close this?

dondi commented 9 years ago

Yep, happy to close it :)