scrollmapper / bible_databases

Bible versions and cross-reference databases.
994 stars 347 forks source link

Added MSSQL version #10

Closed gonzigonz closed 7 years ago

gonzigonz commented 7 years ago

Thanks for setting up this repository in the first place. I have a bible project and this has come in very handy. My project is running on MS SQL Server (mssql) and so I made this conversion. I think it could be a nice addition to the library.

To-date, I've tested these scripts on SQL Server 2014 & 2016. I'm sure they should work as far back as SQL Server 2008 R2.

The conversion has been migrated from the mysql version as your documentation states this to be the primary version. I performed a small reconciliation after the conversion to ensure data integrity and unfortunately I found a discrepancy with the cross_reference table. It looks to me like the original version has duplicates. I have confirmed the issue with the cross_reference-mysql.sql file and sqlite.db database only. As such I've removed the duplicates from the new mssql version only for this pull request. I wanted to confirm the issue with you first before making any other changes and also, since I am not as fluent with the other sources, I figured that perhaps someone else might have a know solution.

See below my *COUNT() comparison between the two versions: table............................................ mysql...... mssql** asv................................................ 31103..... 31103 bbe............................................... 31103..... 31103 dby............................................... 31099..... 31099 kjv................................................. 31103..... 31103 wbt............................................... 15434..... 15434 web............................................... 31102.... 31102 ylt................................................... 31103.... 31103 bible_version_key.................... 7.............. 7 key_abbreviations_english... 409......... 409 key_english................................ 66............ 66 key_genre_english.................. 8............... 8 cross_reference........................ 687218.. 343609

As you can see 343,609 is exactly halve of the records from the original source and also the exact count of distinct records from either source. I tested this using the following query on both databases:
select count(*) from (select distinct vid, sv from cross_reference) a;

PaulBrownMagic commented 7 years ago

The source file for cross_references is in this repository, the following bash command gives the number of unique lines/ records: sort cross_references.txt | uniq | wc -l This was also 343,609 + 1 for the heading line. It does appear as through the SQL database has duplicates. The SQLite database also only includes 343,609 rows in this table. Seems like you're correct about the duplicates.

gonzigonz commented 7 years ago

Thanks for the verification mate, I'll raise an issue #11 about the duplicates. If you know of a quick method to correct the cross_reference-mysql.sql script do tell.