scrollmapper / bible_databases

Bible versions and cross-reference databases.

Source of truth #61

Closed Jmainguy closed 2 years ago

Jmainguy commented 3 years ago

As a developer, I must assume the source is true when working with it. However (and this is not a slight against scrollmapper; this is a great project that provides good value), there are errors in all the sources contained in this repo (typos, that is; God's word is infallible).

I think we need a single source of truth, plus code for outputting it into the different formats (MySQL, SQLite, md, txt). Then, as errors are found, we can update that conversion code. I personally believe https://crosswire.org/sword/modules/ModDisp.jsp?modType=Bibles has the best online sources of translations and should be used as the source from which the other formats in this repo are created. However, I would love to see other people's opinions.
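To make the idea concrete, here is a minimal sketch of one canonical list of verses being exported to several formats. The record fields, the `verses` table layout, the function names, and the placeholder text are all assumptions for illustration; they are not this repo's actual schema or any real translation.

```python
import json
import sqlite3

# Hypothetical canonical records; field names and text are placeholders,
# not the repo's actual schema or any real translation.
VERSES = [
    {"book": 1, "chapter": 1, "verse": 1,
     "text": 'Placeholder verse with "double quotes".'},
    {"book": 1, "chapter": 1, "verse": 2,
     "text": "Placeholder verse with an apostrophe's mark."},
]


def to_txt(verses):
    """Render one 'book:chapter:verse text' line per verse."""
    return "\n".join(
        f"{v['book']}:{v['chapter']}:{v['verse']} {v['text']}" for v in verses
    )


def to_json(verses):
    """Serialize the records; json.dumps escapes double quotes itself."""
    return json.dumps(verses, ensure_ascii=False, indent=2)


def to_sqlite(verses, path=":memory:"):
    """Write the records to SQLite using parameterized inserts,
    so no quote characters need to be replaced by hand."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS verses ("
        "book INTEGER, chapter INTEGER, verse INTEGER, text TEXT, "
        "PRIMARY KEY (book, chapter, verse))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO verses (book, chapter, verse, text) "
        "VALUES (:book, :chapter, :verse, :text)",
        verses,
    )
    conn.commit()
    return conn
```

With this shape, a typo would be fixed once in the canonical records and every output format regenerated from them, rather than patched separately in each file.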

The schema provided by this repo is fantastic, I believe we just need to work together to get the translations as consistent as we can.

Thank you for providing this repo and community.

leojonathanoh commented 3 years ago

Definitely agree there should be a source of truth, which might or might not be committed in this repo. If it is indeed committed in this repo, it should be in a portable, ubiquitous format, e.g. JSON.

I've tried looking at the git history, and the first few commits were generated by the committer, who didn't include the source of truth. Or perhaps, if there is a source of truth, I can't seem to locate it in this repo.

If we do establish that source of truth, then we would be able to fix the typos and formatting issues, e.g. #47, #52, #55, #57, #58, #59.

leojonathanoh commented 3 years ago

EDIT: From my investigation, the very first commit with a DB is https://github.com/scrollmapper/bible_databases/commit/c1a958124ca75798961b6bc0807a382abab6cc42, where a .sql, .csv, .json, and .xml file were committed. There are no double quotes at all in the bible text of those .sql, .csv, .json, .xml files or of any subsequently committed DB. This suggests that the author of those DBs might have replaced double quotes " with ` and ' as a handy way to avoid having to escape ", and the formats where " needs to be escaped are .csv, .json, .xml (at least in the original commit). This also explains the nesting of in-addition-to expressions in speech expressions; see: https://github.com/scrollmapper/bible_databases/issues/58#issuecomment-855559820

If the above process was indeed how the original .sql, .csv, .json, .xml DBs and all subsequent DB formats were generated, we would have to consider the bible text in all DBs as mutated, and we would need to get hold of the actual source of truth.
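The "no double quotes anywhere" observation can be checked mechanically. The sketch below counts quote-like characters across a collection of verse texts; the function name and the sample strings are hypothetical, and loading the texts from any particular DB file is left out.

```python
QUOTE_CHARS = ('"', "'", "`")


def quote_counts(texts):
    """Count how often each quote-like character appears across all texts."""
    counts = {ch: 0 for ch in QUOTE_CHARS}
    for text in texts:
        for ch in QUOTE_CHARS:
            counts[ch] += text.count(ch)
    return counts


# Hypothetical sample strings, not taken from any DB in the repo.
sample = ["He said, `come here'", "It's a placeholder"]
print(quote_counts(sample))
```

A count of zero for `"` across an entire translation, alongside many backticks, would be consistent with the substitution theory, though it would not prove it.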

scrollmapper commented 2 years ago

Hello, I would suggest obtaining the text of any of the translations in question, then running a verse-by-verse comparison programmatically and flagging differences. @leojonathanoh this repo was made a very long time ago, but I don't recall switching quotes from double to single, etc. In fact, most of it was done with Python scripts that would auto-escape inserts. But still, if you find errors, please point them out. Very important.
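A verse-by-verse check along these lines could be sketched as follows. It assumes both texts have already been loaded into dicts keyed by `(book, chapter, verse)`; the function name, the key layout, and the flag labels are assumptions for illustration, not anything from this repo.

```python
def diff_verses(reference, candidate):
    """Compare two {(book, chapter, verse): text} mappings.

    Returns a list of (key, reason, reference_text, candidate_text)
    for every verse that is missing on one side or whose text differs.
    """
    flagged = []
    for key in sorted(reference.keys() | candidate.keys()):
        ref = reference.get(key)
        cand = candidate.get(key)
        if ref is None:
            flagged.append((key, "missing_in_reference", ref, cand))
        elif cand is None:
            flagged.append((key, "missing_in_candidate", ref, cand))
        elif ref != cand:
            flagged.append((key, "text_differs", ref, cand))
    return flagged
```

The comparison here is exact string equality, so whitespace or quote-style differences are flagged too; normalizing before comparing would be a separate, deliberate choice.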

leojonathanoh commented 2 years ago

@scrollmapper do you know of any links where I can get the raw, unformatted text for the five versions? I'm really keen on getting those for reading, study, etc.