openscriptures / HebrewLexicon

BDB outline with links to Strong's and more

Complete text of BDB? #3

Open biblicalhumanities opened 10 years ago

biblicalhumanities commented 10 years ago

Nice work! I would like to see the complete text of BDB, including the introduction. Is that something you would consider? Does the answer depend on who does the work?

DavidTroidl commented 10 years ago

We have the front matter. It just hasn't made it into the release. The full text is pretty much beyond the capability of one individual, so it does depend on who does the work. I had a quirky PHP app for editing the lexicon, that Daniel and I used to get it into its present form.
I've made some progress in updating it to the current format, and making it somewhat more stable. Then so many other things came along, it remains in limbo.

rrshaban commented 9 years ago

Hi @DavidTroidl,

How much of the text of BDB is currently posted in BrownDriverBriggs.xml? Is there a rough estimate of how much remains a work in progress? How can people help with getting it completed?

thank you, Razi

DavidTroidl commented 9 years ago

Hi,

Brown, Driver, Briggs is a huge work. We have all the entries represented. Some of the shorter ones are complete. Most of the others have the "most significant" information included. We don't really have a user-friendly method of contributing, but anybody who wants to extend the work is free to do so. It's really hard to say how much we have completed. A very uneducated guess would be maybe 35%?

Peace,

David

rrshaban commented 9 years ago

Have you given any thought to scraping a website that has the BDB posted? e.g. http://biblehub.com/hebrew/776.htm

I'm not sure what the terms of use for the BDB are, but as the BDB is in the public domain, I don't see a reason why scraping the digital version there might not be allowed. The attribution given there is as follows:

"Brown-Driver-Briggs Hebrew and English Lexicon, Unabridged, Electronic Database. Copyright © 2002, 2003, 2006 by Biblesoft, Inc. All rights reserved. Used by permission. BibleSoft.com"

dowens76 commented 9 years ago

Judging by a quick look at that entry, their database is abridged. I would think that what we have already at least has as much as that one and is unencumbered by their copyright assertions.

strouptl commented 8 years ago

@DavidTroidl this is a wonderful resource! I stumbled across it looking for some lexical information that I was not able to get at through the Accordance UI, and was able to export exactly what I needed using a simple XML parser. I see that "all entries are represented" from your comments above, but I was just wondering if you know for sure if all stems are present for those entries?
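
For anyone wanting to check stems the same way, here is a minimal sketch of listing the stems recorded under each entry with a simple XML parser. The element names (`entry`, `stem`, `sense`, `w`) and the inline sample are assumptions made for illustration, not a confirmed description of BrownDriverBriggs.xml; if the real file declares an XML namespace, the tag names would need to be qualified accordingly.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the element names are assumed for illustration
# and are not taken from the repository's actual schema.
SAMPLE = """
<lexicon>
  <entry id="e1">
    <w>word1</w>
    <sense><stem>Qal</stem></sense>
    <sense><stem>Niphal</stem></sense>
  </entry>
  <entry id="e2">
    <w>word2</w>
  </entry>
</lexicon>
"""


def stems_by_entry(xml_text):
    """Map each entry id to the list of stem labels found inside that entry."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for entry in root.iter("entry"):
        result[entry.get("id")] = [stem.text for stem in entry.iter("stem")]
    return result


if __name__ == "__main__":
    for entry_id, stems in stems_by_entry(SAMPLE).items():
        print(entry_id, stems or "(no stems recorded)")
```

Entries that come back with an empty list are either non-verbs or verbs whose stems have not been filled in yet, so the output would still need a manual check.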

DavidTroidl commented 8 years ago

I just came across an entry recently that seemed to need its senses expanded. There may in fact be some verbs that don't have all their stems represented. I have just uploaded the latest revision.

EliezerIsrael commented 7 years ago

http://www.ericlevy.com/Revel/BDB/BDB/main.htm

This version of the BDB appears to be complete, although I have seen a few minor errors - numbering of senses being off, in particular. It looks to be parseable, with some effort.

dowens76 commented 7 years ago

Wow, that is an impressive piece of work, thanks for the link. I wonder if he would make his source files available.

EliezerIsrael commented 7 years ago

From the looks of it, R. Eric Levy copied it from biblecentre.net, which is no longer online. I reached out to R. Levy, but haven't yet heard back. It's relatively easy to download the entire html of the website. Then it's just a small matter of parsing. :)

The base text is in the public domain, but some of the emendations here make me wonder if this was digitized from a newer version that someone may try to assert rights over. In any case, the core material is squarely in the public domain, and no one could protest if the core work of the BDB were parsed and redistributed from here.

DavidTroidl commented 7 years ago

I had the BDB from BibleCentre.net years ago, and had done some significant work with it. Then I deleted everything I had, due to this post https://blogs.thegospelcoalition.org/justintaylor/2008/06/24/biblecentrenet-intellectual-property/. There was no provenance of the data, and it appeared suspect. Certainly BDB is in the public domain, but someone put extensive work into making those files, and I personally would not use them without permission.

dowens76 commented 7 years ago

Thanks, David.

EliezerIsrael commented 7 years ago

Ah, well that is disappointing. I'm not terribly surprised, though.

Do we have any idea who the proper originator of the BDB data is? I'd love to have a conversation with them. Perhaps there's a way we can get it released into the commons legitimately.

EliezerIsrael commented 7 years ago

Likely from Logos. https://www.logos.com/product/1796/enhanced-brown-driver-briggs-hebrew-and-english-lexicon

dowens76 commented 7 years ago

Oooh, best not to mess with that.

EliezerIsrael commented 6 years ago

Here's a gift! https://github.com/jackweinbender/bdb_parse

https://liberalarts.utexas.edu/mes/news/article.php?id=6768 A team at UTexas Austin got an NEH grant to create an online Lexicon based on the BDB. The grant wasn't renewed, but they got as far as digitizing the public domain BDB printing. I swapped emails with them, and their view is that since public money paid for the work, the resulting data is public property. They gave their blessing to carry the project forward in whatever ways we can.

It's a bit rough, the data - it needs to be converted from its current form into proper Unicode. There's some Node.js code that does some setup, but doesn't go so far as parsing the data.

Even so - this seems like a great bounty of data.

DavidTroidl commented 6 years ago

The key map for Bwhebb is at Bible Works Fonts. This should help in constructing a search and replace script for the Hebrew. The consonants appear in reverse order, but each is followed by its vowel: bybia' means אָבִיב
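
A minimal sketch of that reordering, assuming a partial key map inferred only from the single example above (a, b, y as consonants; ' and i as vowel marks). A real script would take the full table from the Bwhebb key map, and the function name here is hypothetical.

```python
# Partial key map inferred from the one example "bybia'"; every other
# key would have to come from the actual Bwhebb key map.
CONSONANTS = {
    "a": "\u05D0",  # alef
    "b": "\u05D1",  # bet
    "y": "\u05D9",  # yod
}
MARKS = {
    "'": "\u05B8",  # qamats
    "i": "\u05B4",  # hiriq
}


def bwhebb_to_unicode(legacy):
    """Convert a legacy string (consonants reversed, marks trailing) to logical order."""
    clusters = []
    for ch in legacy:
        if ch in CONSONANTS:
            # Each consonant starts a new cluster.
            clusters.append(CONSONANTS[ch])
        elif ch in MARKS:
            if not clusters:
                raise ValueError("mark %r has no preceding consonant" % ch)
            # A mark attaches to the consonant it follows.
            clusters[-1] += MARKS[ch]
        else:
            raise ValueError("no mapping for %r in this partial key map" % ch)
    # Reverse the clusters, not the characters, so marks stay after their consonant.
    return "".join(reversed(clusters))


if __name__ == "__main__":
    print(bwhebb_to_unicode("bybia'"))
```

Reversing whole clusters instead of single characters is the point of the sketch: a plain string reversal would put each vowel before its consonant.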

dajare commented 6 years ago

There is a macro for Word 2003 that converts BibleWorks fonts to Unicode. It's in the "OLE and DDE" section of the help file (towards the end: section 58 in BWks 9). It includes this guidance:

To implement them just copy the blue text below into the Word Macro editor. If you want to use a different Unicode font you will need to edit the font names in the calling routines below. In other words, change "Ezra SIL" and "Arial Unicode MS" to the names of the fonts you want to use. BibleWorks ships with "SBL Greek" and "SBL Hebrew", as well as "Ezra SIL".

I have put the macro itself in a Gist, if that helps. But anyone with BibleWorks (for many versions back) will have this already.

jackweinbender commented 6 years ago

All,

A few things about this data.

  1. I wrote a crosswalk and converter for the legacy > Unicode conversion.

  2. There is one major issue with the Hebrew, namely, that all non-final Tsades without dagesh, for some reason, have been encoded as a het. I.e., there’s not a straightforward way of knowing whether any particular “het” should actually be a tsade. You may be able to infer them based on their position in the Lexicon (all the words that start with het, obviously, are together; root aleph-het would show up before aleph-tsade, if it even exists [in which case a dictionary of BH roots could be helpful]).
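
The position-based inference in point 2 could be sketched roughly as follows. This is only an illustration under stated assumptions: the neighbouring headwords are taken to be unambiguous, ordering is treated as plain alphabetical order (BDB is actually arranged by root), and the function names and sample strings are hypothetical rather than taken from the data.

```python
from itertools import product

HET = "\u05D7"
TSADE = "\u05E6"

# Fold final letter forms onto their base forms so that code point order
# matches alphabetical order (kaf, mem, nun, pe, tsade).
FINALS = str.maketrans(
    "\u05DA\u05DD\u05DF\u05E3\u05E5",
    "\u05DB\u05DE\u05E0\u05E4\u05E6",
)


def sort_key(word):
    """Alphabetical sort key for a string of unpointed Hebrew consonants."""
    return word.translate(FINALS)


def readings(word):
    """Every variant of word in which each het is read as either het or tsade."""
    options = [(HET, TSADE) if ch == HET else (ch,) for ch in word]
    return ["".join(choice) for choice in product(*options)]


def plausible_readings(prev_word, word, next_word):
    """Keep only the readings that sort between the two neighbouring headwords."""
    low, high = sort_key(prev_word), sort_key(next_word)
    return [r for r in readings(word) if low <= sort_key(r) <= high]


if __name__ == "__main__":
    # Toy neighbours: alef-pe-samekh before, alef-resh after.
    prev_word = "\u05D0\u05E4\u05E1"
    next_word = "\u05D0\u05E8"
    ambiguous = "\u05D0\u05D7\u05E8"  # alef-het-resh as found in the data
    print(plausible_readings(prev_word, ambiguous, next_word))
```

When more than one reading survives, or none does, the word would still need a manual check or a lookup in a list of attested roots.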

jackweinbender commented 6 years ago

Here’s the transcoder (it was private, sry). https://github.com/jackweinbender/bdb_transcoder I wrote it in Elixir, for a reason I don’t recall. I’ve stopped working on BDB stuff for the present while I finish my dissertation.

EliezerIsrael commented 6 years ago

@jackweinbender This is great. Thank you. I'd been working on a transcoder independently, over here - https://github.com/Sefaria/bdb_parse Still have some dangling issues - could be that your work will help.

jackweinbender commented 6 years ago

FWIW; the JSON file in the transcoder should be exhaustive.

Is there a plan to encode this as a TEI document? I’ve also got a simple digital site to display the BDB by page like (http://jastrow.semitics-archive.org), if I can find it. I’ve been playing with some computer vision stuff to split up the images into entries/paragraphs that might make transcription (or perhaps corrected OCR?) easier.

jackweinbender commented 6 years ago

I’m going to try to keep up with these projects; I’d like to help. I was very disappointed when our NEH grant was not renewed. The BDB is such a fantastic work of scholarship, it is tragic that there isn’t a complete, open, digital edition of it yet.

dajare commented 6 years ago

@jackweinbender said:

I’ve also got a simple digital site to display the BDB by page like (http://jastrow.semitics-archive.org), if I can find it.

I hope you can find it! That would be valuable, although something like the GKC on Wikisource would be remarkable. But please ping me if you mount your digi-BDB! Thanks.

jackweinbender commented 6 years ago

I will. I’m out of town this week, but I’ll post a link whenever I get it deployed.

jackweinbender commented 6 years ago

I actually reimplemented my BDB site using the data from this repo's XML file, since the former iteration used the buggy one referenced above. Everything seems to still work, so... as promised:

http://bdb.semitics-archive.org/

It probably sucks on mobile, FWIW.