openscriptures / morphhb

Open Scriptures Hebrew Bible
https://hb.openscriptures.org

Open Scriptures Hebrew Bible

The Open Scriptures Hebrew Bible (OSHB) is a project to analyze the Hebrew Bible by lemma and morphology. The text is marked up in OSIS XML and currently contains lemma attributes for most words (using an augmentation of Strong's numbers). We are in the process of adding morphology attributes as well. These files are found in the wlc directory.
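To give a feel for the markup, here is a minimal Python sketch that pulls the lemma and morphology attributes out of OSIS `<w>` elements using only the standard library. The inline sample is an abbreviated illustration modeled on the wlc files, not copied from them, and `read_words` is a hypothetical helper. It assumes words are direct children of `<verse>` elements in the standard OSIS namespace, and that `/` separates the prefix morphemes inside `lemma` and `morph`.

```python
import xml.etree.ElementTree as ET

# Standard OSIS namespace; assumed to be the one used by the wlc files.
OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"
NS = {"osis": OSIS_NS}

# Abbreviated, illustrative sample (one word of Gen 1:1).
SAMPLE = f"""<osis xmlns="{OSIS_NS}">
  <verse osisID="Gen.1.1">
    <w lemma="b/7225" morph="HR/Ncfsa">בְּ/רֵאשִׁ֖ית</w>
  </verse>
</osis>"""


def read_words(root: ET.Element) -> list[dict]:
    """Collect verse ID, text, lemma and morph for each <w> directly under a <verse>."""
    words = []
    for verse in root.iter(f"{{{OSIS_NS}}}verse"):
        for w in verse.findall("osis:w", NS):
            lemma = w.get("lemma", "")
            words.append({
                "verse": verse.get("osisID"),
                "text": w.text,
                "lemma": lemma,
                "morph": w.get("morph"),
                # Assumed convention: '/' separates prefix morphemes.
                "lemma_parts": lemma.split("/"),
            })
    return words


if __name__ == "__main__":
    for word in read_words(ET.fromstring(SAMPLE)):
        print(word["verse"], word["lemma_parts"], word["morph"])
```

For a real book file you would pass `ET.parse(path).getroot()` instead of the inline sample; the exact file names under wlc are an assumption here. Words nested deeper than a direct child of `<verse>` are skipped by this sketch.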

Lemma and morphology data are licensed under a Creative Commons Attribution 4.0 International license. For attribution purposes, credit the Open Scriptures Hebrew Bible Project. The text of the WLC remains in the Public Domain. See the LICENSE file for more information.

Word tag attributes

Word tags each contain three attributes:

Additional Resources

Hebrew Normalization

The SBL Hebrew User Manual has a section entitled "The normalisation issue" (pp. 8 ff.).

Normalisation is a process by which sequences of characters in text that can be variously encoded but are semantically identical are treated as identically encoded. (p. 8)

Because of the warnings in that manual, along with my own experience working with the MapM text from Wikisource, anyone using the OSHB should avoid applying NFC normalization to its text.
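One concrete way normalization changes Hebrew text can be shown with Python's standard `unicodedata` module. This is an illustration, not an example taken from the SBL manual. It shows two things: a precomposed presentation form and a letter-plus-point sequence normalize to the same string, and NFC reorders a letter's combining points by canonical combining class, so dagesh (class 21) typed before patah (class 17) ends up after it.

```python
import unicodedata

# Two encodings of bet with dagesh.
BET_DAGESH_PRESENTATION = "\uFB31"  # HEBREW LETTER BET WITH DAGESH (presentation form)
BET_DAGESH_SEQUENCE = "\u05D1\u05BC"  # bet + dagesh

# Bet, dagesh, patah in the order the points were stored.
BET_DAGESH_PATAH = "\u05D1\u05BC\u05B7"


def nfc(text: str) -> str:
    """Apply Unicode NFC normalization."""
    return unicodedata.normalize("NFC", text)


def describe(text: str) -> list[str]:
    """List each code point with its canonical combining class (ccc)."""
    return [f"U+{ord(ch):04X} ccc={unicodedata.combining(ch)}" for ch in text]


if __name__ == "__main__":
    # Different encodings, identical after normalization.
    print(nfc(BET_DAGESH_PRESENTATION) == nfc(BET_DAGESH_SEQUENCE))  # True
    print(describe(BET_DAGESH_PATAH))
    # ['U+05D1 ccc=0', 'U+05BC ccc=21', 'U+05B7 ccc=17']
    print(describe(nfc(BET_DAGESH_PATAH)))
    # ['U+05D1 ccc=0', 'U+05B7 ccc=17', 'U+05BC ccc=21']
```

Because the Hebrew presentation forms are excluded from composition, NFC leaves the letter decomposed. The reordering is why an NFC-normalized copy of the text may no longer match the stored OSHB strings character for character.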

Updated: January 27, 2017

Perl script for JSON output

A Perl script generates a JSON version of the morphology, which is published to npm at https://www.npmjs.com/package/morphhb

This JavaScript module is designed to be lightweight, so it is formatted as follows:

The Perl script that generates this module is morphhbXML-to-JSON.pl. It has several options:

You can run this script like so:

`perl morphhbXML-to-JSON.pl --stripPointing --removeLemmaTypes --prefixLemmasWithH --remapVerses`
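The option names suggest what each step does, though their exact behavior is defined by the Perl script itself. As a rough illustration of what a step like --stripPointing plausibly involves (an assumption, not a port of the script), this Python sketch removes Hebrew vowel points and cantillation marks. `strip_pointing` is a hypothetical helper: it drops every nonspacing mark (Unicode category Mn) in the range U+0591 to U+05C7 and keeps letters, maqaf, sof pasuq, and the `/` morpheme separator.

```python
import unicodedata


def strip_pointing(text: str) -> str:
    """Drop Hebrew vowel points and accents (hypothetical helper).

    Removes nonspacing marks (category Mn) in U+0591..U+05C7: accents,
    vowel points, dagesh, meteg, rafe, shin/sin dots, qamats qatan.
    Letters, maqaf (U+05BE), sof pasuq (U+05C3) and any other
    characters, such as the '/' morpheme separator, are kept.
    """
    return "".join(
        ch for ch in text
        if not ("\u0591" <= ch <= "\u05C7" and unicodedata.category(ch) == "Mn")
    )


if __name__ == "__main__":
    # "בְּ/רֵאשִׁ֖ית" written with escapes so every point is visible.
    word = "\u05D1\u05B0\u05BC/\u05E8\u05B5\u05D0\u05E9\u05C1\u05B4\u0596\u05D9\u05EA"
    print(strip_pointing(word))  # prints ב/ראשית
```

Filtering on the Unicode category rather than a hand-written list keeps punctuation such as maqaf and sof pasuq, which share the Hebrew block with the points.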

Python script with Docker

There is also a Python script that transforms the data into JSON.

It accepts arguments similar to those of the Perl script above, plus an additional --splitByBook argument, which writes a separate JSON file for each book.
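Splitting by book amounts to grouping verses by the book part of their OSIS ID and writing each group to its own file. A minimal Python sketch, assuming a hypothetical input that maps OSIS verse IDs such as "Gen.1.1" to lists of words; `group_by_book` and `write_books` are illustrative names, not functions from the repository's script.

```python
import json
from collections import defaultdict
from pathlib import Path


def group_by_book(verses: dict[str, list]) -> dict[str, dict[str, list]]:
    """Group verses keyed by OSIS ID (e.g. "Gen.1.1") by their book prefix."""
    books: dict[str, dict[str, list]] = defaultdict(dict)
    for osis_id, words in verses.items():
        book = osis_id.split(".", 1)[0]
        books[book][osis_id] = words
    return dict(books)


def write_books(verses: dict[str, list], out_dir: Path) -> list[Path]:
    """Write one JSON file per book, e.g. out_dir / "Gen.json"."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for book, book_verses in group_by_book(verses).items():
        path = out_dir / f"{book}.json"
        # ensure_ascii=False keeps Hebrew text readable in the output files.
        path.write_text(json.dumps(book_verses, ensure_ascii=False), encoding="utf-8")
        paths.append(path)
    return paths
```

Keeping the grouping separate from the file writing makes the grouping easy to test without touching the filesystem.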

The Python script can be run in a Docker container built from the Dockerfile in the main directory. To build the Docker image and run the container, use the following commands (possibly as root):

docker build . -t local/morphhb && docker run -it -v `pwd`:/var/app local/morphhb