Apache License 2.0
pygrambank

Curation tools for Grambank data.

Install

pygrambank can be installed from PyPI via

pip install pygrambank

or from a clone of the grambank/pygrambank repository:

git clone ...
cd pygrambank
pip install -e .

Install pygrambank in a virtual environment so that it does not interfere with a system-wide Python installation.
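
As a minimal sketch of that advice, the standard-library venv module can create the environment and pip can then be run with the environment's own interpreter. The helper names env_python and install_in_venv and the .venv directory are illustrative assumptions, not part of pygrambank.

```python
import subprocess
import sys
import venv
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Return the path of the Python interpreter inside a venv directory."""
    # Windows venvs keep executables in Scripts/, other platforms in bin/.
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def install_in_venv(env_dir: Path, package: str = "pygrambank") -> None:
    """Create a venv at env_dir and pip-install the package into it."""
    # with_pip=True bootstraps pip inside the new environment.
    venv.create(env_dir, with_pip=True)
    # Running pip through the environment's interpreter keeps the
    # system-wide Python installation untouched.
    subprocess.run(
        [str(env_python(env_dir)), "-m", "pip", "install", package],
        check=True,
    )
```

Calling install_in_venv(Path(".venv")) would then roughly correspond to creating a virtual environment by hand and running pip install pygrambank inside it.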

CLI

Installing pygrambank also installs a command-line program, grambank. Data curation functionality is implemented as subcommands of this program. To get information about available subcommands, run

grambank --help

More information on individual subcommands can be obtained by running

grambank <SUBCOMMAND> -h

e.g.

$ grambank describe -h
usage: grambank describe [-h] [--columns] SHEET

Describe a (set of) sheets.

This includes checking for correctness - i.e. the functionality of `grambank check`.
While references will be parsed, the corresponding sources will **not** be looked up
in Glottolog (since this is slow). Thus, for a final check of a sheet, you must run
`grambank sourcelookup`.

positional arguments:
  SHEET       Path of a specific TSV file to check or substring of a filename
              (e.g. a glottocode)

optional arguments:
  -h, --help  show this help message and exit
  --columns   List columns of the sheet (default: False)

For `describe` and `sourcelookup` at ELDP-glottobank, you must run the commands from the ELDP-glottobank directory; otherwise the file paths to gb20.txt, gb.bib, contributors, etc. will not resolve.

e.g.

[2024-05-20 10:45:36] skirgard@lingn06 /Users/skirgard/Git/glottobank/ELDP-glottobank
> grambank describe grambank/original_sheets/FCE_apal1257.tsv
[2024-05-20 10:45:36] skirgard@lingn06 /Users/skirgard/Git/glottobank/ELDP-glottobank
> grambank sourcelookup ../../glottolog/glottolog grambank/original_sheets/FCE_apal1257.tsv
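
When these two steps are scripted, the working-directory requirement can be made explicit instead of relying on the caller's current directory. The following is a sketch only: curation_commands and check_sheet are hypothetical helpers, and the argument order simply mirrors the session above.

```python
import shutil
import subprocess
from pathlib import Path


def curation_commands(sheet: str, glottolog_repo: str) -> list[list[str]]:
    """Build the describe and sourcelookup invocations for one sheet."""
    return [
        ["grambank", "describe", sheet],
        # Argument order (Glottolog path, then sheet) copies the example session.
        ["grambank", "sourcelookup", glottolog_repo, sheet],
    ]


def check_sheet(repo_dir: Path, sheet: str, glottolog_repo: str) -> None:
    """Run both commands with repo_dir as the working directory."""
    if shutil.which("grambank") is None:
        raise RuntimeError("grambank not found on PATH; is pygrambank installed?")
    for cmd in curation_commands(sheet, glottolog_repo):
        # cwd=repo_dir plays the role of `cd ELDP-glottobank`; relative
        # paths in cmd are resolved against it. check=True raises
        # CalledProcessError if a command exits with a non-zero status.
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

For the session above, the call would be check_sheet(Path("ELDP-glottobank"), "grambank/original_sheets/FCE_apal1257.tsv", "../../glottolog/glottolog"), with the first argument adjusted to wherever the repository is cloned.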

API

pygrambank also allows programmatic access to Grambank data from Python programs. All functionality is mediated through a pygrambank.Grambank instance:

>>> from pygrambank import Grambank
>>> gb = Grambank('.')
>>> gb.sheets_dir
PosixPath('original_sheets')
>>> for sheet in gb.iter_sheets():
...   print(sheet)
...   break
... 
original_sheets/AH_alag1248.tsv
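
Building on this, sheets could be grouped by language. The sketch below assumes that sheet filenames follow the pattern <CODER>_<GLOTTOCODE>.tsv, as the examples AH_alag1248.tsv and FCE_apal1257.tsv suggest, and that str(sheet) yields the path that print(sheet) shows above. It uses only the standard library; split_sheet_name and sheets_by_glottocode are hypothetical helpers, not pygrambank functions.

```python
from collections import defaultdict
from pathlib import Path


def split_sheet_name(sheet) -> tuple[str, str]:
    """Split a sheet path into (coder, glottocode) based on its filename."""
    # Assumed naming pattern: <CODER>_<GLOTTOCODE>.tsv
    stem = Path(str(sheet)).stem
    # Split on the last underscore; a name without one yields an empty coder.
    coder, _, glottocode = stem.rpartition("_")
    return coder, glottocode


def sheets_by_glottocode(sheets) -> dict[str, list[str]]:
    """Group sheet paths by the glottocode found in their filenames."""
    grouped = defaultdict(list)
    for sheet in sheets:
        _, glottocode = split_sheet_name(sheet)
        grouped[glottocode].append(str(sheet))
    return dict(grouped)
```

With a Grambank instance as above, sheets_by_glottocode(gb.iter_sheets()) would map each glottocode to the sheets coded for it, provided the naming assumption holds for every file in original_sheets.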