Open pishoyg opened 3 months ago
Promoting to p3, since this is something that we want to do; it's not exactly in the backlog.
Note: Regarding output/, only KELLIA remains. The rest have their outputs in one directory per format.
TODO: Define data/raw/ and data/input/ directories for the Bible. This is needed for #131.
Remaining TODOs:
- data/input/ in copticsite.
- data/input/ and data/raw in KELLIA.
- data/input/ and data/raw in Crum.

find . -type d -name data -not -path './archive/*'
./bible/stshenouda.org/data
./grammar/data
./flashcards/data
./dictionary/copticocc.org/data
./dictionary/marcion.sourceforge.net/data
./dictionary/kellia.uni-goettingen.de/data
./dictionary/copticsite.com/data
./site/data
./keyboard/data
ls -A
.DS_Store .git README.md coptic.egg-info grammar setup.py test
.csslintrc .gitignore __pycache__ dictionary keyboard site utils.py
.env .pre-commit-config.yaml archive eslint.config.js morphology stats.sh
.env_INFO Makefile bible flashcards requirements.txt stats.tsv
ls -d */
__pycache__/ archive/ bible/ coptic.egg-info/ dictionary/ flashcards/ grammar/ keyboard/ morphology/ site/ test/
TODO: Enforce no code under data/, and no data outside of data/. You could do this using file extensions: no *.py under data/, and no *.tsv unless under it!
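
One way the check above could look is sketched below. This is only a rough illustration, not an existing script in the repo: the names find_violations and SKIP_DIRS are made up, and skipping archive/, .git/ and __pycache__/ is an assumption borrowed from the find command earlier in this thread.

```python
from pathlib import Path

# Hypothetical helper, not part of the repository.
# Assumption: these directories are exempt from the rule.
SKIP_DIRS = {".git", "archive", "__pycache__"}


def find_violations(root):
    """Return (path, reason) pairs for files that break the data/ rule.

    Rule, as stated in the TODO: no *.py under a data/ directory,
    and no *.tsv anywhere except under a data/ directory.
    """
    root = Path(root)
    violations = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        parts = path.relative_to(root).parts
        if SKIP_DIRS.intersection(parts):
            continue
        # A file is "under data/" if any of its parent directories is named data.
        under_data = "data" in parts[:-1]
        if path.suffix == ".py" and under_data:
            violations.append((path, "code under data/"))
        elif path.suffix == ".tsv" and not under_data:
            violations.append((path, "data outside data/"))
    return violations
```

Calling find_violations(".") from the repo root and failing a pre-commit hook or Makefile target when the list is non-empty would enforce the rule. Note that a literal reading of the rule would also flag a root-level file such as stats.tsv, so an allowlist might be needed.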
- data/raw/ is raw data.
- data/input/ is data that we have modified or added.
- data/output/ is the data produced by our pipeline. Each output format lives in a subdirectory of this directory.
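
To make the layout above concrete, here is a small sketch that maps a file path to its role under data/. It is an illustration only: the function name classify and the example paths and format names in the comments are hypothetical, not taken from the repo.

```python
from pathlib import PurePosixPath

# The three subdirectories described above.
KINDS = ("raw", "input", "output")


def classify(path):
    """Return (kind, output_format) for a file under data/, else None.

    kind is one of "raw", "input", "output". output_format is the name of
    the subdirectory directly under data/output/, and None for other kinds.
    """
    parts = PurePosixPath(path).parts
    if "data" not in parts:
        return None
    rest = parts[parts.index("data") + 1:]
    # Files sitting directly in data/, or in an unknown subdirectory,
    # do not fit the layout.
    if len(rest) < 2 or rest[0] not in KINDS:
        return None
    kind = rest[0]
    output_format = None
    if kind == "output" and len(rest) >= 3:
        output_format = rest[1]
    return kind, output_format
```

For example, a hypothetical path like dictionary/copticsite.com/data/output/tsv/words.tsv would classify as ("output", "tsv"), while a file placed directly in data/ would return None and could be reported as not following the layout.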