pishoyg / coptic

This is a project that aims to make the Coptic language more learnable.
https://remnqymi.com/
GNU General Public License v3.0

[`data/`] Apply the Data Directory Conventions #123

Open pishoyg opened 3 months ago

pishoyg commented 3 months ago

data/raw/ is raw data.

data/input/ is data that we have modified or added.

data/output/ is the data produced by our pipeline. Each output format lives in a subdirectory of this directory.
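A minimal sketch of how the layout above could be audited. It assumes that every data/ directory is expected to contain all three subdirectories and that archive/ is exempt; neither is stated as a hard rule in this issue. The names `EXPECTED_SUBDIRS`, `missing_subdirs`, and `audit` are hypothetical and not part of the repository.

```python
from pathlib import Path

# Subdirectory names taken from the convention described above.
EXPECTED_SUBDIRS = ("raw", "input", "output")


def missing_subdirs(data_dir: Path) -> list[str]:
    """Return the convention subdirectories that are absent from data_dir."""
    return [name for name in EXPECTED_SUBDIRS if not (data_dir / name).is_dir()]


def audit(root: Path) -> dict[str, list[str]]:
    """Map each data/ directory under root to its missing subdirectories.

    Directories under archive/ are skipped (an assumption, not a stated rule).
    Directories that already follow the convention are omitted from the result.
    """
    report = {}
    for data_dir in sorted(root.rglob("data")):
        rel = data_dir.relative_to(root)
        if not data_dir.is_dir() or rel.parts[0] == "archive":
            continue
        missing = missing_subdirs(data_dir)
        if missing:
            report[rel.as_posix()] = missing
    return report
```

Calling `audit(Path("."))` from the repository root would list the data/ directories that still need work, which may be a convenient way to track the remaining migration.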

pishoyg commented 3 months ago

Promoting to p3, since this is something that we want to do and not exactly a backlog item.

Note: Regarding output/, only KELLIA remains. The others already write their outputs to one directory per format.

pishoyg commented 3 months ago

TODO: Define data/raw/ and data/input/ directories for the Bible. This is needed for #131.

pishoyg commented 3 months ago

Remaining TODOs:

pishoyg commented 3 months ago

$ find . -type d -name data -not -path './archive/*'
./bible/stshenouda.org/data
./grammar/data
./flashcards/data
./dictionary/copticocc.org/data
./dictionary/marcion.sourceforge.net/data
./dictionary/kellia.uni-goettingen.de/data
./dictionary/copticsite.com/data
./site/data
./keyboard/data
$ ls -A
.DS_Store   .git                     README.md    coptic.egg-info   grammar           setup.py   test
.csslintrc  .gitignore               __pycache__  dictionary        keyboard          site       utils.py
.env        .pre-commit-config.yaml  archive      eslint.config.js  morphology        stats.sh
.env_INFO   Makefile                 bible        flashcards        requirements.txt  stats.tsv

pishoyg commented 3 months ago

$ ls -d */
__pycache__/  archive/  bible/  coptic.egg-info/  dictionary/  flashcards/  grammar/  keyboard/  morphology/  site/  test/

pishoyg commented 3 months ago

TODO: Enforce that no code lives under data/, and that no data lives outside of data/. You could do this using file extensions: no *.py under data/, and no *.tsv outside of it!
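A rough sketch of what such a check might look like, using file extensions. It assumes that *.py stands in for "code" and *.tsv for "data", that a file counts as being under data/ when any of its parent directories is named `data`, and that archive/ and .git/ are exempt. The names `CODE_SUFFIXES`, `DATA_SUFFIXES`, `SKIPPED_TOP_LEVEL`, and `find_violations` are hypothetical.

```python
from pathlib import Path

CODE_SUFFIXES = {".py"}   # code must not live under data/
DATA_SUFFIXES = {".tsv"}  # data must live under data/
SKIPPED_TOP_LEVEL = {"archive", ".git"}  # assumption: these trees are exempt


def find_violations(root: Path) -> list[str]:
    """Return one message per file that breaks the code/data separation."""
    violations = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if rel.parts[0] in SKIPPED_TOP_LEVEL:
            continue
        # A file is "under data/" if any parent directory is named data.
        under_data = "data" in rel.parts[:-1]
        if under_data and path.suffix in CODE_SUFFIXES:
            violations.append(f"code under data/: {rel.as_posix()}")
        elif not under_data and path.suffix in DATA_SUFFIXES:
            violations.append(f"data outside data/: {rel.as_posix()}")
    return violations
```

Under these assumptions, the stats.tsv at the repository root shown in the listing above would be reported, so it would need to be moved or explicitly exempted. A check like this could be run from the Makefile or as a local pre-commit hook, failing when `find_violations(Path("."))` returns a non-empty list.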