+TITLE: Add Language Tags for Monier Williams
This repository documents my effort adding language tags to the words in Arabic script occurring in the Monier Williams dictionary for the Sanskrit library project.
Index of files:
- Data
- [[file:monier_with_tags.xml][monier_with_tags.xml]]: The final output of the process: the Monier Williams dictionary with added tags for Arabic script and language of origin.
- [[file:monier_lines_with_tags.xml][monier_lines_with_tags.xml]]: The line numbers in which the lines originally occur in ~monier.xml~ and the updated line with language tags
- [[file:monier.xml][monier.xml]]: The original dictionary file
- [[file:monier.dtd][monier.dtd]]: Document specifications for the correct XML formatting
- [[file:cases.org][cases.org]]: Notes on why each etymology was chosen and other observations on the data
- [[file:errors.org][errors.org]]: Notes on errors in validation / linting of [[file:monier_with_tags.xml][monier_with_tags.xml]]
- Code
- [[file:extract_arabic.py][extract_arabic.py]]: Extract the lines with Arabic text
- [[file:integrate.py][integrate.py]]: Generate ~monier_lines_with_tags.xml~
- [[file:combine.py][combine.py]]: Integrate the modified lines (now containing etymologies) into the original dictionary, generate ~monier_with_tags.xml~