sanskrit-lexicon / csl-websanlexicon

check_inventory #1

Open funderburkjim opened 4 years ago

funderburkjim commented 4 years ago

This issue is for discussion of the check_inventory subfolder. Please read the introductory readme by clicking on the link.

funderburkjim commented 4 years ago

Difference summary

webinventory.txt identifies the differences between the files managed by csl-websanlexicon (v00/inventory.txt) and the actual files in the web directory for each dictionary.
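A comparison of this kind can be sketched roughly as follows. This is only an illustration, not the actual check_inventory code: it assumes the inventory lists one relative path per line (the real format of v00/inventory.txt may differ), and the function names are hypothetical.

```python
from pathlib import Path


def read_inventory(inventory_path):
    """Return the set of relative paths listed in an inventory file.

    Assumption: one relative path per line; blank lines are ignored.
    """
    lines = Path(inventory_path).read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def list_web_files(dict_root):
    """Return relative POSIX-style paths of all files under <dict_root>/web."""
    root = Path(dict_root)
    return {
        p.relative_to(root).as_posix()
        for p in (root / "web").rglob("*")
        if p.is_file()
    }


def compare_inventory(managed, actual):
    """Split paths into (unmanaged, missing) sorted lists."""
    unmanaged = sorted(actual - managed)  # on disk, but not in the inventory
    missing = sorted(managed - actual)    # in the inventory, but not on disk
    return unmanaged, missing
```

Calling `compare_inventory(read_inventory(...), list_web_files(...))` for one dictionary gives the two lists that a difference report would need.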

Here is a summary of the web files that are unmanaged by csl-websanlexicon, organized by folder.

gasyoun commented 4 years ago

The 'old' folder is also deletable.

I guess most of them are. As a backup solution, they create more mess than they solve.

funderburkjim commented 4 years ago

Revised webinventory

After various file deletions, etc., the comparison is simpler between

In addition to the files from csl-websanlexicon,  
the web directory for each dictionary has several additional required files.

Each of these files is 'dictionary specific' (i.e., NOT managed by csl-websanlexicon).

There are 5 files which are present for each dictionary.

web/<dict>header.xml
web/sqlite/<dict>.sqlite
web/index.php
web/webtc/pdffiles.txt
web/webtc2/query_dump.txt
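As a rough sketch, the five paths above can be expanded for a given dictionary code and checked against a dictionary directory. The function names and the idea of a per-dictionary root directory containing `web/` are assumptions made for illustration.

```python
from pathlib import Path

# The five per-dictionary paths listed above, with {d} standing in for <dict>.
REQUIRED_TEMPLATES = [
    "web/{d}header.xml",
    "web/sqlite/{d}.sqlite",
    "web/index.php",
    "web/webtc/pdffiles.txt",
    "web/webtc2/query_dump.txt",
]


def required_files(dict_code):
    """Expand the templates for one dictionary code, e.g. 'mw'."""
    return [t.format(d=dict_code) for t in REQUIRED_TEMPLATES]


def missing_required(dict_root, dict_code):
    """Return the required paths that are not present under dict_root."""
    root = Path(dict_root)
    return [rel for rel in required_files(dict_code) if not (root / rel).is_file()]
```

For example, `required_files("mw")` starts with `web/mwheader.xml` and `web/sqlite/mw.sqlite`.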

The Cologne server also has a web/pdfpages/ directory of scanned images. As mentioned elsewhere,
servepdf now handles this more flexibly. 

There are 7 dictionaries with additional files:
ben: web/images/vatsa.png
cae: web/sqlite/caeab.sqlite
pw: web/sqlite/pwab.sqlite, web/sqlite/pwbib.sqlite
bur: web/sqlite/burab.sqlite
stc: web/index_fr.php, web/webtc/download_fr.html, web/webtc/help_fr.html, web/webtc/indexcaller_fr.php
pwg: web/sqlite/pwgab.sqlite, web/sqlite/pwgbib.sqlite
mw: web/sqlite/mwab.sqlite, web/sqlite/mwauthtooltips.sqlite, web/sqlite/mwkeys.sqlite
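If these exceptions were to be recorded in a script, one possible form is a simple lookup table. The sketch below just transcribes the list above; the names `EXTRA_FILES` and `extra_files` are hypothetical.

```python
# Additional dictionary-specific files, transcribed from the list above.
EXTRA_FILES = {
    "ben": ["web/images/vatsa.png"],
    "cae": ["web/sqlite/caeab.sqlite"],
    "pw": ["web/sqlite/pwab.sqlite", "web/sqlite/pwbib.sqlite"],
    "bur": ["web/sqlite/burab.sqlite"],
    "stc": [
        "web/index_fr.php",
        "web/webtc/download_fr.html",
        "web/webtc/help_fr.html",
        "web/webtc/indexcaller_fr.php",
    ],
    "pwg": ["web/sqlite/pwgab.sqlite", "web/sqlite/pwgbib.sqlite"],
    "mw": [
        "web/sqlite/mwab.sqlite",
        "web/sqlite/mwauthtooltips.sqlite",
        "web/sqlite/mwkeys.sqlite",
    ],
}


def extra_files(dict_code):
    """Return the additional files for a dictionary, or [] if it has none."""
    return list(EXTRA_FILES.get(dict_code, []))
```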

Implication

In order to recreate the web directory for a given dictionary X, we currently require:

funderburkjim commented 4 years ago

Management of non csl-websanlexicon files

The additional files are currently managed in several different ways, and this needs revision. Before revising, here is how things are done now.

managed by pywork/redo_xml.sh

files managed manually.

These files simply exist and are maintained by direct editing. They are not managed by csl-websanlexicon. There is one of each of these files for each dictionary.

misc files managed manually

These files are peculiar to given dictionaries. They are not managed by csl-websanlexicon.

Abbreviations managed manually with pywork scripts

The abbreviation databases are managed within subdirectories of pywork; the resulting sqlite files are created and copied to the web/sqlite/ directory by scripts within the pywork subdirectories. However, the invocation of these scripts is manual (i.e., done only when some change occurs).
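The general shape of such a build-and-copy step might look like the sketch below. It is not the actual pywork script: the tab-separated input format, the table and column names, and the function names are all assumptions made for illustration.

```python
import shutil
import sqlite3
from pathlib import Path


def build_abbrev_db(input_path, db_path, table="abbrev"):
    """Rebuild a small sqlite file from a tab-separated text file.

    Assumption: each non-blank input line is '<key><TAB><data>'.
    Returns the number of rows written.
    """
    rows = []
    for line in Path(input_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        key, _, data = line.partition("\t")
        rows.append((key, data))
    db_path = Path(db_path)
    if db_path.exists():
        db_path.unlink()  # always rebuild from scratch
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(f'CREATE TABLE "{table}" (id TEXT NOT NULL, data TEXT NOT NULL)')
        conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)
        conn.commit()
    finally:
        conn.close()
    return len(rows)


def install_db(db_path, web_sqlite_dir):
    """Copy the built sqlite file into the web/sqlite/ directory."""
    dest_dir = Path(web_sqlite_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(db_path, dest_dir))
```

Because the invocation is manual, a step like this would be run only after the abbreviation source file changes.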

Literary source names managed manually with pywork scripts

The literary source abbreviation databases are managed within subdirectories of pywork; the resulting sqlite files are created and copied to the web/sqlite/ directory by scripts within the pywork subdirectories. However, the invocation of these scripts is manual (i.e., done only when some change occurs).

mwkeys.sqlite managed manually with pywork script

Note: The reason for mwkeys.sqlite is obscure, but I think it is required; it has something to do with weeding out the HxA (duplicate) headwords. I know that one of the intermediate files produced while constructing mwkeys.sqlite is used in the sanhw1 construction (namely scans/awork/sanhw1/pywork/mwkeys/extract_keys_b.txt).

funderburkjim commented 4 years ago

Rationalizing the management of entire web directory

In light of the above fairly thorough survey, some simplifications come to mind. These will be developed in other issues. We are getting closer to a minimal viable structure for the 2020 version of the dictionary subdirectories.