templates and constants

funderburkjim commented 4 years ago

This comment begins my review and extension of the initial classification work for the modules of pywork.

The idea of csl-pywork is similar to that of csl-websanlexicon. The repository should contain programs and data files required to regenerate the pywork directories for the individual dictionaries. Thus, the individual pywork directories can be managed in the central location of csl-pywork.

This particular work is being done in a 'v01' directory, which currently is NOT part of the repository; it resides only on the Cologne server. When the dust settles, it is anticipated that 'v01' will be added to the repository.

The files managed by csl-pywork are classified as either:

constants -- files that are identical for each dictionary.
templates -- files that are different for each dictionary, but whose differences can easily be captured by a few parameters in a template file. When the values of the template parameters are substituted into the template, the resulting rendered template is what is needed for the particular dictionary's pywork directory.
idiosyncratics -- these files are quite different for each dictionary. For example, the abbreviation expansions would be such a file (for those dictionaries which have abbreviation markup).

This comment pertains only to the constants and templates. The idiosyncratics will be discussed in other issues.
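As a rough illustration of the constant/template distinction, the sketch below renders a one-parameter template for two dictionaries. It uses the standard-library string.Template only as a stand-in for the Mako templates used by csl-pywork. The template text and the render_template name are hypothetical; only the dictlo parameter name and the xxxhw.txt file-name pattern come from this issue.

```python
from string import Template

# Hypothetical template text; the real templates are Mako files in makotemplates/.
TEMPLATE_TEXT = 'echo "remaking ${dictlo}hw.txt"\n'


def render_template(text, dictlo):
    """Substitute the single parameter 'dictlo' into the template text."""
    return Template(text).substitute(dictlo=dictlo)


if __name__ == "__main__":
    # Each dictionary gets its own rendered copy of the same template.
    for code in ("mw", "pw"):
        print(render_template(TEMPLATE_TEXT, code), end="")
```

A constant, by contrast, would simply be copied unchanged into each dictionary's pywork directory.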

computed pywork files

An example of a computed file would be xxx.xml, the xml version of a dictionary digitization. We do not contemplate inclusion of computed files as part of this csl-pywork repository. Rather, full initialization (or updating) of an individual pywork directory will be done in two steps:

transferring files from this csl-pywork to the individual pywork directory -- a 'generate' program in csl-pywork will do this step
computing additional files within the individual pywork directory. For example, a redo_hw.sh script in the individual pywork directory will be run to compute xxxhw.txt.

funderburkjim commented 4 years ago

The v00 inventory provides a preliminary list of files classified as constants (C) or templates (T). The files mentioned in the inventory appear in the v00/makotemplates directory. The first task is to refine this classification and run tests to determine that:

the files classified as constants really are all the same in the various dictionary pywork directories
the files classified as templates are, when the templates are rendered, functionally the same as the files in the various dictionary pywork directories.

For this purpose, there is a compare subdirectory of v01. The following comments are based on work described in the compare/readme.org file.

We begin with the 8 files currently in the inventory. These files, with their initial classification as C or T, are:

hwparse.py: T
hw.py: T
redo_hw.sh: T
redo_xml.sh: T
hw0.py: C
hw2.py: C
parseheadline.py: C
updateByLine.py: C

funderburkjim commented 4 years ago

number of versions

A useful first analysis is to see, for each inventory file, how many versions of the file appear within the individual dictionary pywork directories. This comparison is done by the compare/compare_scripts.sh script, and the aggregate results are in compare_scripts.txt. Here is the summary:

  1 version of pywork/hw0.py
  1 version of pywork/hw2.py
  1 version of pywork/parseheadline.py
 36 versions of pywork/redo_hw.sh
 36 versions of pywork/redo_xml.sh
 36 versions of pywork/hwparse.py
  5 versions of pywork/hw.py
  5 versions of pywork/updateByLine.py

From this we can confirm:

hw0.py, hw2.py, and parseheadline.py are constants.
hwparse.py, redo_hw.sh, and redo_xml.sh are templates.
hw.py and updateByLine.py are uncertain.
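The version count can be sketched in Python as follows. This is not the compare_scripts.sh script itself, only a minimal illustration of the idea: hash each dictionary's copy of a file and count the distinct digests. The count_versions name and the directory layout passed to it are assumptions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def count_versions(pywork_dirs, filename):
    """Group pywork directories by the SHA-256 digest of their copy of filename."""
    groups = defaultdict(list)
    for d in pywork_dirs:
        path = Path(d) / filename
        if path.is_file():  # skip dictionaries that lack the file
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(str(d))
    return groups
```

With this sketch, a single group suggests a constant, while one group per dictionary suggests a template; anything in between needs a closer look.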

For the templates and uncertain files by this classification, we need further work. For those which are surely templates, we need to confirm that the template in v01/makotemplates/ correctly generates the individual pywork versions. For the uncertain ones, further comparison of the (5) versions is needed.

funderburkjim commented 4 years ago

confirm redo_hw.sh template

The compare_template_v0.py program takes a particular template (such as redo_hw.sh) and compares, for each dictionary:

the template script, i.e., the script generated by the template for that dictionary
the individual pywork version of that script

The program reports whether the two scripts are identical, and if they differ, it reports the differences.
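The comparison step can be sketched with the standard difflib module. This is an illustration of the idea only, not the code of compare_template_v0.py; the compare_scripts function and its label argument are hypothetical.

```python
import difflib


def compare_scripts(rendered, existing, label="script"):
    """Return (identical, diff_lines) for two script texts."""
    if rendered == existing:
        return True, []
    diff = difflib.unified_diff(
        existing.splitlines(),
        rendered.splitlines(),
        fromfile="pywork/" + label,    # the dictionary's current version
        tofile="template/" + label,    # the version generated by the template
        lineterm="",
    )
    return False, list(diff)
```

Calling this once per dictionary, with the rendered template and the text of that dictionary's pywork file, gives the identical/different report described above.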

Using the v01/makotemplates/redo_hw.sh script as the template, the generated template script IS identical to the individual pywork version for each dictionary. This confirms that the template is as needed.

Note

If one uses v00/makotemplates/redo_hw.sh as the template, then there are differences between generated script and pywork script. By comparing the v00 and v01 versions of the template, however, we see that the differences occur only in insignificant places (within 'echo' strings or within bash comments). Thus, the v00 template version is also functionally the same as the v01 template version.
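The notion of "insignificant" differences can be made mechanical. The sketch below is an assumption about how such a check might look, not something used in the comparison above: it drops whole-line bash comments and ignores the text of echo commands before comparing. It is deliberately crude: it also drops the shebang line, and it would wrongly ignore an echo whose output is redirected to a file.

```python
import re


def significant_lines(script_text):
    """Drop bash comment lines and ignore the message of echo commands."""
    kept = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue             # blank line or whole-line comment
        if re.match(r"echo\b", stripped):
            stripped = "echo"    # the echoed message is treated as immaterial
        kept.append(stripped)
    return kept


def functionally_same(script_a, script_b):
    """True when two scripts differ only in comments and echo messages."""
    return significant_lines(script_a) == significant_lines(script_b)
```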

funderburkjim commented 4 years ago

confirm redo_xml.sh template

Similarly, the compare_template_v0.py program validates that the v01/makotemplates/redo_xml.sh template generates all the individual dictionary versions of redo_xml.sh.

Note: For the full confirmation it was necessary:

to make minor adjustments to the mw and krm versions of redo_xml.sh
to make a substantive adjustment to the template for pw and pwg:
- This is because pw.xml and pwg.xml construction is done in two steps.
- The first step makes an intermediate pw0.xml (pwg0.xml) using make_xml.py
- In the second step, make_xml_ls.py constructs pw.xml from pw0.xml and pwauth/pwbib.txt. Example: <ls>BHĀG.P.</ls> in pw0.xml becomes <ls n="1031">BHĀG.P.</ls> in pw.xml. This adjustment makes it easier for the display programs to generate tool tips. The second step for pwg is completely analogous to the step for pw.
- Currently, mw is the only other dictionary with <ls> tags.
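The second step for pw can be pictured with the following sketch. It is not the code of make_xml_ls.py; the add_ls_numbers function and the in-memory LS_NUMBERS table are hypothetical, and the sketch only handles an <ls> element whose text is exactly a known abbreviation. In the real workflow the numbers come from pwauth/pwbib.txt.

```python
import re

# Hypothetical lookup table; the single entry is the example from this comment.
LS_NUMBERS = {"BHĀG.P.": "1031"}


def add_ls_numbers(xml_line, numbers):
    """Add n="..." to each <ls> element whose text is a known abbreviation."""
    def repl(match):
        abbrev = match.group(1)
        n = numbers.get(abbrev)
        if n is None:
            return match.group(0)   # unknown abbreviation: leave unchanged
        return '<ls n="%s">%s</ls>' % (n, abbrev)
    return re.sub(r"<ls>(.*?)</ls>", repl, xml_line)
```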

Enhancement suggestion:

Alter the 'ls' processing for mw to be like that of pw, pwg.

gasyoun commented 4 years ago

Alter the 'ls' processing for mw to be like that of pw, pwg.

Why is it better than what we have right now?

funderburkjim commented 4 years ago

It would be better in the sense of making the code base more similar between mw and pw, pwg.

From the point of view of how displays behave, the user would see no difference.

funderburkjim commented 4 years ago

confirm hwparse.py template

When compare_template_v0.py was first used to validate that the v01/makotemplates/hwparse.py template generates the versions of hwparse.py for the individual dictionaries, the validation failed in numerous spots.

After several iterations, I finally rearranged some things so that the validation succeeds:

Use an adjusted template (`compare/template_hwparse_match.py`)
Use an adjusted comparison program compare_template_v0_hwparse.py
- main difference: adjust the template for lines with python continuation lines, and make corresponding adjustment to the comparison program.
- Treat some differences as immaterial, such as Python comment lines that have different wording
- Make some minor adjustments in the individual hwparse.py files
One potentially material change to several hwparse.py files is to change the metaline variable 'hom' to 'h' (homonym). This was done for dictionaries yat, vcp, ap, shs, bur, wil, sch, skd, ap90. This is not currently material, since the digitizations for these dictionaries have no homonym markup; but hwparse is ready to handle homonym in the metaline if the digitizations are enhanced with homonym markup.

Difference between adjusted and original hwparse.py template

jfunderb@dialog6$ diff template_hwparse_match.py ../makotemplates/hwparse.py
7d6
< %if False:
9d7
< %endif
18,21c16,18
< %if dictlo == 'mw':
<  # 'e' contains HX identifier for mw
<  hwrec_keys = ['L','pc','k1','k2','h'] +\xyz
<                ['type','LP','k1P'] +\xyz
---
>  if dictcode == 'mw':
>   hwrec_keys = ['L','pc','k1','k2','h'] +\
>                ['type','LP','k1P'] +\
23,25c20,22
< %else:
<  hwrec_keys = ['L','pc','k1','k2','h'] +\xyz
<                ['type','LP','k1P'] +\xyz
---
>  else:
>   hwrec_keys = ['L','pc','k1','k2','h'] +\
>                ['type','LP','k1P'] +\
27c24
< %endif
---
>
49d45
< %if False:
51,52d46
< %endif
<    print "HW_init ERROR: duplicate L-code=",self.L

Original hwparse.py template is ok to go with

Although the original v01/makotemplates/hwparse.py template (identical to the v00 version) differs from the adjusted template, my analysis of the above 'diff' concludes that the two are functionally the same.

funderburkjim commented 4 years ago

confirm hw.py template

Recall that there are 5 versions of hw.py.
An iterative process was used for confirmation of an hw.py template, similar to the process described above for confirmation of an hwparse.py template.

The template file was taken as template_hw_match.py. The final form of this template differs in several places from the original template (now saved in compare directory as template_hw_original.py).
A special-purpose comparison program, compare_template_v0_hw.py, was used to compare, for each dictionary, the version of hw.py generated by the template to the version of hw.py in the dictionary's pywork directory.
Iterative changes were made to both the template (template_hw_match.py) and the individual dictionary pywork/hw.py programs. I believe these changes
- are non-material (i.e., they make no difference when the program is run as part of redo_hw.sh)
- sometimes correct cosmetic errors. The most notable cosmetic error is the use of 'hom' instead of 'h' as the name of a meta-line variable.

My conclusion is that now the template for hw.py generates the current pywork/hw.py programs for all the dictionaries.

comment on Python 3 --TODO

In the original template, @drdhaval2785 also introduced some changes (notably with 'print') so that the template would generate a program appropriate for both Python2 and Python3. The current matching form of the template for hw.py does NOT include this change. But now that we can be sure the template correctly generates the Python2 versions, we can make further alterations (to makotemplates/hw.py) to re-insert Python3 compatibility.

funderburkjim commented 4 years ago

Confirm constant updateByLine.py

The version makotemplates/updateByLine.py is almost identical to the version MWScan/2014/pywork/updateByline.py. The only difference is that the makotemplates version has the Python2/3 compatibility statement from __future__ import print_function.

Comparing the MWScan version to the versions for other dictionaries, the only dictionaries whose updateByLine.py version has unexplained differences are ACC, MD, and WIL. These three versions do not contain the 'ins' and 'del' functionality of the MW version.

Conclusion: The makotemplate version is ok to use for all dictionaries.

Python2/3 compatibility

The makotemplate version is compatible with both Python2 and Python3.

Is `from __future__ import print_function` needed?

In our usage, I think it is NOT needed, but is optional.

Python 2/3 compatibility does require that we change Python print expressions, like print x,y,z to the form print(x,y,z). But this expression gives slightly different output when run under Python2 or 3. Example: Suppose print(1,2,3) is executed in:

Python 2. Result is (1, 2, 3)
Python 3. Result is 1 2 3.

However, since we only use print statements for output (usually debug output) whose only use is programmer examination, the minor differences between the Python 2 and Python 3 output are immaterial to us.

If the from __future__ import print_function statement is included, then print(1,2,3) yields the same result with both Python2 and 3. Proof: enter the program temp_print.py:

from __future__ import print_function
print(1,2,3)

Whether run with python2 or python3, the terminal output is the same: 1 2 3.

The conclusion is that it is slightly better for us to include the `from __future__ import print_function` statement in programs that might be run under Python2 or Python3.

funderburkjim commented 4 years ago

Final conclusions of this comment

All the eight programs in the makotemplates inventory correctly generate dictionary versions.

templates: hwparse.py, hw.py, redo_hw.sh, and redo_xml.sh
constants: hw0.py, hw2.py, parseheadline.py, updateByLine.py

Python 2/3 compatibility

All the makotemplates python programs use print(x,y,z) form, so should be ready for Python2 or 3.

Could the templates be made into constants?

I think the answer is 'yes'. The only template parameter used in the template files is dictlo.

In hw.py, the distinction provided by dictlo could be removed entirely, while retaining functional equivalence.

In the other 3, the dictlo distinction is probably needed.
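One possible way to turn a template into a constant, offered only as a sketch and not as something decided in this issue, is to move the dictlo distinction from render time to run time: the same file is copied to every dictionary and receives the dictionary code as an argument. The headword_filename and dictlo_from_argv names are hypothetical; the xxxhw.txt pattern is the one mentioned earlier in this issue.

```python
def headword_filename(dictlo):
    """Name of the computed headword file, e.g. 'mwhw.txt' for dictlo='mw'."""
    return "%shw.txt" % dictlo


def dictlo_from_argv(argv):
    """Read the dictionary code from a command line such as ['prog.py', 'mw']."""
    if len(argv) < 2:
        raise ValueError("expected a dictionary code, e.g. 'mw'")
    return argv[1].lower()
```

A constant script built this way would call dictlo_from_argv(sys.argv) once at start-up and derive every dictionary-specific name from the result.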

drdhaval2785 commented 3 years ago

Could the templates be made into constants?

@funderburkjim, does this issue still need to stay open, or is it closable?

funderburkjim commented 3 years ago

This 'templates and constants' issue in csl-pywork is closable.
