sanskrit-lexicon / csl-pywork

A template for creating pywork repository for each dictionary.

templates and constants #1

Closed funderburkjim closed 3 years ago

funderburkjim commented 4 years ago

This comment begins my review and extension of the initial classification work for the modules of pywork.

The idea of csl-pywork is similar to that of csl-websanlexicon. The repository should contain programs and data files required to regenerate the pywork directories for the individual dictionaries. Thus, the individual pywork directories can be managed in the central location of csl-pywork.

This particular work is being done in a 'v01' directory, which currently is NOT part of the repository; it resides only on the Cologne server. When the dust settles, it is anticipated that 'v01' will be added to the repository.

The files managed by csl-pywork are classified as either:

This comment pertains only to the constants and templates. The idiosyncratics will be discussed in other issues.

computed pywork files

An example of a computed file would be xxx.xml, the xml version of a dictionary digitization. We do not contemplate inclusion of computed files as part of this csl-pywork repository. Rather, full initialization (or updating) of an individual pywork directory will be done in two steps:

funderburkjim commented 4 years ago

The v00 inventory provides a preliminary list of files classified as constants (C) or templates (T). The files mentioned in the inventory appear in the v00/makotemplates directory. The first task is to refine this classification and run tests to determine that:

For this purpose, there is a compare subdirectory of v01. The following comments are based on work described in the compare/readme.org file.

We begin with the 8 files currently in the inventory. These files, with their initial classification as C or T, are:

hwparse.py: T
hw.py: T
redo_hw.sh: T
redo_xml.sh: T
hw0.py: C
hw2.py: C
parseheadline.py: C
updateByLine.py: C

funderburkjim commented 4 years ago

number of versions

A useful first analysis is to see, for each inventory file, how many versions of the file appear within the individual dictionary pywork directories. This comparison is done by the compare/compare_scripts.sh script, and the aggregate results are in compare_scripts.txt. Here is the summary:

  1 version of pywork/hw0.py
  1 version of pywork/hw2.py
  1 version of pywork/parseheadline.py
 36 versions of pywork/redo_hw.sh
 36 versions of pywork/redo_xml.sh
 36 versions of pywork/hwparse.py
  5 versions of pywork/hw.py
  5 versions of pywork/updateByLine.py

From this we can confirm: hw0.py, hw2.py, and parseheadline.py are constants; hwparse.py, redo_hw.sh, and redo_xml.sh are templates; hw.py and updateByLine.py are uncertain.

For the templates and uncertain files by this classification, we need further work. For those which are surely templates, we need to confirm that the template in v01/makotemplates/ correctly generates the individual pywork versions. For the uncertain ones, further comparison of the (5) versions is needed.
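The version count can be sketched in Python as follows. This is only an illustration of the idea, not the compare_scripts.sh script itself (which is a shell script); the directory layout, the function name group_versions, and the sample file contents are all assumptions made for the demo.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def group_versions(dict_dirs, relpath):
    """Map content digest -> names of dictionary directories sharing that content."""
    groups = defaultdict(list)
    for d in dict_dirs:
        path = Path(d) / relpath
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(Path(d).name)
    return groups


def demo():
    """Build three throwaway dictionary directories and count versions of hw.py."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical contents: mw and pw share one version, acc has another.
        contents = {
            "mw": "print('same')\n",
            "pw": "print('same')\n",
            "acc": "print('other')\n",
        }
        dirs = []
        for name, text in contents.items():
            pywork = Path(root) / name / "pywork"
            pywork.mkdir(parents=True)
            (pywork / "hw.py").write_text(text)
            dirs.append(Path(root) / name)
        groups = group_versions(dirs, "pywork/hw.py")
        return len(groups), sorted(sorted(v) for v in groups.values())


n_versions, members = demo()
print("%d versions of pywork/hw.py" % n_versions)
for names in members:
    print("  shared by:", ", ".join(names))
```

A file with one version across all dictionaries is a candidate constant; a file with as many versions as dictionaries is a candidate template; anything in between needs a closer look at which dictionaries share a version.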

funderburkjim commented 4 years ago

confirm redo_hw.sh template

The compare_template_v0.py program takes a particular template (such as redo_hw.sh) and compares, for each dictionary:

The program reports whether the two scripts are identical, and if they differ, it reports the differences.

Using the v01/makotemplates/redo_hw.sh script as the template, the generated template script IS identical to the individual pywork version for each dictionary. This confirms that the template is as needed.
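The kind of check described here can be sketched as below. This is not compare_template_v0.py; it is a minimal illustration under stated assumptions. The render function is a stand-in that only substitutes ${dictlo} (the real templates are rendered with Mako), and the template and script texts are invented for the demo.

```python
import difflib


def render(template_text, dictlo):
    """Stand-in renderer: substitutes ${dictlo} only (assumption, not Mako)."""
    return template_text.replace("${dictlo}", dictlo)


def compare(template_text, existing_text, dictlo):
    """Return unified-diff lines; the list is empty when the texts are identical."""
    generated = render(template_text, dictlo)
    return list(difflib.unified_diff(
        existing_text.splitlines(),
        generated.splitlines(),
        fromfile="pywork (%s)" % dictlo,
        tofile="generated (%s)" % dictlo,
        lineterm="",
    ))


# Hypothetical template and per-dictionary scripts, for illustration only.
template = 'echo "remaking headwords for ${dictlo}"\n'
existing = {
    "mw": 'echo "remaking headwords for mw"\n',
    "pw": 'echo "remake headwords for pw"\n',
}

report = {d: compare(template, text, d) for d, text in existing.items()}
for d, diff in sorted(report.items()):
    print(d, "identical" if not diff else "differs")
    for line in diff:
        print("  " + line)
```

In this demo the pw difference sits inside an 'echo' string, which is the sort of difference the note below treats as functionally insignificant.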

Note

If one uses v00/makotemplates/redo_hw.sh as the template, then there are differences between generated script and pywork script. By comparing the v00 and v01 versions of the template, however, we see that the differences occur only in insignificant places (within 'echo' strings or within bash comments). Thus, the v00 template version is also functionally the same as the v01 template version.

funderburkjim commented 4 years ago

confirm redo_xml.sh template

Similarly, the compare_template_v0.py program validates that the v01/makotemplates/redo_xml.sh template generates all the individual dictionary versions of redo_xml.sh.

Note: For the full confirmation it was necessary:

Enhancement suggestion:

Alter the 'ls' processing for mw to be like that of pw, pwg.

gasyoun commented 4 years ago

Alter the 'ls' processing for mw to be like that of pw, pwg.

Why is it better than what we have right now?

funderburkjim commented 4 years ago

It would be better in the sense of making the code base more similar between mw and pw, pwg.

From the point of view of how displays behave, the user would see no difference.

funderburkjim commented 4 years ago

confirm hwparse.py template

When compare_template_v0.py was first used to validate that the v01/makotemplates/hwparse.py template generates the versions of hwparse.py for the individual dictionaries, the validation failed in numerous spots.

After several iterations, I finally rearranged some things so that the validation succeeds:

Difference between adjusted and original hwparse.py template

jfunderb@dialog6$ diff template_hwparse_match.py ../makotemplates/hwparse.py
7d6
< %if False:
9d7
< %endif
18,21c16,18
< %if dictlo == 'mw':
<  # 'e' contains HX identifier for mw
<  hwrec_keys = ['L','pc','k1','k2','h'] +\xyz
<                ['type','LP','k1P'] +\xyz
---
>  if dictcode == 'mw':
>   hwrec_keys = ['L','pc','k1','k2','h'] +\
>                ['type','LP','k1P'] +\
23,25c20,22
< %else:
<  hwrec_keys = ['L','pc','k1','k2','h'] +\xyz
<                ['type','LP','k1P'] +\xyz
---
>  else:
>   hwrec_keys = ['L','pc','k1','k2','h'] +\
>                ['type','LP','k1P'] +\
27c24
< %endif
---
>
49d45
< %if False:
51,52d46
< %endif
<    print "HW_init ERROR: duplicate L-code=",self.L

Original hwparse.py template is ok to go with

Although the original v01/makotemplates/hwparse.py template (identical to the v00 version) differs from the adjusted template, my analysis of the above 'diff' concludes that the two are functionally the same.

funderburkjim commented 4 years ago

confirm hw.py template

Recall that there are 5 versions of hw.py.
An iterative process was used for confirmation of an hw.py template, similar to the process described above for confirmation of an hwparse.py template.

My conclusion is that now the template for hw.py generates the current pywork/hw.py programs for all the dictionaries.

comment on Python 3 --TODO

In the original template, @drdhaval2785 also introduced some changes (notably with 'print') so that the template would generate a program appropriate for both Python2 and Python3. The current matching form of the template for hw.py does NOT include this change. But now that we can be sure the template correctly generates the Python2 versions, we can make further alterations (to makotemplates/hw.py) to re-insert Python3 compatibility.

funderburkjim commented 4 years ago

Confirm constant updateByLine.py

The version makotemplates/updateByLine.py is almost identical to the version MWScan/2014/pywork/updateByline.py. The only difference is that the makotemplates version has the Python2/3 compatibility statement from __future__ import print_function.

Comparing the MWScan version to the versions for other dictionaries, the only dictionaries whose updateByLine.py version has unexplained differences are ACC, MD, and WIL. These three do not contain the 'ins' and 'del' functionality of the MW version.

Conclusion: The makotemplates version is ok to use for all dictionaries.

Python2/3 compatibility

The makotemplates version is compatible with both Python2 and Python3.

Is from __future__ import print_function needed?

In our usage, I think it is NOT needed, but is optional.

Python 2/3 compatibility does require that we change Python print statements, like print x,y,z, to the form print(x,y,z). But this form gives slightly different output under Python2 and Python3. Example: under Python2, where print is a statement, print(1,2,3) prints the tuple (1, 2, 3); under Python3 it prints 1 2 3.

However, since we only use print statements for output (usually debug output) that is examined only by the programmer, the minor difference between the Python2 and Python3 output is immaterial to us.
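The two output forms can be reproduced under Python3 alone, because a Python2 print statement without the __future__ import treats the parenthesized arguments as a single tuple. The sketch below assumes Python3; the helper name captured is hypothetical.

```python
import io
from contextlib import redirect_stdout


def captured(fn):
    """Run fn and return whatever it wrote to standard output."""
    buf = io.StringIO()
    with redirect_stdout(buf):
        fn()
    return buf.getvalue()


# Python3 behaviour: three separate arguments, joined by spaces.
py3_style = captured(lambda: print(1, 2, 3))

# What the Python2 print statement effectively does with print(1,2,3):
# it prints one tuple.
py2_style = captured(lambda: print((1, 2, 3)))

print(repr(py3_style))
print(repr(py2_style))
```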

If the from __future__ import print_function statement is included, then print(1,2,3) yields the same result with both Python2 and 3. Proof: enter the program temp_print.py:

from __future__ import print_function
print(1,2,3)

Whether run with python2 or python3, the terminal output is the same: 1 2 3.

The conclusion is that it is slightly better for us to include the from __future__ import print_function statement in programs that might be run under Python2 or Python3.

funderburkjim commented 4 years ago

Final conclusions of this comment

All eight programs in the makotemplates inventory correctly generate the dictionary versions.

Python 2/3 compatibility

All the makotemplates python programs use print(x,y,z) form, so should be ready for Python2 or 3.

Could the templates be made into constants?

I think the answer is 'yes'. The only template parameter used in the template files is dictlo.

In hw.py, the distinction provided by dictlo could be removed entirely, while retaining functional equivalence.

In the other 3, the dictlo distinction is probably needed.
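One way to picture a template turned into a constant is a single script that receives dictlo at run time instead of having it substituted at generation time. The sketch below is only that, a picture: the function names, the command-line argument, and the two-way branch are assumptions and do not come from the csl-pywork programs.

```python
import argparse


def select_branch(dictlo):
    """Return which code path a constant script would take for this dictionary.

    Hypothetical: only the mw / non-mw distinction is modelled here.
    """
    return "mw" if dictlo == "mw" else "default"


def main(argv):
    parser = argparse.ArgumentParser(
        description="Sketch of a constant script that takes dictlo at run time")
    parser.add_argument("dictlo", help="lower-case dictionary code, e.g. mw, pw, pwg")
    args = parser.parse_args(argv)
    branch = select_branch(args.dictlo.lower())
    print("dictlo=%s uses the %s branch" % (args.dictlo, branch))
    return branch


# Usage: the same file serves every dictionary; only the argument changes.
mw_branch = main(["mw"])
pw_branch = main(["pw"])
```

The trade-off is that the per-dictionary decision moves from the generation step into the running program, so the redo scripts that invoke it would have to pass the dictionary code along.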

drdhaval2785 commented 3 years ago

Could the templates be made into constants?

@funderburkjim, does this issue survive, or is it closable?

funderburkjim commented 3 years ago

This 'templates and constants' issue in csl-pywork is closable.