Closed funderburkjim closed 3 years ago
The v00 inventory provides a preliminary list of files classified as constants (C) or templates (T). The files mentioned in the inventory appear in the v00/makotemplates directory. The first task is to refine this classification and run tests to determine that:
For this purpose, there is a compare subdirectory of v01. The following comments are based on work described in the compare/readme.org file.
We begin with the 8 files currently in the inventory. These files, and their initial classification as C or T, are:
hwparse.py:T
hw.py:T
redo_hw.sh:T
redo_xml.sh:T
hw0.py:C
hw2.py:C
parseheadline.py:C
updateByLine.py:C
A useful first analysis is to see, for each inventory file, how many versions of the file appear within the individual dictionary pywork directories. This comparison is done via the compare/compare_scripts.sh script, and the aggregate results are in compare_scripts.txt. Here is the summary:
1 version of pywork/hw0.py
1 version of pywork/hw2.py
1 version of pywork/parseheadline.py
36 versions of pywork/redo_hw.sh
36 versions of pywork/redo_xml.sh
36 versions of pywork/hwparse.py
5 versions of pywork/hw.py
5 versions of pywork/updateByLine.py
From this we can confirm:
hw0.py, hw2.py, parseheadline.py are constants.
hwparse.py, redo_hw.sh, and redo_xml.sh are templates.
hw.py and updateByLine.py are uncertain.
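The version count above comes from compare/compare_scripts.sh, whose contents are not shown here. As a rough illustration only, the same kind of count could be sketched in Python by hashing each copy of a file and counting the distinct hashes. The function name count_versions and its arguments are hypothetical, not part of csl-pywork.

```python
import hashlib
import os
from collections import defaultdict


def count_versions(pywork_dirs, filenames):
    """Return {filename: number of distinct file contents found}.

    pywork_dirs: list of directories, one per dictionary (assumed layout).
    filenames:   names of the inventory files to look for in each directory.
    """
    digests = defaultdict(set)
    for directory in pywork_dirs:
        for name in filenames:
            path = os.path.join(directory, name)
            if not os.path.isfile(path):
                continue  # this dictionary has no copy of the file
            with open(path, 'rb') as f:
                # identical contents give identical digests
                digests[name].add(hashlib.sha256(f.read()).hexdigest())
    return {name: len(digests[name]) for name in filenames}
```

A count of 1 means every dictionary has the same file (a constant). A larger count means the copies differ, and further comparison is needed to decide whether a template explains the differences.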
For the templates and uncertain files by this classification, we need further work. For those which are surely templates, we need to confirm that the template in v01/makotemplates/ correctly generates the individual pywork versions. For the uncertain ones, further comparison of the (5) versions is needed.
The compare_template_v0.py program takes a particular template (such as redo_hw.sh) and compares, for each dictionary:
The program reports whether the two scripts are identical, and if they differ, it reports the differences.
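The source of compare_template_v0.py is not reproduced in this thread. The following is a minimal sketch of the reporting step only, assuming the template has already been rendered to a string (for example by Mako) and the pywork version has been read from disk. The name compare_rendered and the 'pywork/' and 'rendered/' labels are hypothetical.

```python
import difflib


def compare_rendered(rendered_text, pywork_text, label='script'):
    """Return [] if the two texts are identical, else unified-diff lines.

    rendered_text: output of rendering the template for one dictionary.
    pywork_text:   contents of that dictionary's existing pywork file.
    """
    if rendered_text == pywork_text:
        return []
    # keep line endings so a missing final newline also shows up as a difference
    return list(difflib.unified_diff(
        pywork_text.splitlines(True),
        rendered_text.splitlines(True),
        fromfile='pywork/' + label,
        tofile='rendered/' + label))
```

An empty result for every dictionary is what confirms that a template is as needed.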
Using the v01/makotemplates/redo_hw.sh script as the template, the generated template script IS identical to the individual pywork version for each dictionary. This confirms that the template is as needed.
If one uses v00/makotemplates/redo_hw.sh as the template, then there are differences between generated script and pywork script. By comparing the v00 and v01 versions of the template, however, we see that the differences occur only in insignificant places (within 'echo' strings or within bash comments). Thus, the v00 template version is also functionally the same as the v01 template version.
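The judgment that the differences are insignificant was made by reading the diff. A crude automated check along the same lines might look like the sketch below, which is an assumption-laden heuristic and not the method used above: it discards blank lines, full-line '#' comments and lines that begin with 'echo', then compares what remains. It would wrongly ignore an 'echo' whose output is redirected into a file, so it can only supplement a manual review.

```python
def significant_lines(script_text):
    """Drop blank lines, full-line '#' comments and 'echo' lines (heuristic)."""
    kept = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue  # blank line or comment (this also drops the shebang line)
        if stripped == 'echo' or stripped.startswith('echo '):
            continue  # treated as informational output only (an assumption)
        kept.append(stripped)
    return kept


def functionally_same(text_a, text_b):
    """True if the two scripts agree on every line kept by significant_lines."""
    return significant_lines(text_a) == significant_lines(text_b)
```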
Similarly, the compare_template_v0.py program validates that the v01/makotemplates/redo_xml.sh template generates all the individual dictionary versions of redo_xml.sh.
Note: For the full confirmation it was necessary:
<ls>BHĀG.P.</ls> in pw0.xml becomes <ls n="1031">BHĀG.P.</ls> in pw.xml.
This adjustment makes it easier for the display programs to generate tool tips.
The second step for pwg is completely analogous to the step for pw.
<ls> tags. Alter the 'ls' processing for mw to be like that of pw, pwg.
Why is it better than what we have right now?
It would be better in the sense of making the code base more similar between mw and pw, pwg.
From the point of view of how displays behave, the user would see no difference.
When compare_template_v0.py was first used to validate that the v01/makotemplates/hwparse.py template generates the versions of hwparse.py for the individual dictionaries, the validation failed in numerous spots.
After several iterations, I finally rearranged some things so that the validation succeeds:
compare_template_v0_hwparse.py
jfunderb@dialog6$ diff template_hwparse_match.py ../makotemplates/hwparse.py
7d6
< %if False:
9d7
< %endif
18,21c16,18
< %if dictlo == 'mw':
< # 'e' contains HX identifier for mw
< hwrec_keys = ['L','pc','k1','k2','h'] +\xyz
< ['type','LP','k1P'] +\xyz
---
> if dictcode == 'mw':
> hwrec_keys = ['L','pc','k1','k2','h'] +\
> ['type','LP','k1P'] +\
23,25c20,22
< %else:
< hwrec_keys = ['L','pc','k1','k2','h'] +\xyz
< ['type','LP','k1P'] +\xyz
---
> else:
> hwrec_keys = ['L','pc','k1','k2','h'] +\
> ['type','LP','k1P'] +\
27c24
< %endif
---
>
49d45
< %if False:
51,52d46
< %endif
< print "HW_init ERROR: duplicate L-code=",self.L
Although the original v01/makotemplates/hwparse.py template (identical to the v00 version) differs textually from the adjusted template, my analysis of the above 'diff' concludes that the two are functionally the same.
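In the adjusted template, lines beginning with % are template control lines: a block under '%if False:' never reaches the generated program, and '%if dictlo == ...' selects lines when the template is rendered rather than when the program runs. The sketch below is a toy stand-in for that behaviour, written only to illustrate the idea. It is not Mako and handles nothing beyond %if, %else: and %endif; the name render_toy and the sample template text are hypothetical.

```python
def render_toy(template_text, **params):
    """Toy line-based renderer for '%if ...:', '%else:' and '%endif' lines.

    This is NOT Mako; it only mimics the control lines seen in the diff.
    """
    out = []
    stack = []  # one boolean per open %if: is its current branch active?
    for line in template_text.splitlines():
        s = line.strip()
        if s.startswith('%if ') and s.endswith(':'):
            # evaluate the condition with the template parameters as names
            stack.append(bool(eval(s[4:-1], {}, dict(params))))
        elif s == '%else:':
            stack[-1] = not stack[-1]
        elif s == '%endif':
            stack.pop()
        elif all(stack):
            # ordinary line: keep it only if every enclosing branch is active
            out.append(line)
    return '\n'.join(out) + '\n'


sample = (
    "a = 1\n"
    "%if False:\n"
    "b = 2\n"
    "%endif\n"
    "%if dictlo == 'mw':\n"
    "keys = 'mw'\n"
    "%else:\n"
    "keys = 'other'\n"
    "%endif\n"
)
print(render_toy(sample, dictlo='mw'))
```

With dictlo='mw' only the 'mw' branch survives, and the line under '%if False:' is dropped for every dictionary.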
Recall that there are 5 versions of hw.py.
An iterative process was used for confirmation of an hw.py template, similar to the process described above for confirmation of an hwparse.py template.
My conclusion is that now the template for hw.py generates the current pywork/hw.py programs for all the dictionaries.
In the original template, @drdhaval2785 also introduced some changes (notably with 'print') so that the template would generate a program appropriate for both Python2 and Python3. The current matching form of the template for hw.py does NOT include this change. But now that we can be sure the template correctly generates the Python2 versions, we can make further alterations (to makotemplates/hw.py) to re-insert Python3 compatibility.
The version makotemplates/updateByLine.py is almost identical to the version MWScan/2014/pywork/updateByline.py. The only difference is that the makotemplates version has the Python2/3 compatibility statement from __future__ import print_function.
Comparing the MWScan version to the versions for other dictionaries, the only dictionaries whose updateByLine.py version has unexplained differences are the ACC, MD, and WIL dictionaries. These latter versions do not contain the 'ins' and 'del' functionality of the MW version.
Conclusion: The makotemplate version is ok to use for all dictionaries.
The makotemplate version is compatible with both Python2 and Python3.
Is from __future__ import print_function needed?
In our usage, I think it is NOT needed, but is optional.
Python 2/3 compatibility does require that we change Python print expressions, like print x,y,z to the form print(x,y,z). But this expression gives slightly different output when run under Python2 or 3.
Example: Suppose print(1,2,3) is executed in:
Python2: the output is (1, 2, 3)
Python3: the output is 1 2 3
However, since we only use print statements for output (usually debug output) whose only use is programmer examination, the minor differences in the output between Python 2 and 3 are immaterial to us.
If the from __future__ import print_function statement is included, then print(1,2,3) yields the same result with both Python2 and 3.
Proof: enter the program temp_print.py:
from __future__ import print_function
print(1,2,3)
Whether run with python2 or python3, the terminal output is the same: 1 2 3.
The conclusion is that it is slightly better for us to include the from __future__ import print_function statement in programs that might be run under Python2 or Python3.
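For a program where adding the import is inconvenient, one alternative (a suggestion, not something the templates currently do) is to pass print a single pre-formatted string. With exactly one parenthesized string, the Python2 print statement and the Python3 print function write the same text. The helper name format_items is hypothetical.

```python
def format_items(*items):
    """Join items with single spaces, the way Python3's print(1,2,3) does."""
    return ' '.join(str(item) for item in items)


# One parenthesized string prints identically under Python2 and Python3,
# even without 'from __future__ import print_function'.
print(format_items(1, 2, 3))
```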
All the eight programs in the makotemplates inventory correctly generate dictionary versions.
All the makotemplates python programs use the print(x,y,z) form, so they should be ready for Python2 or 3.
I think the answer is 'yes'. The only template parameter used in the template files is dictlo.
In hw.py, the distinction provided by dictlo could be removed entirely, while retaining functional equivalence.
In the other 3, the dictlo distinction is probably needed.
Could the templates be made into constants?
@funderburkjim, does this issue survive, or is it closable?
This 'templates and constants' issue in csl-pywork is closable.
This comment begins my review and extension of the initial classification work for the modules of pywork.
The idea of csl-pywork is similar to that of csl-websanlexicon. The repository should contain programs and data files required to regenerate the pywork directories for the individual dictionaries. Thus, the individual pywork directories can be managed in the central location of csl-pywork.
This particular work is being done in a 'v01' directory, which currently is NOT part of the repository; it resides only on the Cologne server. When the dust settles, it is anticipated that 'v01' will be added to the repository.
The files managed by csl-pywork are classified as either:
constants -- files that are identical for each dictionary.
templates -- files that are different for each dictionary, but whose differences can easily be captured by a few parameters in a template file. When the values of the template parameters are substituted into the template, the resulting rendered template is what is needed for the particular dictionary's pywork directory.
idiosyncratics -- these files are quite different for each dictionary. For example, the abbreviation expansions would be such a file (for those dictionaries which have abbreviation markup).
This comment pertains only to the constants and templates. The idiosyncratics will be discussed in other issues.
computed pywork files
An example of a computed file would be xxx.xml, the xml version of a dictionary digitization. We do not contemplate inclusion of computed files as part of this csl-pywork repository. Rather, full initialization (or updating) of an individual pywork directory will be done in two steps: