sanskrit-lexicon / csl-pywork

A template for creating pywork repository for each dictionary.

Broken bar #16

Closed drdhaval2785 closed 3 years ago

drdhaval2785 commented 4 years ago

When I tried to recreate the babylon files from the current XML files, one change struck me in the majority of dictionaries.

The broken bar is now replaced by a space (which was not the case earlier). This has resulted in double spacing (snp) or a space followed by a comma (skd).

I feel the broken bar should be replaced by blank.

gasyoun commented 4 years ago

When I tried to recreate the babylon files from current XML files

How many times per year do you do it?

drdhaval2785 commented 4 years ago

Last was 11 months back

funderburkjim commented 4 years ago

Any idea why this occurs?

Also, what is the source you use for xml files -- do you directly recreate them from the pywork/xxx.xml files?

drdhaval2785 commented 4 years ago

I can guess why this happens.

The dig_xml function in make_xml.py replaces the broken bar with a space. Usually there is already a space after the broken bar, so when the broken bar is replaced by a space, we get two consecutive spaces in the XML.
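A minimal sketch of the effect described above. This is not the actual dig_xml code: the function names and the sample entries are hypothetical, and the only assumption is that the broken bar is the character U+00A6.

```python
BROKEN_BAR = "\u00a6"  # the broken bar character


def replace_with_space(text):
    """Behaviour described in the thread: the broken bar becomes a space."""
    return text.replace(BROKEN_BAR, " ")


def remove_broken_bar(text):
    """Suggested alternative: drop the broken bar and add nothing in its place."""
    return text.replace(BROKEN_BAR, "")


# Hypothetical entries: one where a space follows the broken bar (as in snp)
# and one where a comma follows it (as in skd).
followed_by_space = "headword\u00a6 meaning"
followed_by_comma = "headword\u00a6, meaning"

print(repr(replace_with_space(followed_by_space)))  # two consecutive spaces
print(repr(replace_with_space(followed_by_comma)))  # space before the comma
print(repr(remove_broken_bar(followed_by_space)))   # single space
print(repr(remove_broken_bar(followed_by_comma)))   # comma directly after the headword
```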

drdhaval2785 commented 4 years ago

https://github.com/sanskrit-lexicon/csl-pywork/blob/f4dbde8ceaf4dd79e41c6ead7a0c7549bf2e5c89/v02/distinctfiles/snp/pywork/make_xml.py#L63 seems to be the line to be changed.

drdhaval2785 commented 4 years ago

I recreated xxx.xml by redo.sh from xxx.orig files.

funderburkjim commented 4 years ago

distinctfiles/make_xml.py NOT USED

When I was working on a change to make_xml.py yesterday for BUR, I began by making the change to distinctfiles/bur/pywork/make_xml.py. But when I regenerated bur, the change I made did not show up in the display! After some investigation, I noticed (in v02/inventory.txt) that make_xml.py is now a template:

; 10-11-2019: Changed make_xml.py from 'CD' to 'T'
*:pywork/make_xml.py:T

In other words, we started with make_xml.py as distinctfiles, then (in October) I was able to get all the differences among dictionaries taken into account in a template.
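As a rough illustration of how such an inventory entry could be read, here is a small sketch. It is not the code csl-pywork uses. It assumes that lines starting with ';' are comments and that each entry has three colon-separated fields (dictionary pattern, file path, category); parse_inventory_line is a hypothetical helper.

```python
def parse_inventory_line(line):
    """Parse one inventory.txt line under the assumptions stated above.

    Returns None for blank lines and ';' comment lines, otherwise a dict
    holding the three colon-separated fields.
    """
    line = line.strip()
    if not line or line.startswith(";"):
        return None
    dict_pattern, path, category = line.split(":")
    return {"dicts": dict_pattern, "path": path, "category": category}


lines = [
    "; 10-11-2019: Changed make_xml.py from 'CD' to 'T'",
    "*:pywork/make_xml.py:T",
]
for raw in lines:
    entry = parse_inventory_line(raw)
    if entry is not None and entry["category"] == "T":
        # 'T' marks a template, so a copy under distinctfiles is not consulted.
        print(entry["path"], "is generated from a template")
```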

It is confusing to have those distinctfile make_xml.py programs -- we both were fooled by their presence.

Today I am going to change the names of all the distinctfile make_xml.py programs to unused_make_xml.py. Later, we can just delete the distinctfile unused_make_xml.py programs.
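The rename could be scripted roughly as follows. This is only a sketch: it assumes the layout distinctfiles/<dict>/pywork/make_xml.py seen in the paths above, and rename_unused_make_xml is a hypothetical helper, not something in the repository.

```python
from pathlib import Path


def rename_unused_make_xml(distinctfiles_root):
    """Rename every <dict>/pywork/make_xml.py under the given directory
    to unused_make_xml.py and return the list of new paths."""
    root = Path(distinctfiles_root)
    renamed = []
    for old in sorted(root.glob("*/pywork/make_xml.py")):
        new = old.with_name("unused_make_xml.py")
        old.rename(new)
        renamed.append(new)
    return renamed


if __name__ == "__main__":
    # Assumed location, relative to the root of a csl-pywork checkout.
    for path in rename_unused_make_xml("v02/distinctfiles"):
        print("renamed to", path)
```

In a git checkout, `git mv` would be the more usual way to do the rename so that history follows the file.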

funderburkjim commented 4 years ago

suggest csl-stardict repository

@drdhaval2785 Currently, I don't know how to recreate the stardict files. I suspect that you have scripts on your local installation that deal with this.

It would be good to have this process in a repository. One suggestion would be to create a new sanskrit-lexicon/csl-stardict repository just for the purpose of regenerating the stardict files from Cologne data.

If your recreation process uses the Cologne xxx.xml files generated by make_xml.py, then the redo.sh of csl-stardict could directly use the ../xxx/pywork/xxx.xml files of the local installation.

funderburkjim commented 4 years ago

need to change broken-bar logic in make_xml.py

From your comment above, I think you would like to avoid having two spaces that often occur in xxx.xml due to the handling of the broken bar when converting from xxx.txt.

I'll take a look at where this adjustment should be made, once you have the stardict regeneration up in a repository so I can recreate locally.

drdhaval2785 commented 3 years ago

Is https://github.com/sanskrit-lexicon/cologne-stardict the repo you asked for, @funderburkjim?