sanskrit-lexicon / mw-dev

Development version of MW dictionary, to collaborate with Andhrabharati

Welcome Andhrabharati #1

Open drdhaval2785 opened 1 year ago

drdhaval2785 commented 1 year ago

Dear @Andhrabharati, you can upload your MW file to the 'orig' folder, perhaps as mw_AB.txt, and keep updating it in this repository. I will try to use this file and make sure that there are no errors in generating the XML and HTML files. This would ensure that our experiments do not break any production version of MW on CDSL.

@funderburkjim may add his views in this repository, as and when required.

Andhrabharati commented 1 year ago

Could you move or copy into this repo the MW full-review issues (three so far) that I had posted earlier, @drdhaval2785?

drdhaval2785 commented 1 year ago

Done. It was a learning experience for me: I learnt how to transfer issues from one repository to another on GitHub.

Andhrabharati commented 1 year ago

What about the issue-002 (though it is closed)?

drdhaval2785 commented 1 year ago

Shifted issue-002 as well. It is in the closed section.

Andhrabharati commented 1 year ago

Sorry, I missed it in a hurry!!

drdhaval2785 commented 1 year ago

Now you can put your MW file in the orig folder and push the changes on a regular basis. I will look at the diffs and work out how your changes can be incorporated into the XML and HTML files.

Andhrabharati commented 1 year ago

My first push will be in 2-3 days, as I am now working on a 'major' change suggested by @funderburkjim (regarding the grouped entries).

drdhaval2785 commented 1 year ago

Ok

Andhrabharati commented 1 year ago

I could not spend as much time as I had estimated, as my mother has been unwell for some time and I had to be with her full-time.

So I thought of uploading the work I have done so far (about 80% of my plan), so that @funderburkjim and @drdhaval2785 can have a look at it and see how to proceed further. [I guess they will easily get an idea of the corrections by casually browsing through the file.]

I would just like to mention that this is only reformatting and adjusting the data, with numerous errors corrected along the way. The main corrections are:

  1. The duplicated entries (whether or not marked as groups in the CDSL text) and the 'empty' lines without a body portion have been deleted.
  2. As is already done in the majority of the grouped entries, the HW (headword) and its 'qualifier' (the lexical info and other details, such as etymology or cited places), whether a single item or a list, are brought together before the broken vertical bar, and the meanings are kept after it. I call the two parts the Header portion and the Body portion; this division makes the data visually much clearer.
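The Header/Body division described above could be checked mechanically along these lines. This is only a minimal sketch: it assumes each entry sits on a single line and that the separator is the Unicode broken bar character (U+00A6). The function names and the sample line are hypothetical and not taken from mw_AB.txt.

```python
BROKEN_BAR = "\u00a6"  # Unicode BROKEN BAR; assumed to be the separator


def split_entry(line):
    """Split one entry line into (header, body) at the first broken bar.

    The body is an empty string when the line has no separator or
    nothing follows it.
    """
    header, _sep, body = line.partition(BROKEN_BAR)
    return header.strip(), body.strip()


def has_body(line):
    """True when the entry has a non-empty Body portion."""
    _header, body = split_entry(line)
    return bool(body)


# Hypothetical sample line, for illustration only.
sample = "aṃśa m. " + BROKEN_BAR + " a share, portion"
header, body = split_entry(sample)
print(header)
print(body)
```

A check such as `has_body` is one way the 'empty' lines mentioned in item 1 might be located before deletion.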

I will update the file as and when I find some free time.

Once the adjustments are over, the proofing work will start.

Andhrabharati commented 1 year ago

@drdhaval2785

I could not push the mw_AB.txt file to the repo, as I do not have write-access there.

Github Desktop is asking whether I would like to fork the repo to continue.

Shall I post the file here or wait for write-permission for me at the repo?

drdhaval2785 commented 1 year ago

You have access now to this repository. I think you would be able to push now.

Andhrabharati commented 1 year ago

Thanks, @drdhaval2785 !

Just pushed the file, and I will be awaiting the feedback on it.

Andhrabharati commented 1 year ago

BTW, I have another (and probably better and simpler) way of 'treating' the 'list' items [now I feel they should not be called a 'group'!!] than the one proposed (preliminarily) by @funderburkjim.

[I would like to wait until he 'freezes' his idea before talking about mine.]

funderburkjim commented 1 year ago

Glad to see mw_AB.txt ! Will begin study of it in coming days.

Do I need to consider temp_mw_01_iast.txt or temp_mw_01_iast_plain_AB.txt ?

Regarding 'list' ('group') items:

Andhrabharati commented 1 year ago

The plain_AB and info_tags files together make the iast file.

As such, my current file took off from the plain_AB file.

Yes, the groups are the OR, ORSL, AND and comma-separated entries.

Andhrabharati commented 1 year ago

Another aspect I considered is removing the ABCE tagged e-key metalines and their associated LEND lines.

These removals amounted to over 200k lines out of 860k lines, nearly one-quarter of them!!

Andhrabharati commented 1 year ago

I forgot to revert the CDSL transliteration letters, which I had changed to match the print 'as is' (to make my work easier).

These replacements are: ṛi > ṛ, Ṛi > Ṛ, ṛī > ṝ, ṡ > ś, Ṡ > Ś, ṣh > ṣ, Ṣh > Ṣ.

I should probably do this at my end to ease the mw-dev work at CDSL. I shall run it as a batch operation before each push from now on.

Just pushed the latest file again, with these conversions done.