Monier Williams Grammar Markup

gasyoun commented 10 years ago

How can I make a list of MW words that appear only at the end of compounds? Will "ifc" help me (884 pieces)? I mean words like -da, -pa, -ja, -ra (end of compounds). I would like a list of words like ज 76157 that change meaning when at the end or beginning of a word (after_an~ adv.~or_adverbial_word _born_or_produced), or like अ 4 at the beginning: it's not an upasarga, but in my case it could still be called a prefixoid. I see no markup for this in the printed book, and none in the digitization either.

Also, ifc.~ f A . looks messy in the code and bad on the web display as well. "mf(आ)n." is not what we see in the book. The printed book has a mix of mf(ā)n. inside the text and द as the header; right now we can only have all Devanagari or all IAST. Can we have a mode that looks similar to the book?

pakṣa is Pakṣa (wing, pinion) in print. Is the original stored anywhere? Are all capital letters lost? In which mode can I get the data about accents used in MW?

I started exploring again and I see that MW has LEX (mfn., mf A n.). What's the difference between f. and f.? http://www.sanskrit-lexicon.uni-koeln.de/mwupdate/mwxmldownload/mwtags.html#lex The first "aBavya" has mfn., but there are 2 more, so can (I would say should) we add mfn. to 311530 and 311530.1 (invisible to the eye, but understandable for encoding and decoding purposes)? Have you ever tried to use regex to find grammar info for every word? Right now, what I miss in my manipulations with MW:

capital letters (for those words that have them in the book)
accents
word breaks (h2?)
grammar data (mfn. kind of things for every single word)

Any chance to get some advice? Thanks.

funderburkjim commented 10 years ago

Please provide a pictorial example of 'capital letters (for those words, that have them in the book)' .

gasyoun commented 10 years ago

Did I understand the question right?

[Image: arcatri]

funderburkjim commented 10 years ago

Ok, good. I misunderstood.

These 'N.' should be searchable in the mw.xml via

<ab>N.</ab>

If you wanted to generate a list of all the MW headwords which are identified as the Name of someone or something, you could

1) Search for all records in mw.xml which have N. as abbreviation

2) For each such record, pluck out key1 contents by a regex match; i.e. pluck out 'arcanas' from

<key1>arcanas</key1>

3) If you wanted to express 'arcanas' in IAST, you would pass it through the transcoder (slp1 => roman).

4) If you further wanted the IAST to be capitalized, you essentially have to call an upper casing function. This might be tricky if the first letter of the IAST had a diacritic, such as for slp1 AdAnI.
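The four steps above can be sketched roughly as follows. This is a minimal Python illustration, not the PHP routine from the repository: it assumes mw.xml holds one record per line, the sample records are made up to show the shape, and the function names and the small SLP1-to-IAST table are assumptions for illustration rather than a substitute for the real transcoder.

```python
import re

# Partial SLP1 -> IAST table (assumption: letters not listed map to themselves).
SLP1_TO_IAST = {
    "A": "ā", "I": "ī", "U": "ū", "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "E": "ai", "O": "au", "M": "ṃ", "H": "ḥ",
    "K": "kh", "G": "gh", "N": "ṅ", "C": "ch", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "T": "th", "D": "dh", "P": "ph", "B": "bh",
    "S": "ś", "z": "ṣ",
}

KEY1_RE = re.compile(r"<key1>(.*?)</key1>")
NAME_ABBR = "<ab>N.</ab>"


def slp1_to_iast(text):
    """Step 3: transliterate an SLP1 string to IAST, one character at a time."""
    return "".join(SLP1_TO_IAST.get(ch, ch) for ch in text)


def capitalize_iast(text):
    """Step 4: upper-case only the first letter, diacritics included."""
    return text[:1].upper() + text[1:]


def extract_names(lines):
    """Steps 1-2: keep records containing <ab>N.</ab> and pluck out key1."""
    names = []
    for line in lines:
        if NAME_ABBR not in line:
            continue
        match = KEY1_RE.search(line)
        if match:
            names.append(capitalize_iast(slp1_to_iast(match.group(1))))
    return names


# Hypothetical records, only to show the assumed one-record-per-line shape.
sample = [
    "<H1><h><key1>arcanas</key1></h><body><ab>N.</ab> of someone</body></H1>",
    "<H1><h><key1>AdAnI</key1></h><body><ab>N.</ab> of something</body></H1>",
    "<H1><h><key1>pakza</key1></h><body>a wing</body></H1>",
]
print(extract_names(sample))  # ['Arcanas', 'Ādānī']
```

Python's str.upper() handles a leading letter with a diacritic, so slp1 AdAnI comes out as Ādānī without special treatment.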

Is this what you are after?

gasyoun commented 10 years ago

I guess so, thanks. That's why I want to kill SLP1 as quickly as I can, otherwise I'll be in trouble with AdAnI. Could you please make a video about the transcoder?

funderburkjim commented 10 years ago

I made a sample PHP application and uploaded it here: https://github.com/sanskrit-lexicon/MWS/tree/master/transcodeExample

You can see the result in the extract_N.txt file. The capitalization of the IAST looks to be done properly, thanks to a PHP routine found on the PHP web site.

To run this on your local machine:

(a) Download the transcodeExample folder

(b) Get a copy of mw.xml from the xml download for MW at Cologne Sanskrit Lexicon. Put mw.xml into your local transcodeExample folder.

(c) Save a copy of extract_N.txt, to compare with your result.

(d) Run the following command in a terminal (e.g., cmd.exe). This assumes 'php' is on the Windows PATH environment variable; otherwise, replace it with the appropriate c://Your_PATH/php.exe.

php extract_N.php mw.xml extract_N.txt

Does this answer the question?

Do you still need a video about the transcoder?

gasyoun commented 10 years ago

https://github.com/sanskrit-lexicon/MWS/issues/5#issuecomment-56468168 823 cases, oh, that's a better (smaller) number than I had before. Good to know: the Mahabharata Cultural Index has around 3000 personal names, so it's not about 20 000 words anyway. I only wonder whether N. has some variations, so that 823 could (should) be a bigger number after all.

I do not need a video anymore; I only need to find the lost path now. Your explanations are detailed enough, and the demo .php files certainly answer the question. I wonder if I can use them on other .xml files as well; I guess so. Can the output have the L number beside each word as well, so I know who is who after some mixing? Thanks.

[Image: cmd]

What helped me was https://community.apachefriends.org/f/viewtopic.php?p=193850&sid=750f8e0d1389832ed90254806b250084#p193723, that is, running cd E:\xampp\php first, because only then did php.exe -v work. The approach described at http://stackoverflow.com/questions/5650774/changed-windows-path-but-still-getting-php-exe-is-not-recognized-errormessage did not work. Just half an hour ago I was still getting '"php.exe" is not recognized as an internal or external command, operable program or batch file.'

But I could not launch the code anyway. I even tried it in Chrome at http://localhost/transcodeExample/extract_N.php, but got:

Notice: Undefined variable: argv in C:\xampp\htdocs\transcodeExample\extract_N.php on line 5
Notice: Undefined variable: argv in C:\xampp\htdocs\transcodeExample\extract_N.php on line 6
transcoder_set_dir change: dir = . old = C:\xampp\htdocs\transcodeExample/transcoder newdir = C:\xampp\htdocs\transcodeExample
Warning: fopen(): Filename cannot be empty in C:\xampp\htdocs\transcodeExample\extract_N.php on line 9
Cannot open

What worked:

cd C:\xampp\htdocs\transcodeExample
php.exe extract_L.php mw.xml extract_L.txt
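For the L number asked about earlier in this comment, here is a minimal Python sketch of pairing each headword with its record number. It is not the contents of extract_L.php. It assumes that each mw.xml record sits on one line and carries its number in an <L> element; the function name, the sample record, and its L value are hypothetical.

```python
import re

KEY1_RE = re.compile(r"<key1>(.*?)</key1>")
L_RE = re.compile(r"<L>(.*?)</L>")  # assumption: the record number is stored in <L>


def extract_l_and_key1(lines, marker="<ab>N.</ab>"):
    """Return (L, key1) pairs for records that contain the given marker."""
    pairs = []
    for line in lines:
        if marker not in line:
            continue
        key1 = KEY1_RE.search(line)
        l_num = L_RE.search(line)
        if key1 and l_num:
            pairs.append((l_num.group(1), key1.group(1)))
    return pairs


# Hypothetical record; the L value 100 is a placeholder, not real MW data.
sample = [
    "<H1><h><key1>arcanas</key1></h><body><ab>N.</ab> of someone</body>"
    "<tail><L>100</L></tail></H1>",
]
for l_num, key1 in extract_l_and_key1(sample):
    print(l_num, key1, sep="\t")
```

Keeping L as the first column makes it possible to sort or join the output against other extracts later.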

Andhrabharati commented 7 months ago

@funderburkjim / @gasyoun,

Is this issue closable now?
