Closed gasyoun closed 7 months ago
Please provide a pictorial example of 'capital letters (for those words that have them in the book)'.
Did I understand the question correctly?
Ok, good. I misunderstood.
These 'N.' should be searchable in the mw.xml via
<ab>N.</ab>
If you wanted to generate a list of all the MW headwords which are identified as the Name of someone or something, you could
1) Search for all records in mw.xml which have N. as abbreviation
2) For each such record, pluck out key1 contents by a regex match; i.e. pluck out 'arcanas' from
<key1>arcanas</key1>
3) If you wanted to express 'arcanas' in IAST, you would pass it through the transcoder (slp1 => roman).
4) If you further wanted the IAST to be capitalized, you would essentially have to call an upper-casing function. This might be tricky if the first letter of the IAST had a diacritic, as it does for slp1 AdAnI.
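The four steps above can be sketched in Python. This is only an illustration, not the PHP code in the repository: it assumes each mw.xml record sits on one line, the SLP1_TO_IAST table is a simplified mapping written for this example (it is not the Cologne transcoder and ignores accents and other special marks), and the sample records are made up.

```python
import re

# Simplified SLP1 -> IAST table (an assumption for this sketch, not the
# Cologne transcoder). Characters not listed here are passed through as-is.
SLP1_TO_IAST = {
    "A": "ā", "I": "ī", "U": "ū", "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "E": "ai", "O": "au", "M": "ṃ", "H": "ḥ",
    "K": "kh", "G": "gh", "N": "ṅ", "C": "ch", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "T": "th", "D": "dh", "P": "ph", "B": "bh", "S": "ś", "z": "ṣ",
}

KEY1_RE = re.compile(r"<key1>(.*?)</key1>")


def slp1_to_iast(text):
    """Step 3: transcode SLP1 to IAST one character at a time."""
    return "".join(SLP1_TO_IAST.get(ch, ch) for ch in text)


def capitalize_iast(word):
    """Step 4: upper-case only the first letter; str.upper() knows that
    the capital of 'ā' is 'Ā', so diacritics are handled."""
    return word[:1].upper() + word[1:]


def extract_names(lines):
    """Steps 1 and 2: keep records containing <ab>N.</ab>, pluck out key1."""
    names = []
    for line in lines:
        if "<ab>N.</ab>" not in line:
            continue
        match = KEY1_RE.search(line)
        if match:
            key1 = match.group(1)
            names.append((key1, capitalize_iast(slp1_to_iast(key1))))
    return names


# Made-up records, for illustration only (not copied from mw.xml).
sample = [
    "<key1>arcanas</key1> sample text <ab>N.</ab> sample text",
    "<key1>AdAnI</key1> sample text <ab>N.</ab> sample text",
    "<key1>agni</key1> sample text without the abbreviation",
]

for slp1, iast in extract_names(sample):
    print(slp1, iast)
```

With the made-up records, only the first two lines are kept, and AdAnI comes out with a capital Ā, so the diacritic case in step 4 is not a problem in this sketch.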
Is this what you are after?
I guess so, thanks. That's why I want to kill SLP1 as quickly as I can, otherwise I'll be in trouble with AdAnI. Could you please make a video about the transcoder?
I made a sample PHP application and uploaded it here: https://github.com/sanskrit-lexicon/MWS/tree/master/transcodeExample
You can see the result in the extract_N.txt file. The capitalization of the IAST looks to be done properly, thanks to a PHP routine found on the PHP web site.
To run this on your local machine:
(a) Download the transcodeExample folder
(b) Get a copy of mw.xml from the xml download for MW at Cologne Sanskrit Lexicon. Put mw.xml into your local transcodeExample folder.
(c) Save a copy of extract_N.txt, to compare with your result.
(d) Run the following command in a terminal (e.g., cmd.exe). This assumes 'php' is on your Windows PATH environment variable; otherwise, replace it with the full path to the executable, e.g. c://Your_PATH/php.exe.
php extract_N.php mw.xml extract_N.txt
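To check your result against the saved copy from step (c), a line-by-line comparison is enough. Here is a minimal Python sketch; the function name differing_lines and the file name extract_N_reference.txt (standing for wherever you saved the copy) are assumptions made for this example, and UTF-8 encoding is assumed as well.

```python
import sys
from pathlib import Path


def differing_lines(reference_path, result_path, encoding="utf-8"):
    """Return (line_number, reference_line, result_line) for every line
    that differs; a missing line is reported as None."""
    reference = Path(reference_path).read_text(encoding=encoding).splitlines()
    result = Path(result_path).read_text(encoding=encoding).splitlines()
    diffs = []
    for i in range(max(len(reference), len(result))):
        ref_line = reference[i] if i < len(reference) else None
        res_line = result[i] if i < len(result) else None
        if ref_line != res_line:
            diffs.append((i + 1, ref_line, res_line))
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python compare.py extract_N_reference.txt extract_N.txt
    found = differing_lines(sys.argv[1], sys.argv[2])
    if not found:
        print("The two files are identical.")
    for number, ref_line, res_line in found:
        print(number, repr(ref_line), repr(res_line))
```

An empty list means your local run reproduced the uploaded result exactly.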
Does this answer the question?
Do you still need a video about transcoder?
https://github.com/sanskrit-lexicon/MWS/issues/5#issuecomment-56468168 823 cases; oh, that's a better (smaller) number than I had before. Good to know. The Mahabharata Cultural Index has around 3000 personal names, so it's not about 20 000 words anyway. I only wonder whether N. has some variations, in which case 823 could (should) be a bigger number after all.
I do not need a video anymore; I only need to find the lost path now. Your explanations are detailed enough, and the demo .php files certainly answer the question. I wonder if I can use it on other .xml files as well; I guess so. Can it have the L number beside each word as well, so I know who is who after some mixing? Thanks.
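For the L number, here is a rough Python sketch of the idea (not the PHP code in the repository). It assumes each record is on one line and carries its record number in an <L>...</L> element next to <key1>; the function name extract_with_l and the sample records, including their L values, are invented for this example.

```python
import re

KEY1_RE = re.compile(r"<key1>(.*?)</key1>")
L_RE = re.compile(r"<L>(.*?)</L>")


def extract_with_l(lines, abbreviation="N."):
    """Return (L, key1) pairs for records containing <ab>abbreviation</ab>."""
    marker = "<ab>" + abbreviation + "</ab>"
    rows = []
    for line in lines:
        if marker not in line:
            continue
        key1 = KEY1_RE.search(line)
        l_number = L_RE.search(line)
        if key1 and l_number:
            rows.append((l_number.group(1), key1.group(1)))
    return rows


# Invented records and L values, for illustration only.
sample = [
    "<key1>arcanas</key1> sample text <ab>N.</ab> sample text <L>101</L>",
    "<key1>agni</key1> sample text <L>102</L>",
    "<key1>AdAnI</key1> sample text <ab>N.</ab> sample text <L>103</L>",
]

for l_number, key1 in extract_with_l(sample):
    print(l_number + "\t" + key1)
```

Because the abbreviation is a parameter, the same function could be pointed at another abbreviation, or at another dictionary's .xml file, provided that file marks records up the same way.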
What helped me was https://community.apachefriends.org/f/viewtopic.php?p=193850&sid=750f8e0d1389832ed90254806b250084#p193723 : running cd E:\xampp\php first, because only then did php.exe -v work. The approach described at http://stackoverflow.com/questions/5650774/changed-windows-path-but-still-getting-php-exe-is-not-recognized-errormessage did not work for me. Just half an hour ago I was still getting '"php.exe" is not recognized as an internal or external command, operable program or batch file.'
But I could not launch the code anyway. I even tried it in Chrome at http://localhost/transcodeExample/extract_N.php, but got:
Notice: Undefined variable: argv in C:\xampp\htdocs\transcodeExample\extract_N.php on line 5
Notice: Undefined variable: argv in C:\xampp\htdocs\transcodeExample\extract_N.php on line 6
transcoder_set_dir change: dir = . old = C:\xampp\htdocs\transcodeExample/transcoder newdir = C:\xampp\htdocs\transcodeExample
Warning: fopen(): Filename cannot be empty in C:\xampp\htdocs\transcodeExample\extract_N.php on line 9
Cannot open
What worked:
cd C:\xampp\htdocs\transcodeExample
php.exe extract_L.php mw.xml extract_L.txt
@funderburkjim / @gasyoun,
Is this issue closable now?
How can I make a list of the MW words that appear only at the end of compounds? Will "ifc" help me (884 pieces)? I mean words like -da, -pa, -ja, -ra (end of compounds). I would like to have a list of words like ज 76157 that change meaning when they stand at the end or the beginning of a word.