Closed by helmadik 2 years ago
Hi Helma, sorry for the slow reply. I'm back from my holidays and am doing some work on Diogenes now. I'm not sure what you mean by "swap out lemmata". What were you trying to do? Here are some points that might help:
My guess is that you are trying to remove spurious parses from the analysis of inflected forms (because you hate that kind of thing :-). If so, the route forward is not to edit the file greek-analyses.txt, which is automatically generated from several sources and might change again in the future. For example, when I swapped the Perseus version of LSJ for the Logeion version, all of the lemma offsets changed, and I just regenerated that file with the new lemma numbers.
The right way to go would be to fix the errors at their source: the file build/grc.morph, which is the output of running Morpheus (the old Perseus tagger) on a list of words from the TLG. That file is not in git because it is automatically generated, and it is not distributed with Diogenes because the application does not use it; it is only used in the process of generating greek-analyses.txt.
I don't know if you have a working copy of Morpheus, but if not I can send you a copy of grc.morph. To give you an idea, the format of the file is like this:
sumpi/tnei
<NL>V sumpi/tnw pres ind mp 2nd sg poetic w_stem</NL><NL>V sumpi/tnw pres ind act 3rd sg poetic w_stem</NL>
sumpi/tnousin
<NL>P sumpi/tnw pres part act masc/neut dat pl attic epic doric ionic nu_movable poetic w_stem</NL><NL>V sumpi/tnw pres ind act 3rd pl attic epic doric ionic nu_movable poetic w_stem</NL>
sumpiw/n
The inflected form is on one line, and its parses follow on the next line, each wrapped in <NL> tags. If you would be interested in contributing a patch to fix this file, I'm sure all Diogenes users would be very grateful!
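To make the format concrete, here is a minimal Python sketch of reading it. This is not how Diogenes itself processes the file; the function name parse_morph is made up for illustration, and the sketch assumes (from the sample above only) that every line is either an inflected form or a run of <NL>...</NL> parses, and that the second whitespace-separated field of a parse is the lemma.

```python
import re
from typing import Dict, Iterable, List

# Matches the content of each <NL>...</NL> element on a parse line.
NL_RE = re.compile(r"<NL>(.*?)</NL>")


def parse_morph(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each inflected form to the list of its raw parse strings.

    Assumption: a line starting with <NL> holds the parses for the most
    recent form line; anything else is treated as a new inflected form.
    """
    analyses: Dict[str, List[str]] = {}
    form = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("<NL>"):
            if form is not None:
                analyses.setdefault(form, []).extend(NL_RE.findall(line))
        else:
            form = line
            analyses.setdefault(form, [])
    return analyses


# Sample taken from the excerpt of grc.morph quoted above.
sample = """sumpi/tnei
<NL>V sumpi/tnw pres ind mp 2nd sg poetic w_stem</NL><NL>V sumpi/tnw pres ind act 3rd sg poetic w_stem</NL>
"""
parsed = parse_morph(sample.splitlines())
for form, parses in parsed.items():
    for parse in parses:
        # Assumed layout: part of speech, lemma, then morphological features.
        pos, lemma, *features = parse.split()
        print(form, pos, lemma, " ".join(features))
```

A form that has no parse line after it (as can happen at the end of an excerpt) simply ends up with an empty list.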
Alternatively, since you presumably have already done this work for Logeion, I wonder if there might be a way for me to import all of your corrections into this file automatically, if you would be willing to share your data. That would be a much better way to do it than to fiddle around with manual corrections!
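As a rough idea of what such an automatic import could look like, here is a Python sketch that drops rejected parses from a file in the grc.morph format. Everything about the corrections is an assumption: I do not know what form the Logeion data takes, so the sketch pretends it can be reduced to a set of (inflected form, lemma) pairs to reject. The function name filter_morph and the sample entries are invented for illustration and are not real data.

```python
import re
from typing import Iterable, Iterator, Set, Tuple

# Matches the content of each <NL>...</NL> element on a parse line.
NL_RE = re.compile(r"<NL>(.*?)</NL>")


def filter_morph(
    lines: Iterable[str], rejected: Set[Tuple[str, str]]
) -> Iterator[str]:
    """Yield the lines of a grc.morph-style file minus the rejected parses.

    `rejected` is a hypothetical corrections set of (form, lemma) pairs.
    Assumption: the lemma is the second whitespace-separated field of a parse.
    """
    form = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.startswith("<NL>"):
            # A form line: remember it and pass it through unchanged.
            form = line.strip()
            yield line
            continue
        kept = []
        for parse in NL_RE.findall(line):
            fields = parse.split()
            lemma = fields[1] if len(fields) > 1 else ""
            if (form, lemma) not in rejected:
                kept.append(parse)
        # If every parse was rejected this yields an empty line; a real
        # tool would need to decide how to handle forms left with no parse.
        yield "".join(f"<NL>{p}</NL>" for p in kept)


# Invented example entries, not taken from the real file.
lines = [
    "formx",
    "<NL>V lemma-a pres ind act 3rd sg</NL><NL>N lemma-b fem nom sg</NL>",
]
out = list(filter_morph(lines, {("formx", "lemma-b")}))
print("\n".join(out))
```

Keying the corrections on form and lemma rather than on lemma numbers is deliberate here, since the lemma numbers change whenever the dictionary source changes.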
I was looking at the Greek analyses and Greek lemmata. I can of course remove single analyses wholesale, if a more sensical one shows up after the first one, but I haven't figured out how to swap out lemmata. Where does the lemma number, e.g. 100546453 for sumpi/tnw, get baked in? Any way to handle this? What are the files I'm failing to look at? Pointers appreciated!