funderburkjim closed this issue 7 years ago
183 matches - And a quick glance at the results suggests there are no false positives.
Well done, a big catch. You meant {%te\W and {%ti\W, right?
{%ti\W
Good suggestion - 32 of those are found.
We'll need to do some individual examination to be sure the missing '-' is not present at the end of the preceding line, for instance - this would be a case where we would not want to change to {%-ti.
<P>.{#aByund#}¦ {%abhy-und (abhi-und),%} cl. 7. P. {%-unat-%}
<>{%ti, -unditum,%} to wet, bedew; flow over.
Probably a display showing the current and preceding line would suffice, and we could look for a -%} at the end of the prior line and, if found, presume no '-' is needed on the {%ti or {%te.
Or, if everything is put on one line, change the resulting double hyphen to a single one.
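The prior-line check discussed here could be sketched roughly as below. This is only an illustration, not the script actually used for the corrections: the name `classify_candidates` and the verdict labels are made up for the example, and it assumes continuation lines start with `<>` as in the abhy-und entry.

```python
import re

# A line whose italic text starts with 'ti' or 'te' followed by a non-word
# character, optionally after the '<>' continuation marker (an assumption
# based on the abhy-und example).
CANDIDATE = re.compile(r'^(?:<>)?\{%t[ei]\W')

def classify_candidates(lines):
    """Yield (line_number, previous_line, line, verdict) for each candidate.

    verdict is 'skip' when the previous line already ends with '-%}'
    (the hyphen is present there), and 'add-hyphen' otherwise.
    The first line is never a candidate because it has no previous line.
    """
    for i in range(1, len(lines)):
        cur = lines[i]
        if not CANDIDATE.match(cur):
            continue
        prev = lines[i - 1].rstrip()
        verdict = 'skip' if prev.endswith('-%}') else 'add-hyphen'
        yield i + 1, prev, cur, verdict

sample = [
    '<P>.{#aByund#}¦ {%abhy-und (abhi-und),%} cl. 7. P. {%-unat-%}',
    '<>{%ti, -unditum,%} to wet, bedew; flow over.',
]
for lnum, prev, cur, verdict in classify_candidates(sample):
    print(lnum, verdict)
    print('   ', prev)
    print('   ', cur)
```

Printing the previous line next to each candidate gives the two-line display mentioned above, so the 'skip' cases can still be checked by eye.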
@SergeA Have you noticed any other patterns?
MW72 also gives a great many verb forms for prefixed verbs, replacing the prefix with a hyphen, as in
vi-bhram, cl. 1. 4. P. -bhramati,
bhrāmyati, -bhramitum
I suppose that in the scheme
prefix-root ... ... .... -form1,
form2, -form3 ...
"form2" should be "-form2" in 100% of cases. However, the preceding element "-form1," and the following "-form3" are not always present, in which case the probability of a lost hyphen is lower.
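A rough sketch of how the scheme above might be turned into a ranking. It works on plain text like the vi-bhram example, not on the real mw72.txt markup, and the function name `missing_hyphen_confidence` and the labels 'high', 'medium' and 'low' are hypothetical. The labels are ordinal only, not measured probabilities.

```python
def _tokens(text):
    """Split on whitespace and drop trailing/leading ',', ';' and '.'."""
    return [t.strip(',;.') for t in text.split() if t.strip(',;.')]

def missing_hyphen_confidence(prev_line, line):
    """Rate how likely the first form on `line` has lost its leading hyphen.

    'high'   : hyphenated form before (end of prev_line) and after
    'medium' : only one hyphenated neighbour
    'low'    : no hyphenated neighbour
    None     : first form already has a hyphen, or the line is empty
    """
    prev_tokens = _tokens(prev_line)
    tokens = _tokens(line)
    if not tokens or tokens[0].startswith('-'):
        return None
    before = bool(prev_tokens) and prev_tokens[-1].startswith('-')
    after = len(tokens) > 1 and tokens[1].startswith('-')
    return {2: 'high', 1: 'medium', 0: 'low'}[before + after]

print(missing_hyphen_confidence('vi-bhram, cl. 1. 4. P. -bhramati,',
                                'bhrāmyati, -bhramitum'))
```

In practice this would need to be restricted to italic (Sanskrit) text, since any line that starts with an ordinary English word would otherwise be rated 'low' rather than ignored.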
I've addressed the simplest case, for ti and te. These have been autocorrected; I've only examined directly a small number of randomly selected cases, but feel fairly sure all these corrections are warranted.
In fact, all these corrections have been installed.
There are 189 corrections. Here is the file, which shows each correction as well as the preceding line. Here are the corrections: filter.txt
Autocorrections have also been generated for the 'prefix-root' cases mentioned above.
Two files have been prepared:
The corrections in filterpv.txt have not yet been installed. They need further examination before being installed.
The two files are in this gist
1153 cases
A lot, indeed.
A lot in number, but it is a very easy task. Most of them do not even need to be rechecked against the PDF and can be resolved on the fly from context, a few seconds per case.
it is a very easy task
Agreed. No need to consult scan usually.
If you do some checking, why don't you start 'at the top', and I'll start 'at the bottom' tomorrow.
If you find some false positives in filterpv.txt, just note them and add them as a batch in a comment here. I'll do the same.
If you find some False Positives in filterpv.txt, just note them and add the bunch to comment here.
https://docs.google.com/document/d/10Ivo95hD75xHVcnQ8RgJF9oKge3gq06e2cyzveQeqVI/edit?usp=sharing
Here are a few false positives from the bigger list. The other one has more complicated cases; it would be better viewed through the interface, I think.
I went through filterpv_no.txt and gathered 29 false negatives.
The records to be corrected are
All the above now installed.
Time to close this issue.
One aside re MW72: There appear to be more verb forms in MW72 than in MW(99). Due to the simpler form of mw72.txt relative to mw.xml, it would likely be easier (though not easy) to harvest the verb forms from mw72 than it would be to harvest the forms from mw.xml.
The resulting list, from either source, could provide a useful digital reference (in addition to Whitney Roots) of verb forms.
One use of such a list would be in comparison to algorithmic computations of verb forms.
@SergeA mentions in a comment to case 19 of batch 314 (see #322):
The current filter on which these batches are based does NOT catch these cases, although a few cases had this error coincidentally.
I'm not sure of what programmable pattern would be required to catch more of these cases of missing '-' at the beginning of a line.
One such pattern is {%te\W (te at the beginning of an italic, and therefore Sanskrit in MW72, span, followed by a non-word character). A quick search shows 183 matches, and a quick glance at the results suggests there are no false positives. There may be other patterns, which might involve a prior line.
@SergeA Have you noticed any other patterns?
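For reference, the search itself might look something like this in Python. It is just a sketch: `find_matches` is a made-up name, the second sample line is invented to show a non-match, and the regex covers both te and ti.

```python
import re

# '{%' opens italic text; 'te' or 'ti' must be followed by a non-word character.
PATTERN = re.compile(r'\{%t[ei]\W')

def find_matches(lines):
    """Return (line_number, line) pairs for lines containing the pattern."""
    return [(n, line) for n, line in enumerate(lines, start=1)
            if PATTERN.search(line)]

sample = [
    '<>{%ti, -unditum,%} to wet, bedew; flow over.',
    '<>{%tejas,%} n. sharpness.',  # invented line: 'te' followed by a word character
]
print(len(find_matches(sample)))
```

The `\W` is what keeps ordinary words that merely begin with te or ti out of the results.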