standardebooks / tools

The Standard Ebooks toolset for producing our ebook files.

Add s-104 and t-076 for possible <dfn> candidates in endnotes #730

Open apasel422 opened 4 months ago

acabal commented 4 months ago

Great, thanks. I'm more interested in seeing if the xpath I proposed can be improved at all. Have you tried running it on the corpus to see what comes up? I only did a cursory investigation.

We want to see if we can craft an xpath that returns the most correct matches with the fewest false positives. How many false positives is too many is a matter of taste; we'll have to feel it out. The xpath I proposed on the list is just a first draft, and I imagine it could be improved significantly.
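To make the "run it on the corpus" step concrete, here is a minimal sketch of what such a check could look like. It does not use the xpath proposed on the list, which is not reproduced in this thread. The heuristic below (an endnote with no `<dfn>` whose first paragraph opens directly with `<i>` or `<b>`) is a hypothetical stand-in, and `dfn_candidates` and `SAMPLE` are illustrative names, not part of the toolset.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
EPUB = "http://www.idpf.org/2007/ops"
NS = {"x": XHTML}


def dfn_candidates(xhtml_source):
    """Return ids of endnotes that might want a <dfn>.

    Hypothetical heuristic, assumed for illustration only: the endnote
    contains no <dfn>, and its first paragraph starts directly with an
    <i> or <b> element (no text before it).
    """
    root = ET.fromstring(xhtml_source)
    candidates = []
    for li in root.iterfind(".//x:li", NS):
        # Only look at list items marked as endnotes.
        if li.get(f"{{{EPUB}}}type") != "endnote":
            continue
        # Skip endnotes that already contain a <dfn>.
        if li.find(".//x:dfn", NS) is not None:
            continue
        first_p = li.find("x:p", NS)
        if first_p is None or len(first_p) == 0:
            continue
        leading_text = (first_p.text or "").strip()
        if not leading_text and first_p[0].tag in (f"{{{XHTML}}}i", f"{{{XHTML}}}b"):
            candidates.append(li.get("id"))
    return candidates


# Made-up endnotes file used only to exercise the function.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<body><section epub:type="endnotes"><ol>
<li id="note-1" epub:type="endnote"><p><i>Carpe diem</i>: seize the day.</p></li>
<li id="note-2" epub:type="endnote"><p><dfn>Ad hoc</dfn>: for this purpose.</p></li>
<li id="note-3" epub:type="endnote"><p>See chapter 4.</p></li>
</ol></section></body></html>"""

print(dfn_candidates(SAMPLE))
```

Run over every endnotes file in the corpus, the returned ids could then be reviewed by hand to count false positives before settling on a final expression.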

apasel422 commented 4 months ago

> Great, thanks. I'm more interested in seeing if the xpath I proposed can be improved at all. Have you tried running it on the corpus to see what comes up? I only did a cursory investigation.

I'll post my findings here later.