Closed GoogleCodeExporter closed 9 years ago
Hi. If you access the etymologies using IWiktionaryEntry.getWordEtymology(),
you will obtain an IWikiString representation of the etymology. This class
provides both a getPlainText() and a getText() method to obtain a string
representation of the etymology. I assume that in your code you used the former
(or an implicit toString(), which also uses getPlainText()). The latter,
however, lets you work with the full markup encoded in Wiktionary. And
yes: getPlainText() is too eager in stripping markup from etymology strings.
I'm not sure if a plain text representation is necessary at all if you have
the markup version. I have been experimenting with an EtymologyTemplateHandler
for a while (you can find it in the api.util.TemplateParser file). With this
approach, it should be possible to analyze etymology strings. It's far from
perfect, but probably a good starting point. If you make interesting changes
to the JWKTL source code, I'm happy to integrate them. Just reopen this ticket
or start a new one. Best wishes!
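To illustrate what working with the markup version could look like, here is a minimal sketch. It does not use JWKTL itself and is not the EtymologyTemplateHandler mentioned above: the class name EtymologyMarkupSketch, the method extractTemplates, and the sample string are all hypothetical, and the sample only stands in for what getText() might return for an entry. It handles simple, non-nested {{...}} templates only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of JWKTL.
public class EtymologyMarkupSketch {

    // Matches a single non-nested template such as {{etyl|la|en}}.
    private static final Pattern TEMPLATE = Pattern.compile("\\{\\{([^{}]*)\\}\\}");

    /** Returns each template as a list: the template name first, then its parameters in order. */
    public static List<List<String>> extractTemplates(String markup) {
        List<List<String>> templates = new ArrayList<>();
        Matcher matcher = TEMPLATE.matcher(markup);
        while (matcher.find()) {
            List<String> parts = new ArrayList<>();
            // Template name and parameters are separated by pipes.
            for (String part : matcher.group(1).split("\\|")) {
                parts.add(part.trim());
            }
            templates.add(parts);
        }
        return templates;
    }

    public static void main(String[] args) {
        // Assumed sample markup; in practice this would come from
        // entry.getWordEtymology().getText().
        String markup = "From {{etyl|enm|en}} {{term|hous|lang=enm}}, "
                + "from {{etyl|ang|en}} {{term|hus|lang=ang}}.";
        for (List<String> template : extractTemplates(markup)) {
            System.out.println(template);
        }
    }
}
```

Each inner list keeps the raw parameters (including named ones such as lang=enm), so language codes and source terms remain available for further analysis instead of being flattened away as in the plain text.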
Original comment by chmeyer.de
on 4 Feb 2015 at 3:07
Thanks Christian! I wasn't planning to make changes, I was just trying to
see how much of the info in Wiktionary is formalized. I am allergic to
databases :)
all the best,
Vivi
Original comment by nast...@fbk.eu
on 4 Feb 2015 at 3:16
Original issue reported on code.google.com by
nast...@fbk.eu
on 26 Jan 2015 at 11:28