Open hgiesel opened 10 months ago
hey Henrik, this is a toughie. de-conj results are actually generated by a script, somewhere inside Wiktionary. They auto-create the easier conjugations, and allow users to set exceptions. Conjugating German verbs is beyond the scope of wtf_wikipedia, but could be a candidate for a plugin. You can see we're generating conjugations at de-compromise, if that's what you're looking for. cheers
I don't mean that it should generate the actual results. I actually intend to do that myself.
I mean that if I parse the Wiktionary page with wtf, it seems like it drops some parts from the document.
After parsing the page, I want to have this text: ab.tun<irreg>, however it mutilates it to this: ab.tun, and skips the <irreg> part.
ahh, ya. I see what you mean. First- that sounds cool that you're reproducing the results. Please share-back what you can.
Yeah, as you suspected, it's the angle-brackets. The library involves a lot of xml tags, which by default, pass-through. This also runs before the template parser.
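For illustration, here is a minimal sketch of why that ordering matters, assuming a tag-stripping pass that runs over the raw wikitext before templates are read. Both `naiveStripTags` and `readTemplateArgs` are hypothetical stand-ins written for this example, not wtf_wikipedia's actual code.

```javascript
// Hypothetical stand-in for an XML-stripping pass (an assumption, not the library's code).
function naiveStripTags(wikitext) {
  // Remove anything that looks like <tag>, </tag>, or <tag attr="x">.
  return wikitext.replace(/<\/?[a-z][^>]*>/gi, '');
}

// Hypothetical minimal template reader: splits the first {{...}} on pipes.
function readTemplateArgs(wikitext) {
  const match = wikitext.match(/\{\{([^{}]*)\}\}/);
  return match ? match[1].split('|') : [];
}

const source = '{{de-conj|ab.tun<irreg>}}';

// Stripping first: <irreg> is gone before the template reader ever sees it.
const stripped = naiveStripTags(source);
console.log(readTemplateArgs(stripped)); // [ 'de-conj', 'ab.tun' ]

// Reading the template from the untouched source keeps the full argument.
console.log(readTemplateArgs(source)); // [ 'de-conj', 'ab.tun<irreg>' ]
```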
It would be easy to support <irreg>, but I'm just looking at the template doc, and see things like this:
{{de-conj|schwimmen<schwamm:schwomm[archaic; used up through the 19th century],geschwommen,schwämme:schwömme[rare]>}}
So yikes, I didn't know about this syntax. I'm not sure how to do it, to be honest. You may find there is a solution somewhere in the kill_xml file - but I can't think of one right now. cheers
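One possible direction, sketched under assumptions: skip tag-stripping while inside `{{ }}`, so angle-bracket arguments like the one above survive untouched. `stripTagsOutsideTemplates` is a hypothetical helper for this example only, and says nothing about how kill_xml is really structured.

```javascript
// Sticky regex: matches an XML-like tag only at exactly lastIndex.
const TAG = /<\/?[a-z][^>]*>/iy;

// Hypothetical helper: strip XML-like tags, but leave everything inside {{...}} alone.
function stripTagsOutsideTemplates(wikitext) {
  let out = '';
  let depth = 0; // current {{ }} nesting depth
  let i = 0;
  while (i < wikitext.length) {
    if (wikitext.startsWith('{{', i)) {
      depth += 1;
      out += '{{';
      i += 2;
    } else if (depth > 0 && wikitext.startsWith('}}', i)) {
      depth -= 1;
      out += '}}';
      i += 2;
    } else if (depth === 0 && wikitext[i] === '<') {
      TAG.lastIndex = i;
      const m = TAG.exec(wikitext);
      if (m) {
        i += m[0].length; // drop the tag
      } else {
        out += '<'; // a lone '<' that is not a tag is kept
        i += 1;
      }
    } else {
      out += wikitext[i];
      i += 1;
    }
  }
  return out;
}

const input =
  'a <b>bold</b> {{de-conj|schwimmen<schwamm:schwomm[archaic],geschwommen>}} <ref>x</ref>';
console.log(stripTagsOutsideTemplates(input));
// a bold {{de-conj|schwimmen<schwamm:schwomm[archaic],geschwommen>}} x
```

The trade-off is that genuine tags inside a template argument (a `<ref>` or `<br>`, say) would also be left in place, so a real fix would still need to tell those apart from the de-conj syntax.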
@hgiesel - keep in mind that the wiki template parsing for Wiktionary pages needs some love (not just for the irreg). There are lots of issues with regard to proper parsing, though I do believe @spencermountain is already aware of this.
If you happen to play with the English Wiktionary you will run into a lot of issues with improper parsing. That said - you are more than welcome to contribute any fixes you find - as I know @spencermountain is a very busy guy.
I've been trying to parse Wiktionary pages like this one. However, wtf does not parse the template string correctly: it fails to read {{de-conj|ab.tun<irreg>}} and skips the <irreg> part.
Another template it fails to parse correctly is on this page.
This:
is turned by wtf into: