Hmm, good question. The TEI looks simple enough but something must be different from the TEI files I have used until now. The problem with TEI is that it comes in so many flavours.
Actually... I see it now fails on one of my own test files too. There must be some regression in tei2folia itself, I'll investigate.
This is fixed and released now (v2.5.4).
I also implemented support for <c> and <rs> (these weren't implemented yet), but do note that the implementation for <rs> does not map nicely to FoLiA <entity> elements, so this probably doesn't do what you want yet. In TEI it's used more as markup, so I mapped it to markup in FoLiA as well.
Awesome, thank you very much! Will test this conversion scenario in more detail.
I have a few toy TEI5 XML documents that include \<w> and \<c> elements, and annotations as \ elements.
tei2folia generates output from them, but the document body is empty.
What could be the reason? I am attaching the input/output docs.
The TEI was generated by INCEpTION. It uses the DKPro Core TEI reader / writer which supports a subset of TEI. The elements are listed here: https://dkpro.github.io/dkpro-core/releases/2.2.0/docs/format-reference.html#format-Tei
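For anyone hitting a similar empty-body result, a quick first check is to list which elements the TEI file actually contains and compare that against what the converter is known to handle. The following is only a minimal sketch using the Python standard library; the `tag_inventory` helper and the `SAMPLE` document are made up for illustration and are not part of tei2folia or DKPro Core.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tag_inventory(xml_text: str) -> Counter:
    """Count element names in an XML document, ignoring namespaces."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for el in root.iter():
        # Skip anything whose tag is not a plain string (e.g. comments).
        if isinstance(el.tag, str):
            # Tags look like "{namespace}name"; keep only the local name.
            counts[el.tag.rsplit("}", 1)[-1]] += 1
    return counts


# Hypothetical toy document, standing in for a real TEI export.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><p><s>'
    "<w>Hello</w><c>,</c><w>world</w>"
    "</s></p></body></text></TEI>"
)

inventory = tag_inventory(SAMPLE)
for name, n in sorted(inventory.items()):
    print(name, n)
```

To run it on a real file, read the file contents into a string and pass that to `tag_inventory`. Any element that shows up in the inventory but is not supported by the converter is a candidate for content being dropped.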
N.B. I randomly chose a TEI validation method: https://trafilatura.readthedocs.io/en/latest/tutorial2.html and the file did not validate.
I understand from the developers that the TEI reader / writer were developed using various TEI files from different sources as test material. If one has particular problems with data not validating, one can report this as an issue in the INCEpTION or DKPro Core GitHub issue trackers.
FA-MBK-4-3_035245008_0019_abpproc_entries.inctei.folia.xml.txt FA-MBK-4-3_035245008_0019_abpproc_entries.inctei.xml.txt