weblicht / wlfxb

handler for TCF WebLicht data exchange format
GNU Lesser General Public License v3.0

Reading/writing through WLData does not preserve MetaData #2

Closed by reckart 10 years ago

reckart commented 10 years ago

When loading a TCF file into a WLData object and saving it again as TCF, the contents of the MetaData section are lost.

It looks like wlfxb only supports some simple key/value metadata, but not the CMD metadata that WebLicht stores there. Hence information such as which tools a TCF file was processed with is lost in the process.

yanapanchenko commented 10 years ago

This was fixed recently in the development version, when we included the CMD XML schema definitions in the TCF MetaData content definitions. Note that only the MetaData defined in the latest RNC schema will be preserved (see http://clarin-d.de/images/weblicht-tutorials/resources/tcf-04/schemas/latest/metadata_0_4.rnc). The fix will take effect in the next library release, which is coming soon.

Additionally, a test has been added to make sure that MetaData contents (those that conform to the current TCF RNC schema) are the same before and after reading from XML into WLData and writing back to XML.
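For anyone who wants to check their own files independently of wlfxb, a round-trip check of this kind can be sketched with the Python standard library alone. This is not the test that was added to wlfxb; it is only an illustration of the idea. The function names (`find_metadata`, `canonical`, `metadata_preserved`) and the sample documents with their `urn:example:md` namespace are hypothetical, and the sketch assumes the section can be located by the local element name `MetaData`. It ignores whitespace-only text and any text trailing after child elements.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def find_metadata(xml_text):
    """Return the first element whose local name is 'MetaData', or None."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local_name(elem.tag) == "MetaData":
            return elem
    return None


def canonical(elem):
    """Reduce an element to a nested tuple so two subtrees can be compared.

    Attribute order and surrounding whitespace in text are ignored.
    """
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(canonical(child) for child in elem),
    )


def metadata_preserved(before_xml, after_xml):
    """True if both documents carry an equivalent MetaData subtree."""
    before = find_metadata(before_xml)
    after = find_metadata(after_xml)
    if before is None or after is None:
        return before is None and after is None
    return canonical(before) == canonical(after)


# Hypothetical sample documents, not real TCF output.
ORIGINAL = (
    '<doc xmlns:m="urn:example:md">'
    "<m:MetaData><m:source>some-tool</m:source></m:MetaData>"
    "</doc>"
)
ROUND_TRIPPED_OK = (
    '<doc xmlns:m="urn:example:md">'
    "<m:MetaData>\n  <m:source>some-tool</m:source>\n</m:MetaData>"
    "</doc>"
)
ROUND_TRIPPED_LOST = '<doc xmlns:m="urn:example:md"><m:MetaData/></doc>'

if __name__ == "__main__":
    print(metadata_preserved(ORIGINAL, ROUND_TRIPPED_OK))
    print(metadata_preserved(ORIGINAL, ROUND_TRIPPED_LOST))
```

To apply it to real files, read the file before and the file after the round trip as text and pass both strings to `metadata_preserved`.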

yanapanchenko commented 10 years ago

A possibly important additional note: currently, CMD metadata is only available for reading, not for writing.

chozelinek commented 10 years ago

Hi both, I am attaching just one of the files where I observed this behaviour, in case it can shed some light on this. pre-webanno is the output from WebLicht; there we can see the CMD element, and it should conform to the TCF specification. post-webanno is a copy in which some modifications were made before it was exported. I expected that the information contained in the CMD element would at least be kept.

http://fr46.uni-saarland.de/fileadmin/user_upload/lehrstuehle/Teich/jmartinez/tcf/ep3_pre-webanno.xml
http://fr46.uni-saarland.de/fileadmin/user_upload/lehrstuehle/Teich/jmartinez/tcf/ep3_post-webanno.xml

yanapanchenko commented 10 years ago

Yes, this is the case for wlfxb <= 1.2.9. It won't be the case in 1.3.0, but I prefer to wait until the other issue is clarified before releasing this next version.