Open berndmoos opened 6 years ago
Should be text/tcf+xml now.
Waiting for the change to take effect... Stay tuned.
The change does not seem to be recognised by WebLicht. The monitoring page says that "7 services were retained" at the last harvest (https://weblicht.sfs.uni-tuebingen.de/harvester/resources/report). I suspect some action has to be taken so that the services are updated instead of just retained.
Asked a question on the list...
The output mime type is changed now...
... and it is the same as for other services with TCF as an output...
... but WebLicht still does not offer other services with TCF as input.
I guess the TCF converter is somehow underspecified in the CMDI. We will maybe need to add lang etc., see http://weblicht.sfs.uni-tuebingen.de/comet/editor.jsp?id=1541449788338
The links have expired. I added the lang parameter de, but I haven't forced re-indexing yet.
This one should be a model for specifying the output parameters:
http://weblicht.sfs.uni-tuebingen.de/fedora/objects/WLWS:3/datastreams/CMDI/content
OK, I copy-pasted that for a test.
I think it would be more efficient if HZSK could test the changes directly. Here's a recipe for testing:
(0) Modify CMDI and wait until WebLicht has harvested it (should take around 2h according to Tübingen)
(1) Go to WebLicht at https://weblicht.sfs.uni-tuebingen.de/
(2) Start, login, start
(3) Choose "Upload a file" and pick an EXMARaLDA Basic Transcription (*.exb) - I use RudiVoellerWutausbruch.exb
(4) Pick the appropriate segmentation algorithm and language - in my case: "hiat" and "deutsch"
(5) Check "Show tools with status: development"
(6) Add service "IDS, HZSK: EXMARaLDA to ISO/TEI converter" to the chain
(7) Add service "IDS, HZSK: ISO/TEI to TCF" to the chain
What we want is that WebLicht then offers TCF-based services for the next step. Currently, no services are offered.
Excellent idea. I've played around a bit now. I think it might be the language thing, but I still can't get the languages to work: with other chains the boxes contain languages, but here it just goes from deutsch to nothing to unknown, even though I copied the input and output parameters. I will continue experimenting...
The language alone didn't fix it, but adding version or "text" did.
Better, but not quite there yet. What WebLicht now offers is a bunch of tokenizers, although the TCF is already tokenized. We'll probably have to add "sentences" and "tokens" to the output as well...
The output now declares text, sentences and tokens, and IMS morphology works, at least for a trivial small file.
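To make the behaviour seen in the last few comments easier to follow, here is a toy model in Python of how a chainer might decide which services to offer next. It is only a sketch under loud assumptions: the matching rule (same MIME type, required layers present, at least one new layer produced), the `Service` class, the `offered_next` function and the two candidate services are all hypothetical and not taken from WebLicht's code or from the actual CMDI.

```python
from dataclasses import dataclass

TCF = "text/tcf+xml"


@dataclass(frozen=True)
class Service:
    """Hypothetical, simplified description of a chainable service."""
    name: str
    input_type: str
    requires: frozenset   # layers that must already be declared
    produces: frozenset   # layers the service would add


def offered_next(current_type, declared_layers, candidates):
    """Return names of candidates that fit the declared output.

    Assumed rule (not WebLicht's documented algorithm): the MIME type
    must match, every required layer must be declared, and the service
    must add at least one layer that is not declared yet.
    """
    declared = frozenset(declared_layers)
    return [
        s.name
        for s in candidates
        if s.input_type == current_type
        and s.requires <= declared
        and not s.produces <= declared
    ]


# Hypothetical candidates, for illustration only.
CANDIDATES = [
    Service("tokenizer", TCF, frozenset({"text"}), frozenset({"sentences", "tokens"})),
    Service("morphology", TCF, frozenset({"tokens"}), frozenset({"morphology"})),
]

# Only the MIME type declared: nothing is offered.
print(offered_next(TCF, set(), CANDIDATES))
# "text" declared: only the tokenizer is offered.
print(offered_next(TCF, {"text"}, CANDIDATES))
# "text", "sentences" and "tokens" declared: morphology is offered.
print(offered_next(TCF, {"text", "sentences", "tokens"}, CANDIDATES))
```

Under this model the three calls mirror the three states reported in this thread: no services, then tokenizers only, then morphology once the converter's output declares text, sentences and tokens.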
It also works for my favourite test files, so I'd venture to say, this issue can be closed. However, there is a similar issue in the mirror operation, so I am opening a mirror issue: issue #10
... and this makes it impossible to really use the TCF services on converted ISO/TEI data.
I suspect the reason is the mime type. The metadata for isotei2tcf (https://corpora.uni-hamburg.de/hzsk/de/islandora/object/webservice:isotei2tcfconverter-0.9/datastream/CMDI) specifies the following as output:
application/xml;format-variant=weblicht-tcf
This is what we wanted, but didn't get (see issue#6). In https://github.com/hzsk/HZSK-CLARIN-Services/blob/2bc7e9ee2f4c5de79a8401a6f2f4cb76b4ee6839/src/main/java/de/uni_hamburg/converters/IsoTeiConverter.java#L287, @Produces is given as:
text/tcf+xml
I think this is what the metadata should use as mime type for the output. Can somebody change that?
Likewise, in...
https://corpora.uni-hamburg.de/hzsk/de/islandora/object/webservice:tcf2isoteiconverter-0.9/datastream/CMDI
... the input mime type should change.
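The mismatch described above can be checked mechanically. The following Python sketch compares the MIME type declared in the metadata with the one given in @Produces. The two strings are the ones quoted in this issue; the helper names `parse_mime` and `same_mime` are hypothetical, and the comparison is a plain string-level check, not necessarily what WebLicht does internally.

```python
def parse_mime(value):
    """Split a MIME string into (type/subtype, {parameter: value})."""
    parts = [p.strip() for p in value.split(";")]
    essence = parts[0].lower()
    params = {}
    for part in parts[1:]:
        if "=" in part:
            key, val = part.split("=", 1)
            params[key.strip().lower()] = val.strip()
    return essence, params


def same_mime(a, b):
    """True if both strings name the same type with the same parameters."""
    return parse_mime(a) == parse_mime(b)


# Values quoted in this issue.
DECLARED_IN_CMDI = "application/xml;format-variant=weblicht-tcf"
PRODUCED_BY_SERVICE = "text/tcf+xml"

print(parse_mime(DECLARED_IN_CMDI))
print(same_mime(DECLARED_IN_CMDI, PRODUCED_BY_SERVICE))
```

The check reports a mismatch for the two values above, which is consistent with the suspicion that the metadata and the service disagree about the output type.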