Closed giuliabaldini closed 1 year ago
Ok, so here is what I did:
webanno.custom.Documentmetadata
<typeDescription>
<name>webanno.custom.Documentmetadata</name>
<description/>
<supertypeName>uima.cas.AnnotationBase</supertypeName>
</typeDescription>
In your description of the process, I don't see that you actually created a document metadata layer. Did you create one?
Never mind - I should have read the report more closely. I see now that you are referring to a particular DKPro Core type that is missing.
Where did you get the type system from step 6 from (i.e. the one that contains the DKPro Core DocumentMetaData type)?
Well, if you have an XMI file that contains a DKPro Core DocumentMetaData annotation, then you'd have to copy that type definition over to the modified type system. You could do this manually or programmatically. Cf. e.g. https://cassis.readthedocs.io/en/latest/_modules/cassis/typesystem.html#merge_typesystems
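One way to do the copy programmatically, without relying on the cassis function linked above, is to edit the type system XML directly. The following is only a sketch using Python's standard `xml.etree.ElementTree`; the helper name `copy_type_description` and the two inline type systems are hypothetical, and the DocumentMetaData definition shown is an abbreviated stand-in. In practice the real definition should be taken from the TypeSystem.xml that INCEpTION exports alongside the document.

```python
import xml.etree.ElementTree as ET

UIMA_NS = "http://uima.apache.org/resourceSpecifier"
NS = {"u": UIMA_NS}
DOC_META = "de.tudarmstadt.ukp.dkpro.core.api.metadata.type.DocumentMetaData"


def copy_type_description(source_xml: str, target_xml: str, type_name: str) -> str:
    """Return target_xml with the named type copied in from source_xml."""
    # Serialize the UIMA namespace as the default namespace (no ns0: prefixes).
    ET.register_namespace("", UIMA_NS)
    source_root = ET.fromstring(source_xml)
    target_root = ET.fromstring(target_xml)

    def find_type(root):
        for desc in root.iterfind("u:types/u:typeDescription", NS):
            if desc.findtext("u:name", namespaces=NS) == type_name:
                return desc
        return None

    if find_type(target_root) is not None:
        return target_xml  # already defined, nothing to copy

    wanted = find_type(source_root)
    if wanted is None:
        raise KeyError(f"{type_name} is not defined in the source type system")

    types = target_root.find("u:types", NS)
    if types is None:
        types = ET.SubElement(target_root, f"{{{UIMA_NS}}}types")
    types.append(wanted)
    return ET.tostring(target_root, encoding="unicode")


# Illustrative input only: an abbreviated stand-in for the full export.
SOURCE = f"""<typeSystemDescription xmlns="{UIMA_NS}"><types>
<typeDescription>
<name>{DOC_META}</name>
<description/>
<supertypeName>uima.tcas.DocumentAnnotation</supertypeName>
</typeDescription>
</types></typeSystemDescription>"""

# The modified type system, here reduced to the custom type from this thread.
TARGET = f"""<typeSystemDescription xmlns="{UIMA_NS}"><types>
<typeDescription>
<name>webanno.custom.Documentmetadata</name>
<description/>
<supertypeName>uima.cas.AnnotationBase</supertypeName>
</typeDescription>
</types></typeSystemDescription>"""

merged = copy_type_description(SOURCE, TARGET, DOC_META)
```

The resulting string can be written back to the modified type system file. If the type is already present in the target, the input is returned unchanged, so running the helper repeatedly is harmless.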
When exporting the UIMA type system, INCEpTION only exports the types related to the layers defined in the project. If we did not do that, the type system would be spammed by tons of DKPro Core types (such as the one you are missing) and the type system file would be considerably larger. INCEpTION does not use most of the DKPro Core types though.
We could introduce a second export option to export a full UIMA type system that includes all types INCEpTION knows about - even the ones that it doesn't use.
Where did you get the type system from step 6 from (i.e. the one that contains the DKPro Core DocumentMetaData type)?
I just exported this from the document view. If you press that, you can choose the format, and if you select CAS you get a TypeSystem.xml and the actual file.
Well, if you have an XMI file that contains a DKPro Core DocumentMetaData annotation, then you'd have to copy that type definition over to the modified type system. You could do this manually or programmatically.
Yes, this would definitely be an option, and it would happen after we have downloaded the annotated documents. I was just wondering why the other TypeSystem did not have all the types.
When exporting the UIMA type system, INCEpTION only exports the types related to the layers defined in the project. If we did not do that, the type system would be spammed by tons of DKPro Core types (such as the one you are missing) and the type system file would be considerably larger. INCEpTION does not use most of the DKPro Core types though.
We could introduce a second export option to export a full UIMA type system that includes all types INCEpTION knows about - even the ones that it doesn't use.
The question is: Is the "de.tudarmstadt.ukp.dkpro.core.api.metadata.type.DocumentMetaData" always added when exporting a document?
It seems we have an inconsistency in the implementation here. The type system export from the layer settings only includes the layers defined in the project settings. However, when exporting via the functionality to export individual documents or even the entire project, all types that INCEpTION has access to are included - even if they are not defined as project layers.
The question is: Is the "de.tudarmstadt.ukp.dkpro.core.api.metadata.type.DocumentMetaData" always added when exporting a document?
Ok, so finally coming back to this.
Yes, INCEpTION always adds de.tudarmstadt.ukp.dkpro.core.api.metadata.type.DocumentMetaData when exporting a document. Even if it already exists, it is overwritten by INCEpTION, e.g. setting the documentId to the document's filename.
However, this is not documented at the moment and it might change in the future.
I believe you should have no issue if you just copy the definition of the de.tudarmstadt.ukp.dkpro.core.api.metadata.type.DocumentMetaData type over to the type system you are using to load the data in cassis. Alternatively, you could load the CAS leniently using DKPro Cassis - that would drop any annotations not in the target CAS type system.
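The lenient option could look roughly like the sketch below. It assumes the `lenient` keyword argument of `cassis.load_cas_from_xmi`; the wrapper name `load_cas_leniently` is hypothetical, and the file names in the usage comment are simply the ones mentioned in this issue.

```python
from pathlib import Path


def load_cas_leniently(xmi_path: Path, typesystem_path: Path):
    """Load an XMI with cassis, ignoring annotations whose type is missing."""
    # Imported lazily so this module can be imported without cassis installed.
    import cassis

    with typesystem_path.open("rb") as f:
        typesystem = cassis.load_typesystem(f)
    with xmi_path.open("rb") as f:
        # lenient=True ignores elements whose type is not in the type system
        # instead of raising TypeNotFoundError.
        return cassis.load_cas_from_xmi(f, typesystem=typesystem, lenient=True)


# Usage (paths as used in this issue):
# cas = load_cas_leniently(Path("text.xmi"), Path("ModTypeSystem.xml"))
```

Note that with this approach the DocumentMetaData annotation is dropped from the loaded CAS, so it only fits if the document metadata is not needed in postprocessing.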
Describe the bug
Hi there,
as described in #3605, we are trying to export the INCEpTION TypeSystem such that it allows UIMA subtypes, which would allow us to postprocess the data more easily. This is currently not possible, but we tried a workaround.
To Reproduce
We did the following:
ts_old = cassis.load_typesystem(Path("ModTypeSystem.xml"))
print("Does not work for the original modified typesystem, which has subtypes")
c = cassis.load_cas_from_xmi(downloaded_location / "text.xmi", ts_old)
Works for newly created typesystem, but does not have subtypes
<cassis.cas.Cas object at 0x10c46cdf0>
[]
Does not work for the original modified typesystem, which has subtypes
Traceback (most recent call last):
...
cassis.typesystem.TypeNotFoundError: Type with name [de.tudarmstadt.ukp.dkpro.core.api.metadata.type.DocumentMetaData] not found!