inception-project / inception

INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
https://inception-project.github.io
Apache License 2.0

Include project id in zipped export #5156

Open jwijffels opened 2 weeks ago

jwijffels commented 2 weeks ago

Is your feature request related to a problem? Please describe.
I'd like the project ID to be included next to the project name when exporting a project as UIMA CAS XMI (XML 1.0).

Describe the solution you'd like
When the project is exported as a zipped file, the project ID is currently not available anywhere in the archive.


It seems to be available only in the CAS metadata, namely in the documentBaseUri, which looks like repository/project/6/document/828/source. That value only exists once a document has been annotated, so the ID is not available at all when no documents have been annotated.

baseuri = cas.select('de.tudarmstadt.ukp.dkpro.core.api.metadata.type.DocumentMetaData')[0].documentBaseUri
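A minimal sketch of the workaround described above: extracting the numeric project ID from a documentBaseUri value. It assumes the URI follows the repository/project/<id>/document/<id>/source layout seen in this issue; the helper name project_id_from_base_uri is hypothetical. With dkpro-cassis, the input would be the documentBaseUri of the DocumentMetaData annotation selected in the line above.

```python
import re
from typing import Optional

# Matches "project/<digits>/document/" at the start of the URI or after a "/".
# Assumption: the URI layout follows the example shown in this issue.
_PROJECT_ID_RE = re.compile(r"(?:^|/)project/(\d+)/document/")


def project_id_from_base_uri(base_uri: Optional[str]) -> Optional[int]:
    """Return the project ID embedded in a documentBaseUri, or None.

    None is returned when the URI is missing (e.g. because no document has
    been annotated yet) or does not match the expected layout.
    """
    if not base_uri:
        return None
    match = _PROJECT_ID_RE.search(base_uri)
    return int(match.group(1)) if match else None
```

For the example URI from this issue, project_id_from_base_uri("repository/project/6/document/828/source") returns 6. As the issue points out, this only helps when at least one annotated CAS exists.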

Describe alternatives you've considered
I could get it through the Python client: first list the projects, then download a zipped export of a specific project. But I'd rather have the frontend also include the project ID in the zip file.
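The Python-client route could look roughly like this. The function accepts any client object exposing api.projects() and api.export_project(project, project_format=...), which is how pycaprio's README presents its client. The project_id/project_name attribute names and the "<id>_<name>.zip" naming scheme are assumptions, and save_exports_with_ids is a hypothetical helper.

```python
from pathlib import Path
from typing import Any, List


def save_exports_with_ids(client: Any, out_dir: str, project_format: str) -> List[Path]:
    """Export every project and write it to out_dir as '<id>_<name>.zip'.

    Assumes a pycaprio-like client: client.api.projects() returns objects
    with project_id and project_name attributes, and
    client.api.export_project(...) returns the zip archive as bytes.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for project in client.api.projects():
        zip_bytes = client.api.export_project(project, project_format=project_format)
        # Encode the ID in the file name so it survives outside INCEpTION.
        path = target / f"{project.project_id}_{project.project_name}.zip"
        path.write_bytes(zip_bytes)
        written.append(path)
    return written
```

With pycaprio itself, the client would presumably be created as Pycaprio("http://localhost:8080", authentication=("remote-user", "password")), with the format taken from InceptionFormat.XMI in pycaprio.mappings, as shown in its README; verify these details against the pycaprio version you use.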

Additional context
Example screenshot of the project export's JSON, showing that only the project name is included, not the project ID.
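To check what the export actually carries, one could open the archive and read the project descriptor directly. This is a minimal standard-library sketch. It assumes the descriptor is a JSON file named exportedproject.json at the archive root with name and slug keys (check this against your own export), and read_project_info is a hypothetical helper.

```python
import io
import json
import zipfile
from typing import Optional, Tuple

# Assumption: file name of the project descriptor inside an INCEpTION export.
DESCRIPTOR = "exportedproject.json"


def read_project_info(zip_bytes: bytes) -> Tuple[Optional[str], Optional[str]]:
    """Return (name, slug) from the project descriptor of an exported project.

    Either value is None when the key is absent or null in the JSON.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(DESCRIPTOR) as fh:
            descriptor = json.load(fh)
    return descriptor.get("name"), descriptor.get("slug")
```

Because the slug may come back as None, a caller that keys its data on the slug should decide explicitly what to fall back to.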

reckart commented 2 weeks ago

The project ID has no meaning for the exported project because when you import the project again, it gets assigned a new ID. Why do you need it?

jwijffels commented 2 weeks ago

That's indeed what I assumed was the reason it isn't included.

The context is that I'm using INCEpTION to collect training data for building an NER model. I have workflows using pycaprio that collect the data by project ID, and I would like the same structure to be produced if a user decides to export the data as a zipped project, before I feed it into the NER modelling process.

reckart commented 2 weeks ago

Have you considered using the project slug instead of the ID? It is at least a bit more stable. The slug would only change on import if there is already another project with the same slug.

jwijffels commented 2 weeks ago

That's indeed an option. Currently I parse the project_id out of the documentBaseUri (repository/project/6/document/828/source), but that only gives me the project ID if there are annotations or curations. I could use the project slug instead of the name when exporting as zip, but in pycaprio, client.api.projects only returns the project name and project ID, not the slug.

reckart commented 2 weeks ago

> I could use the project slug when exporting as zip as an alternative to the name, but in pycaprio, the client.api.projects only returns the project name and project id, not the slug.

Right... I believe that's a pycaprio limitation, though, and that INCEpTION already returns the slug in the API response (in the field called name). Fortunately, pycaprio is now maintained here and we can make new releases of it. Would you like to look into exposing the name field as the slug in pycaprio?

https://github.com/inception-project/pycaprio

jwijffels commented 2 weeks ago

Ah, so the name element returned by client.api.projects in pycaprio is actually the slug? Good, I'll dig into pycaprio to see what the API response really returns.

reckart commented 2 weeks ago

This is the data transfer object on the Java side where you can see how the project information is mapped to the JSON response:

https://github.com/inception-project/inception/blob/main/inception/inception-remote/src/main/java/de/tudarmstadt/ukp/clarin/webanno/webapp/remoteapi/aero/model/RProject.java
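Going by the explanation above that the slug travels in the name field, a client-side model could keep the slug and the display name apart explicitly. This is a sketch under assumptions: the title key for the display name and the exact shape of each project entry should be checked against RProject.java, and ProjectRef and parse_project are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ProjectRef:
    project_id: int
    slug: str                    # taken from the JSON "name" field
    title: Optional[str] = None  # assumed "title" field holding the display name


def parse_project(item: Mapping[str, Any]) -> ProjectRef:
    """Map one project entry of the remote API response to a ProjectRef."""
    return ProjectRef(
        project_id=int(item["id"]),
        slug=item["name"],
        title=item.get("title"),
    )
```

Naming the attribute slug rather than name on the client side avoids the confusion discussed in this thread, where pycaprio's "name" turned out to hold the slug.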

jwijffels commented 2 weeks ago

OK, clear, thanks. I'll look into incorporating the slug on the pycaprio side and restructure my code to work with the slug.

reckart commented 2 weeks ago

I just noticed that the slug in the exported project is always null; this will be fixed in the next bugfix release.