inception-project / inception

INCEpTION provides a semantic annotation platform offering intelligent annotation assistance and knowledge management.
https://inception-project.github.io
Apache License 2.0

Agreement score per document #2827

Closed n0k0m3 closed 2 years ago

n0k0m3 commented 2 years ago

Is your feature request related to a problem? Please describe. I want an option to select a single curated/finished document when computing the agreement score (Cohen's kappa, for example).

Describe the solution you'd like: A file selector similar to the file manager shown after opening "Annotation", then compute the score based on that document alone.

Describe alternatives you've considered: Export the whole project as raw XMI and write a Python script to do the analysis (as suggested in the export guide).
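For readers taking the scripting route, here is a minimal sketch of Cohen's kappa computed for one document at a time. It is not INCEpTION's implementation. It assumes the two annotators' labels have already been extracted from the export and aligned unit by unit (for example, one label per token). The function name `cohen_kappa` and the sample labels are made up for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two aligned label sequences from one document."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    if n == 0:
        raise ValueError("cannot compute agreement on an empty document")

    # Observed agreement: fraction of units where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each annotator's label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one and the same label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical per-document data: document name -> (annotator A, annotator B).
documents = {
    "doc_01.txt": (["PER", "PER", "ORG", "ORG"], ["PER", "PER", "ORG", "PER"]),
    "doc_02.txt": (["PER", "ORG"], ["PER", "ORG"]),
}
per_document = {name: cohen_kappa(a, b) for name, (a, b) in documents.items()}
```

Keeping the result as one score per document makes it easy to restrict the analysis to a hand-picked list of documents later.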

reckart commented 2 years ago

Thank you for the suggestions.

If you go towards implementing something in Python, I would suggest having a look at INCEpTION Analytics. This is a Python-based project that uses the Streamlit framework to navigate and present the data contained in exported INCEpTION projects. It contains code for accessing the data from within the project export ZIPs and it uses DKPro Cassis for reading the XMI files within. It is in a basic but workable state and I assume the maintainers would welcome contributions (@zesch).

n0k0m3 commented 2 years ago

That's a great API for export analysis and actually fits my need. Is there any plan to implement this in the main INCEpTION webapp?

reckart commented 2 years ago

There are tons of different analytics features that could be implemented, but only so many coders ;) That's why we focus a bit more on the ability for people to do the stuff they need outside the main app. But it could still be something to be added to the main app.

It actually shouldn't be hard to do it in the main application - all the stuff to do the calculations is there - it just needs to be limited to a single document. But whoever implements it would need to be a developer with some Java skills. INCEpTION Analytics has the benefit of being a Python app, which allows Python coders to contribute.

reckart commented 2 years ago

What is your motivation for wanting to see agreement only for a single document as for example opposed to being able to see agreement for a custom selection of documents (i.e. single-select vs. multi-select)?

n0k0m3 commented 2 years ago

When assigning the documents, I split the 10 annotators so that each document gets 2 annotators, and then randomly choose 2-5 docs that all 10 annotators work on, to evaluate their performance/agreement score. This tells me which annotators are more "trustworthy" than others when curating. This is more or less aligned with my feature request in #2828
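The assignment scheme described above could be sketched roughly like this. This is only an illustration of the idea, not something INCEpTION provides. The function name `assign_documents`, its parameters, and the document and annotator names are all hypothetical.

```python
import random


def assign_documents(documents, annotators, per_doc=2, shared=3, seed=0):
    """Assign `per_doc` annotators to each document, except for `shared`
    randomly chosen documents that every annotator works on."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shared_docs = set(rng.sample(documents, shared))

    assignment = {}
    for doc in documents:
        if doc in shared_docs:
            # Calibration document: annotated by everyone.
            assignment[doc] = list(annotators)
        else:
            # Regular document: a random subset of annotators.
            assignment[doc] = rng.sample(annotators, per_doc)
    return assignment


docs = [f"doc_{i:02d}.txt" for i in range(20)]
team = [f"annotator{i}" for i in range(1, 11)]
plan = assign_documents(docs, team, per_doc=2, shared=3)
shared_names = sorted(d for d, who in plan.items() if len(who) == len(team))
```

The `shared_names` list is the set of documents on which agreement across all annotators can be compared.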

Also, regarding single-select/multi-select, I don't really have a preference on that, and actually multi-select would be preferable

reckart commented 2 years ago

While it is not particularly user-friendly, I believe it should be possible for such a small case to do this:

I haven't tested it, but I believe it should work - if it doesn't work, it's probably a bug.

n0k0m3 commented 2 years ago

Oh, I just found out I can use a regex to do this with monitoring, and it worked. Not ideal, but usable. Scaling this to 100+ docs (with 10-20 "chosen docs") would be hard though, so I might need to figure out a regex builder + Selenium script for this.
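A regex builder of the kind mentioned here could look roughly like the sketch below. It only uses Python's standard `re` module. The function name `build_document_regex` and the file names are invented for the example, and whether the monitoring filter matches the whole name or searches within it is an assumption, which is why the pattern is anchored at both ends.

```python
import re


def build_document_regex(names):
    """Build one anchored regex that matches exactly the given document names."""
    if not names:
        raise ValueError("need at least one document name")
    # Escape each name so characters like '.' are matched literally.
    alternatives = "|".join(re.escape(name) for name in sorted(names))
    return f"^(?:{alternatives})$"


chosen = ["doc_01.txt", "doc_17.txt", "report (final).txt"]
pattern = build_document_regex(chosen)
```

The resulting string can be pasted into the document filter. The escaping is done for Python's regex syntax, so unusual file names should be checked once in the web UI.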

reckart commented 2 years ago

Well, yes - when wanting to do this at some scale, the suggested approach is less than ideal. If you consider applying Selenium to this, I assume your skill set includes JavaScript and Python but not Java, right?

But for 100+ documents, your feature request of being able to select a single document for agreement wouldn't be adequate either, would it? You'd have to select each of the 100 documents one by one and calculate agreement then, no?

n0k0m3 commented 2 years ago

> I assume your skill set includes JavaScript and Python but not Java, right?

Just Python; I use the Selenium Python bindings.

> You'd have to select each of the 100 documents one by one and calculate agreement then, no?

Not quite, I only need to do this for 10-20 docs even if the scale is larger. Tbh, inception-analytics would serve me well with my list of chosen texts and a Python script, but it would be a hassle to train others to use it when I want to delegate the manager role to others (same with the regex + Selenium approach, but that is a little bit better as the user stays in INCEpTION rather than a CLI/another webpage)

reckart commented 2 years ago

INCEpTION has a remote API through which projects can be exported. Currently, INCEpTION Analytics requires you to provide an exported ZIP before it can be used. I imagine it could be extended with little work to connect to a running INCEpTION using the remote API and then export the project from there and load it for analysis. Not considering role-based access control, this could be an interesting approach.

n0k0m3 commented 2 years ago

Will take a further look into this, thanks for all the suggestions so far. If you don't mind, I'd like to keep this open for further discussion if needed

simulacrum6 commented 2 years ago

Hey there, I added an issue on the INCEpTION Analytics repository (ltl-ude/inception-analytics#6). I will probably not be able to integrate it before the end of February, though. If you need a quicker solution, you could also export the project ZIP yourself via the remote API using the pycaprio client and import it into INCEpTION Analytics that way. The documentation describes how to do this.
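For anyone who wants to see what such an export call involves, below is a rough standard-library sketch. It does not use pycaprio or INCEpTION Analytics. The endpoint path `/api/aero/v1/projects/{id}/export.zip` is an assumption about the remote API and should be checked against the documentation of your INCEpTION version. The remote API also has to be enabled on the server and the account needs permission to use it. Host, project ID and credentials are placeholders, and the helper names are hypothetical.

```python
import base64
import urllib.request


def build_export_request(base_url, project_id, user, password):
    """Prepare an authenticated request for a project export ZIP.

    The URL layout is an assumption about the remote API, not a confirmed contract.
    """
    url = f"{base_url.rstrip('/')}/api/aero/v1/projects/{project_id}/export.zip"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # HTTP Basic authentication header.
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def download_project_zip(request, target_path):
    """Send the request and write the response body to `target_path`."""
    with urllib.request.urlopen(request) as response, open(target_path, "wb") as out:
        out.write(response.read())


# Placeholder values; nothing is sent until download_project_zip is called.
request = build_export_request("http://localhost:8080/", 1, "admin", "secret")
```

Calling `download_project_zip(request, "project.zip")` against a running instance would then produce a ZIP that can be loaded for analysis like a manually exported one.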

reckart commented 2 years ago

@simulacrum6 Cool, thanks!

simulacrum6 commented 2 years ago

Downloading a project using the remote API works in the latest release. You can install inceptalytics using pip as well (pip install inceptalytics). I hope this works for you. If not, feel free to open an issue in the inceptalytics repository.

n0k0m3 commented 2 years ago

Very nice, it worked for me before the latest release (manually downloading the project and importing it), but it's even better now. I'll close this issue for now. Also, having artifacts on PyPI is definitely a huge plus