wetneb / openrefine-wikibase

This repository has migrated to:
https://gitlab.com/nfdi4culture/ta1-data-enrichment/openrefine-wikibase

💡 Collective / collaborative dump with manual reconciliation moves? #117

Closed: silviaegt closed this issue 1 year ago

silviaegt commented 3 years ago

Hey @wetneb,

First of all, thank you so much for this life-changing endpoint!

I use it so much I thought maybe, just maybe something like this could be useful:

Have you thought about creating a collective repository where GLAM institutions and others could dump their operation histories (e.g. this exercise in which I matched US universities)? Those could then be used to train an ML algorithm or the like, to get better matching results based on the probability that a certain string matches a given QID in certain data contexts.

I think it would be nice to build collectively on what has taken so many hours of sitting....
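To make "the probability that a certain string matches a given QID" a bit more concrete, here is a minimal sketch that simply counts pooled manual matches. It is a plain frequency estimate, not a trained model, and the function name `match_probabilities`, the example strings, and the placeholder QIDs are all made up for illustration.

```python
from collections import Counter, defaultdict


def match_probabilities(pairs):
    """Estimate P(QID | string) from pooled (string, qid) manual matches.

    Strings are lower-cased and stripped so trivial variants are pooled.
    """
    counts = defaultdict(Counter)
    for text, qid in pairs:
        counts[text.strip().lower()][qid] += 1
    return {
        text: {qid: n / sum(per_qid.values()) for qid, n in per_qid.items()}
        for text, per_qid in counts.items()
    }


# Hypothetical pooled decisions from several institutions.
# The QIDs are placeholders, not real Wikidata identifiers.
pooled = [
    ("Example University", "Q_PLACEHOLDER_A"),
    ("example university ", "Q_PLACEHOLDER_A"),
    ("Example University", "Q_PLACEHOLDER_B"),
]

probs = match_probabilities(pooled)
for qid, p in sorted(probs["example university"].items()):
    print(qid, round(p, 2))
```

Anything context-dependent (which column the string came from, what the neighbouring columns contain) would need more than string counts, but even this kind of table could be used to re-rank candidates.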

thadguidry commented 3 years ago

Hi @silviaegt, training an ML algorithm requires data. So institutions would additionally need to provide some or all of the reconciled candidate data (an opt-in choice for users), not only the OpenRefine operation history.

However, we have talked about a potential future "opt-in" feature where selected reconciliation candidate data could be sent back to a service and used in feedback loops by anyone interested in improving reconciliation. We are discussing such use cases in the Reconciliation API specs here: https://github.com/reconciliation-api/specs/issues/30 So it's a two-part story: 1. the reconciliation service asks the user for permission to use their reconciliation data; 2. the user gives their consent to the service, and tools like OpenRefine then send the reconciled candidate data along with the data from other columns that the user relied on to make a decision. With that extra data and information, classification systems (lazy learners, eager learners) could then begin to fit models with various ML algorithms.
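As a rough illustration of that two-part flow, the sketch below only builds a feedback payload when the user has consented. Nothing here comes from the Reconciliation API specs: `FeedbackRecord`, `build_feedback`, and every field name are hypothetical, and the identifiers are placeholders.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class FeedbackRecord:
    """One user decision, plus the context used to make it (hypothetical shape)."""
    query: str                    # the original cell value
    chosen_id: str                # the candidate the user picked
    candidate_ids: list           # all candidates that were offered
    context: dict = field(default_factory=dict)  # other columns used to decide


def build_feedback(user_consented: bool, record: FeedbackRecord) -> Optional[str]:
    """Return a JSON payload only if the user opted in; otherwise nothing is sent."""
    if not user_consented:
        return None
    return json.dumps(asdict(record), ensure_ascii=False)


record = FeedbackRecord(
    query="Example University",
    chosen_id="Q_PLACEHOLDER_A",
    candidate_ids=["Q_PLACEHOLDER_A", "Q_PLACEHOLDER_B"],
    context={"country": "United States"},
)

print(build_feedback(False, record))  # no consent, so no payload
print(build_feedback(True, record))
```

Keeping the consent check in the one place where the payload is built means no code path can serialize the data without it.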

silviaegt commented 3 years ago

Hey @thadguidry,

so glad to hear this is already something you're thinking about!

No doubt full reconciled candidate data would be great. I was just thinking that it could already be useful to have tons of files with manual reconciliation operations, for instance that "Rutgers University, New Brunswick" was manually matched to "Rutgers University–New Brunswick", or that "Southern Illinois University at Carbondale" was matched to "Southern Illinois University Carbondale".
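A minimal sketch of how such pairs could be pulled out of an exported operation history, assuming the JSON contains `core/recon-judge-similar-cells` entries with `similarValue`, `judgment`, and `match` fields (worth checking against an actual export). The helper name `manual_matches` and the sample history are illustrative, and the QID is a placeholder.

```python
import json


def manual_matches(history_json: str):
    """Return (cell value, matched id, matched name) for manual match operations."""
    pairs = []
    for op in json.loads(history_json):
        # Only look at "match all identical cells" style judgments.
        if op.get("op") != "core/recon-judge-similar-cells":
            continue
        if op.get("judgment") != "matched":
            continue
        match = op.get("match") or {}
        if match.get("id") is None:
            continue
        pairs.append((op.get("similarValue"), match["id"], match.get("name")))
    return pairs


# Illustrative history: one unrelated operation, one manual match, one non-match.
sample_history = json.dumps([
    {"op": "core/text-transform", "columnName": "university",
     "expression": "value.trim()"},
    {"op": "core/recon-judge-similar-cells", "columnName": "university",
     "similarValue": "Rutgers University, New Brunswick",
     "judgment": "matched",
     "match": {"id": "Q_PLACEHOLDER_A",
               "name": "Rutgers University–New Brunswick"}},
    {"op": "core/recon-judge-similar-cells", "columnName": "university",
     "similarValue": "Some unmatched value", "judgment": "none"},
])

pairs = manual_matches(sample_history)
for value, qid, name in pairs:
    print(f"{value!r} -> {qid} ({name})")
```

Matches made one cell at a time may not show up as reusable operations in the history, so a collection like this would likely cover only part of the manual work.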

I know lots of librarians have their operation history in their OpenRefine projects and an open call to collect those might not be that difficult? But hey, the important thing is that it is relevant for you guys 😄

wetneb commented 3 years ago

Hi @silviaegt,

Yes, I totally agree that it would make sense to make better use of the annotation effort. Making it easier to pool that effort would be very useful. Using the OpenRefine operation history could be one way, but I think it should be possible to come up with something more practical. I don't have tons of time to expand on this but it's on my radar :)

wetneb commented 1 year ago

It's a nice idea but it's not within scope for this service - it would be more on OpenRefine's side.