inventaire / inventaire

a libre collaborative resource mapper powered by open-knowledge, starting with books! :books:
https://inventaire.io

Display remaining tasks overview #724

Open jum-s opened 9 months ago

jum-s commented 9 months ago

In today's codebase:

The objective is to give users access to a dashboard of the main tasks to do, i.e. tasks grouped by user interest (categories).

Proposed dashboard categories: [edited to integrate max's comment below]

Proposed implementation:

maxlath commented 8 months ago

What do you think of unifying user-generated and robot-generated merge tasks? The current human deduplication process would just be one provider of merge tasks among others, specialized on human entities, while the rest would be agnostic to entity type and to reporter (user or bot)(?). In that direction, and working from memory, I think it might make sense to make that author deduplication process create fewer tasks: it would automerge what it can and create tasks when it's not quite sure, but wouldn't create a task for every homonym returned by Elasticsearch(?), as that information is of lower quality than a user reporting that A and B should be merged.

jum-s commented 8 months ago

I had in mind a hard split (a different category) to give priority to reporter tasks, since a real human is in pain seeing a mismatch somewhere. But yes, it could be softer, i.e. sorting human tasks by reporter first, then the others.

Reducing the number of autogenerated tasks seems like a good idea. The easiest would be to introduce a hardcoded threshold on the score (don't create a task if its score is lower than 100).
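
To make the softer option concrete, here is a minimal sketch of a comparator that puts user-reported tasks first and then falls back to descending score. The task shape (`id`, `score`, `reporter`) and the function name `sortTasksByPriority` are assumptions made for illustration, not the actual task schema.

```javascript
// Hypothetical task shape: { id, score, reporter }
// `reporter` is assumed to be set only when a user reported the merge.
function sortTasksByPriority (tasks) {
  // Copy the array so the caller's list is left untouched
  return [...tasks].sort((a, b) => {
    const aReported = a.reporter != null ? 1 : 0
    const bReported = b.reporter != null ? 1 : 0
    // User-reported tasks come first
    if (aReported !== bReported) return bReported - aReported
    // Then higher scores first; a missing score is treated as 0
    return (b.score ?? 0) - (a.score ?? 0)
  })
}
```

With this ordering, a hard split into separate categories would not be needed: a reported task always sorts ahead of an autogenerated one, whatever their scores.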

jum-s commented 8 months ago

Here is a query to find the 10000th task sorted by descending score:

curljson "http://[couchdb]/tasks-prod/_design/tasks/_view/byScore" | jq ".rows|sort_by(.key[2])|reverse|.[9999]"

Rough idea of the results:

1 000th task score: 674.73
10 000th task score: 395.09
15 000th task score: 352.46
25 000th task score: 299.79
100 000th task score: 182.11
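
For reference, the same rank lookup can be sketched in JavaScript. It assumes view rows shaped like `{ key: [..., ..., score] }`, as implied by `.key[2]` in the query above; the helper name `scoreAtRank` is hypothetical.

```javascript
// Returns the score of the task at a given 1-based rank, with rows
// sorted by descending score. Assumes the score is at row.key[2].
function scoreAtRank (rows, rank) {
  const scores = rows
    .map(row => row.key[2])
    .sort((a, b) => b - a)
  // rank is 1-based: the 10 000th task is at index 9999
  return scores[rank - 1]
}
```

Calling `scoreAtRank(rows, 10000)` on the parsed view response would then give the score of the 10 000th task, or `undefined` if there are fewer rows than that.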

This could allow us to set a threshold of 350 and still create ~15k tasks (versus ~700k today). The threshold could be a config setting, so that tasks can be recreated with a different threshold without having to push a commit.
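
A minimal sketch of what a configurable threshold could look like, assuming a hypothetical config key `minTaskScore` and hypothetical helper names; none of these come from the actual codebase.

```javascript
// Hypothetical config object; in practice this would come from the
// server's config files rather than being declared inline.
const config = { tasks: { minTaskScore: 350 } }

// Decide whether a merge suggestion is worth turning into a task
function shouldCreateTask (score, threshold = config.tasks.minTaskScore) {
  return typeof score === 'number' && score >= threshold
}

// Keep only the suggestions whose score reaches the threshold.
// When `threshold` is omitted, the configured default applies.
function filterSuggestions (suggestions, threshold) {
  return suggestions.filter(s => shouldCreateTask(s.score, threshold))
}
```

Lowering or raising `minTaskScore` in the config and re-running task creation would then change how many tasks are produced, without a code change.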

jum-s commented 5 months ago

Following recent discussion with max: