brierjon commented 1 year ago

Add a way to mark content as needing attention, so editors can coordinate with others once a data quality issue has been identified that the editor either does not know how to fix, or that is pervasive and needs a larger audience or a tool to address.

Currently, as far as I can tell from my experience, flagging content for librarians and the community relies on out-of-band communication, e.g. email ("editing issue"), posting to Gitter, Slack, etc. The content itself cannot be flagged for a specific type of data review to tell other editors that there is a known data issue that needs eyes or attention to correct.

This would also let API tools flag content for editors to check in cases where automated cleanup is not safe.

Examples in other open editable systems:

OpenStreetMap: edited content can carry a "needs additional eyes" flag asking others to check whether the edits were correctly interpreted.
OpenFoodFacts: a task-specific interface called Hunger Games (https://hunger.openfoodfacts.org/) queues tasks for human review based on automated detections. On the main site, when a matching issue is present, a banner shows an edit suggested by OCR or other automation, which a human is asked to verify before it is applied.
Wikipedia: banner templates let users or bots flag content for action, e.g. to discuss a move.

This could strengthen collaborative editing, and human-machine collaboration in particular, by letting both detection tools and humans flag content for attention, specify what type of attention is needed, and display this both on the content's page template and in aggregated reports. This would: 1. Show the types of issues that need to be addressed. 2. Visualize and measure the scope of the issues identified. 3. Encourage people to flag issues even if they don't know how to fix them, so that new types of errors surface and become accessible to the community to address. 4. Encourage people to develop processes and tools to tackle the current backlog of issues, and to collaborate on common issues.

Describe the problem that you'd like solved

Create a clearinghouse of the existing known types of issues in OpenLibrary
Enhance visibility to the scope of the known types of issues in OpenLibrary
Facilitate evaluating gaps in the processes and tools available to address data issues
Facilitate bot and data quality tools collaboration with human editors

Proposal & Constraints

Add a "needs review" field to the edit forms of the Author, Work, and Edition templates, with a defined list of issue classifications and subclassifications to select from (one or more should be selectable, and the date each was first selected should be saved). These are a few examples; many more could probably be created. Not all need to appear in the UI; some could be reserved for automated tagging only.

Data issues:

Linking issue (check whether VIAF, ISNI, Wikidata, etc. identifiers are valid, i.e. integrity checks for when values change or records are merged)
Flagged for issue X (X being a tag from a defined list, e.g. possible name not in natural order)
Works without authors
Editions without works
Possible series in name
Likely content errors
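As a sketch of the cheapest layer of the "linking issue" check, the Python below tests identifier formats locally. The ISNI check character uses ISO 7064 MOD 11-2; the Wikidata and VIAF patterns, the scheme names (`wikidata`, `viaf`, `isni`), and the helper names are illustrative assumptions, not Open Library's data model. A local check only catches malformed or mistyped values; detecting values that changed or were merged upstream would still require querying each authority.

```python
import re

# Hypothetical local checks. They only catch malformed or mistyped values;
# detecting merged or redirected authority records still requires asking
# each authority (VIAF, ISNI, Wikidata) directly.
WIKIDATA_QID = re.compile(r"Q[1-9][0-9]*")
VIAF_ID = re.compile(r"[0-9]+")


def isni_check_char(base15: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 ISNI digits."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_isni(value: str) -> bool:
    """Accept 16 characters, optionally grouped with spaces or hyphens."""
    compact = re.sub(r"[\s-]", "", value).upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return isni_check_char(compact[:15]) == compact[15]


CHECKS = {
    "wikidata": lambda v: WIKIDATA_QID.fullmatch(v) is not None,
    "viaf": lambda v: VIAF_ID.fullmatch(v) is not None,
    "isni": is_valid_isni,
}


def linking_issues(identifiers: dict[str, str]) -> list[str]:
    """Return the schemes whose values fail a local check (unknown schemes are skipped)."""
    return [scheme for scheme, value in identifiers.items()
            if scheme in CHECKS and not CHECKS[scheme](value)]


print(linking_issues({"wikidata": "Q42", "viaf": "abc", "isni": "0000 0002 1825 0097"}))
# ['viaf']
```

A bot could run this over author records and raise a linking-issue flag for any non-empty result, leaving the harder merge and redirect checks to humans or to authority lookups.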

Data additions:

Possible timeframe in title or subtitle (flagged by a bot, indicating that human review is needed to verify whether the suggested addition is accurate or a false positive)
OCR check: OCR page identification; contributors from OCR, with a link to the scanned page; table of contents from OCR, with a link to the scanned table-of-contents page; copyright from OCR, with a link to the scanned page
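To illustrate the "flag by bot, verify by human" pattern for timeframes, here is a minimal Python sketch. The year-range regex (years 1500 to 2099 joined by a hyphen, en dash, or "to") and the suggestion format are assumptions chosen for illustration; every match is emitted with a `needs_review` status rather than being written to the record.

```python
import re

# Hypothetical heuristic: a year range such as "1914-1918" or "1939 to 1945"
# in a title or subtitle. Every match is only a suggestion for a human to check.
YEAR = r"(1[5-9][0-9]{2}|20[0-9]{2})"
YEAR_RANGE = re.compile(rf"\b{YEAR}\s*(?:-|\u2013|to)\s*{YEAR}\b")


def suggest_timeframes(title: str, subtitle: str = "") -> list[dict]:
    suggestions = []
    for field, text in (("title", title), ("subtitle", subtitle)):
        for m in YEAR_RANGE.finditer(text):
            start, end = int(m.group(1)), int(m.group(2))
            if start > end:
                continue  # reversed ranges are more likely noise than timeframes
            suggestions.append({
                "field": field,
                "matched": m.group(0),
                "start": start,
                "end": end,
                "status": "needs_review",  # a human confirms or marks a false positive
            })
    return suggestions


print(suggest_timeframes("The Great War, 1914-1918"))
```

Keeping the bot's output as a suggestion means a false positive (for example, a title that merely mentions two years) costs a reviewer one click instead of corrupting data.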

Edit checks:

Check my edit
Flag possible spam
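One way the proposed "needs review" field could be modeled is sketched below in Python. The `category/subcategory` keys, the UI-versus-automated-only split, and the `NeedsReview` and `report` names are all hypothetical; the sketch only shows the three properties described in the proposal: multiple selectable classifications, the date each was first selected, and some types reserved for automated tagging. The `report` helper illustrates the aggregated view used to measure the scope of each issue type.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date

# Hypothetical taxonomy keyed "classification/subclassification". The boolean
# says whether the flag can be chosen in the edit UI (True) or is reserved for
# automated tools (False).
ISSUE_TYPES = {
    "data-issue/linking": True,
    "data-issue/name-not-natural-order": True,
    "data-issue/work-without-author": True,
    "data-issue/edition-without-work": True,
    "data-issue/series-in-name": True,
    "data-issue/content-error": True,
    "data-addition/timeframe-in-title": False,
    "data-addition/ocr-copyright": False,
    "edit-check/check-my-edit": True,
    "edit-check/possible-spam": True,
}


@dataclass
class NeedsReview:
    """Hypothetical "needs review" field on an author, work or edition record."""
    first_flagged: dict[str, date] = field(default_factory=dict)

    def add(self, issue_type: str, on: date, automated: bool = False) -> None:
        if issue_type not in ISSUE_TYPES:
            raise ValueError(f"unknown issue type: {issue_type}")
        if not automated and not ISSUE_TYPES[issue_type]:
            raise ValueError(f"{issue_type} is reserved for automated tagging")
        # Record only the date the flag was FIRST selected; re-flagging keeps it.
        self.first_flagged.setdefault(issue_type, on)

    def clear(self, issue_type: str) -> None:
        self.first_flagged.pop(issue_type, None)


def report(records: dict[str, NeedsReview]) -> Counter:
    """Count open flags by type across records, for an aggregated scope report."""
    return Counter(t for r in records.values() for t in r.first_flagged)
```

Storing only the first-flagged date per type keeps the field small while still letting a report show how long each kind of issue has been waiting.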

Add a "review" section to the resolving comments, with checkboxes for flagging changes as needing review. Also add to the resolving comments a list of the flagged errors (generated only for flagged issues that are present), with a "valid and fixed" checkbox and a "false positive" checkbox for each, which would give feedback to the person or tool that raised the flag. Any issue left unchecked would keep its flag.
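The resolution step can be sketched in Python as follows, assuming a hypothetical `Flag` record and outcome names `fixed` and `false_positive` standing in for the two checkboxes. Unchecked flags stay open, and a per-flagger tally gives the feedback loop described above, which would let a tool's author see its false-positive rate.

```python
from collections import Counter
from dataclasses import dataclass

OUTCOMES = ("fixed", "false_positive")


@dataclass
class Flag:
    issue_type: str
    flagged_by: str  # account (person or bot) that raised the flag


def resolve(open_flags: list[Flag], checked: dict[str, str],
            feedback: Counter) -> list[Flag]:
    """Apply the reviewer's checkboxes from an edit's resolving comment.

    `checked` maps an issue type to "fixed" ("valid and fixed") or
    "false_positive". Issue types that were not checked are absent, so their
    flags stay open. `feedback` counts outcomes per (flagger, outcome) pair,
    so a person or tool can see how often its flags turned out to be valid.
    """
    still_open = []
    for flag in open_flags:
        outcome = checked.get(flag.issue_type)
        if outcome is None:
            still_open.append(flag)
        elif outcome in OUTCOMES:
            feedback[(flag.flagged_by, outcome)] += 1
        else:
            raise ValueError(f"unexpected outcome: {outcome}")
    return still_open
```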

Additional context

Flagging could be expanded to reviewing content edited by a specific user account over a given period (from X to Y). I am mostly thinking of reviewing bots when import issues are identified, or other automated edits that need to be revisited, but this could also apply to user accounts found, through diff review of the edit history, to have added bad data, correlated with flags raised on the content after their edits. This may require flagging specific changesets. Wikipedia employs review of edits.
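A minimal Python sketch of selecting one account's changesets within a time window and collecting the records they touched, so those records can be flagged together. The `Changeset` shape and the example account names are assumptions, not Open Library's history format, and the correlation with later flags mentioned above is not shown.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Changeset:
    id: int
    author: str
    timestamp: datetime
    keys: tuple[str, ...]  # records touched by this changeset


def changesets_in_window(history: list[Changeset], account: str,
                         start: datetime, end: datetime) -> list[Changeset]:
    """One account's changesets with start <= timestamp < end."""
    return [c for c in history
            if c.author == account and start <= c.timestamp < end]


def records_to_flag(changesets: list[Changeset]) -> dict[str, int]:
    """Map each touched record to the first changeset (in list order) that edited it."""
    first_seen: dict[str, int] = {}
    for c in changesets:
        for key in c.keys:
            first_seen.setdefault(key, c.id)
    return first_seen
```

Keeping the changeset id next to each record lets a reviewer jump straight to the diff that introduced the suspected problem.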

Viewing flags could be opt-in so as not to overwhelm newcomers, throttled by type or display frequency, or revealed gradually to users after repeat visits to OpenLibrary, once they have oriented themselves, as a way to prevent spam from new accounts and misunderstandings.
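The visibility rules above could be combined roughly as in this Python sketch. The visit thresholds per category, the cap of three banners, and the `Viewer` type are invented for illustration; opted-in users see every category, while others see a category only after enough visits, and unlisted categories stay hidden.

```python
from dataclasses import dataclass
from math import inf


@dataclass
class Viewer:
    opted_in: bool
    visits: int  # prior visits to the site


# Hypothetical thresholds: visits required before a flag category is shown
# to a viewer who has not opted in. Unlisted categories stay hidden.
REVEAL_AFTER_VISITS = {"edit-check": 3, "data-issue": 10, "data-addition": 20}


def visible_flags(flags: list[str], viewer: Viewer, max_shown: int = 3) -> list[str]:
    """Pick which "category/subcategory" flags to display on a record page."""
    if viewer.opted_in:
        allowed = flags
    else:
        allowed = [f for f in flags
                   if viewer.visits >= REVEAL_AFTER_VISITS.get(f.split("/")[0], inf)]
    return allowed[:max_shown]  # throttle how many banners appear at once


print(visible_flags(["data-issue/linking", "edit-check/possible-spam"], Viewer(False, 5)))
# ['edit-check/possible-spam']
```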

Stakeholders

mekarpeles commented 1 year ago

Blocked by #7416

loleg commented 3 months ago

7416 has been closed. I've come across a number of titles & edits that need moderation. +1 for this issue.

jimchamp commented 3 months ago

No longer blocked by #7416. I don't believe that it would be difficult to add the ability to flag records today. We could update our librarian request queue to display a new FLAG request type. A "Flag" button can be added to the books page, that, when pressed, prompts the patron for a reason and adds a new entry in the librarian request queue. Then, whenever a librarian is available, they can do any necessary moderation and remove the flag.
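The flow jimchamp describes (a "Flag" button that asks for a reason, a new FLAG request in the librarian queue, and a librarian removing it after moderation) can be sketched with an in-memory queue. This is not Open Library's actual request-queue code; the `FLAG` constant, `Request` fields, and method names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count

FLAG = "FLAG"  # hypothetical name for the new request type


@dataclass
class Request:
    id: int
    type: str
    record_key: str  # e.g. a work or edition key
    reason: str
    submitter: str
    is_open: bool = True


class RequestQueue:
    """In-memory stand-in for the librarian request queue."""

    def __init__(self) -> None:
        self._next_id = count(1)
        self.requests: list[Request] = []

    def flag(self, record_key: str, reason: str, submitter: str) -> Request:
        """What the hypothetical "Flag" button would do after prompting for a reason."""
        reason = reason.strip()
        if not reason:
            raise ValueError("a reason is required to flag a record")
        request = Request(next(self._next_id), FLAG, record_key, reason, submitter)
        self.requests.append(request)
        return request

    def pending_flags(self) -> list[Request]:
        return [r for r in self.requests if r.is_open and r.type == FLAG]

    def remove_flag(self, request_id: int) -> None:
        """Called by a librarian once any needed moderation is done."""
        for r in self.requests:
            if r.id == request_id:
                r.is_open = False
                return
        raise KeyError(request_id)
```

Requiring a non-empty reason keeps each queue entry actionable for whichever librarian picks it up later.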

Marked this as https://github.com/internetarchive/openlibrary/labels/Needs%3A%20Designs, as some mock-ups of the "Flag" button will be needed.

internetarchive / openlibrary

flags and UI indicator for content needing human attention and/or data quality review #7627
