Open jzacsh opened 3 years ago
Many subject tags are imported with the record and not in our direct control. There are over 2,000 items with this tag alone. This is too many to fix manually. Maybe someone would like to try to remove this programmatically?
@seabelis how about proposal 1a, or 2: do those sound possible to you?
While I'm not available to work on this any time soon, I'd guess the most useful help for the next contributor would be pointers to where solutions 1, 1a, and 2 might plausibly start in this codebase, or which proposals the team prefers or dislikes.
Maybe someone would like to try to remove this programmatically?
Edit: Also I should point out that proposal 1 doesn't have to be a mass removal (in fact that might leave the buggy search experience still intact for many books), but could be a re-insertion of the intended/fixed values.
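As a rough illustration of the re-insertion idea, the sketch below replaces the combined string with its three component subjects in a single record. It assumes a work is a plain dict with a `subjects` list of strings. `BAD_SUBJECT`, `FIXED_SUBJECTS`, and `fix_subjects` are hypothetical names for this example, not existing Open Library code, and loading or saving records is deliberately left out.

```python
# Hypothetical sketch: none of these names exist in the Open Library codebase.
BAD_SUBJECT = "Fiction, science fiction, space opera"
FIXED_SUBJECTS = ["Fiction", "science fiction", "space opera"]


def fix_subjects(work: dict) -> dict:
    """Return a copy of `work` with the combined subject replaced by its parts.

    Assumes `work["subjects"]` is a list of strings (an assumption of this sketch).
    """
    seen = set()
    fixed = []
    for subject in work.get("subjects", []):
        # Expand the bad subject into its parts; keep every other subject as is.
        if subject.strip().lower() == BAD_SUBJECT.lower():
            replacements = FIXED_SUBJECTS
        else:
            replacements = [subject]
        for candidate in replacements:
            key = candidate.strip().lower()
            if key not in seen:  # case-insensitive de-duplication
                seen.add(key)
                fixed.append(candidate)
    return {**work, "subjects": fixed}


work = {"title": "Consider Phlebas", "subjects": ["Science fiction", BAD_SUBJECT]}
print(fix_subjects(work)["subjects"])
```

De-duplicating case-insensitively means a record that already carries a correct "Science fiction" subject keeps its existing spelling instead of gaining a near-duplicate.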
Oh, interestingly: #7904 seems to be a newer (2 years later) rethink of the data structure involved here. I'd guess it's important that whatever is proposed here should be coordinated closely with those folks.
This subject was imported from Better World Books, which is infamous for providing garbage metadata, but we've been unable to convince the powers that be to stop importing from it. Obviously, having subjects with embedded commas is incompatible with using commas as the delimiter in the subject data entry field, so they would need to be escaped in some way. It is likely, though, that it was originally intended to be the hierarchical genre "Fiction / Science Fiction / Space Opera", as the Library of Congress hierarchy shows (https://id.loc.gov/authorities/genreForms/gf2014026551.html). You can also see it in textual form, rather than as "broader" links, at the bottom of this MARC record: https://openlibrary.org/show-records/marc_loc_2016/BooksAll.2016.part41.utf8:166410102:1707
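To make the delimiter problem concrete, here is a minimal sketch of one possible escaping scheme: double-quoting an entry that contains commas and parsing the field with Python's standard `csv` module. This is only an illustration of the idea, not how the Open Library subject field currently parses input; `parse_subjects` is a hypothetical helper.

```python
import csv


def parse_subjects(field: str) -> list[str]:
    """Split a comma-separated subject field, honoring double-quoted entries."""
    # skipinitialspace lets a quote that follows ", " still open a quoted entry.
    rows = list(csv.reader([field], skipinitialspace=True))
    entries = rows[0] if rows else []
    return [entry.strip() for entry in entries if entry.strip()]


# A naive split tears the embedded commas apart into separate subjects.
naive = "cheese, Fiction, science fiction, space opera".split(", ")
# With quoting, the comma-bearing entry survives as a single subject.
quoted = parse_subjects('cheese, "Fiction, science fiction, space opera"')
print(len(naive), naive)
print(len(quoted), quoted)
```

Whether a subject containing commas should exist at all is a separate question; the sketch only shows that a comma-delimited field needs some escaping convention before it can represent one.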
You can see all the different ways that "space opera" is spelled on OpenLibrary with different hierarchy delimiters here: https://openlibrary.org/search/subjects?q=space+opera
My feature request (#2819) to make subjects first-class objects instead of strings was an attempt to bring some order to this, as well as to allow links to things like LCSH, FAST, and Wikidata. It would also support internationalization for things like "Novelas del espacio".
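For readers unfamiliar with the idea, a first-class subject might look roughly like the sketch below. Every name here (`Subject`, `labels`, `external_ids`, `broader`) is a hypothetical illustration and not the design proposed in #2819; the identifier value is a placeholder, and pairing the Spanish label with this subject is for illustration only.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Subject:
    """Hypothetical first-class subject; all field names are illustrative."""

    key: str                      # stable identifier, independent of spelling
    labels: dict                  # language code -> display label
    external_ids: dict = field(default_factory=dict)  # e.g. LCSH, FAST, Wikidata
    broader: Optional[str] = None  # key of the parent subject in a hierarchy

    def label(self, lang: str = "en") -> str:
        """Return the label for `lang`, falling back to English, then the key."""
        return self.labels.get(lang) or self.labels.get("en") or self.key


space_opera = Subject(
    key="space_opera",
    labels={"en": "Space opera", "es": "Novelas del espacio"},
    external_ids={"wikidata": "PLACEHOLDER"},  # placeholder, not a real identifier
    broader="science_fiction",
)
print(space_opera.label("es"))
print(space_opera.label("fr"))  # no French label, so the English one is used
```

With a structure like this, the hierarchy lives in `broader` links rather than in delimiters embedded in a string, so spelling and delimiter variants no longer create separate subjects.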
The best fix would be to stop importing from BWB, but failing that all the bad metadata should be filtered out (which is probably effectively the same thing).
~470 works incorrectly listed in a nonsense genre
"Fiction, science fiction, space opera" (those are three different genres) - see here: https://openlibrary.org/subjects/fiction_science_fiction_space_opera
Evidence / Screenshot (if possible)
Relevant url?
https://openlibrary.org/subjects/fiction_science_fiction_space_opera
Steps to Reproduce
"science fiction [14,712 books]" icon in the big "browse by subject" banner
subject_facet in GET params)
"Subject keywords?" form-field (instruction says "Please separate with commas. For example: cheese, Roman Empire, psychology")
"Fiction, science fiction, general"
note: this is the same for Consider Phlebas, but as you can see it's overcome by a correct genre being added in beside the nonsense genre (so the book shows up in facet searches as expected from the home page).
Expected/Actual
The root cause of the bug is step 4f being wrong (and this is true of many books; see 4g):
"Fiction, science fiction, space opera" should instead be Fiction, science fiction, space opera (or maybe "Fiction", "science fiction", "space opera" - the point here is not to quote the entire string)
Details
Proposal & Constraints
Fiction, science fiction, space opera)