internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Greek and Modern Greek are both included on edit form. #8145

Closed seabelis closed 2 weeks ago

seabelis commented 1 year ago

Both Greek and Modern Greek are listed as languages on the edition edit form. I do not see a separate language code for just "Greek," so this may be an error. Additionally, many titles that were probably imported as Ancient Greek have Modern Greek identified as the language. There may have been some import issues early on that mixed up the codes.

Evidence / Screenshot (if possible)

[Two screenshots, dated 2023-07-31]

Steps to Reproduce

  1. Go to the edition edit form.
  2. Check the language options for "Greek" and "Modern Greek".

tfmorris commented 1 year ago

This is likely because https://openlibrary.org/languages/gre.yml has both a name of "Greek" and a name_translated.en of "Modern Greek". If I'm right, choosing either will result in a language code of gre.
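To illustrate the suspected mechanism, here is a minimal sketch. It is not the actual Open Library form code: the record shape (in particular `name_translated.en` being a list) and the helpers `labels_for` and `build_options` are assumptions made up for this example.

```python
# Hypothetical language record, loosely modeled on the fields named above.
# The real gre.yml may be shaped differently (assumption).
gre_record = {
    "key": "/languages/gre",
    "code": "gre",
    "name": "Greek",
    "name_translated": {"en": ["Modern Greek"]},
}


def labels_for(record):
    """Collect every display label a form might offer for one language record."""
    labels = [record["name"]]
    for translated in record.get("name_translated", {}).values():
        labels.extend(translated)
    return labels


def build_options(records):
    """Map each display label to the language code it would save."""
    return {label: rec["code"] for rec in records for label in labels_for(rec)}


options = build_options([gre_record])
print(options)
```

If the form gathers labels this way, both "Greek" and "Modern Greek" show up as separate choices but save the same code, which would match what the screenshots show.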

#8138 is related. There are no non-English labels for "Ancient Greek" available at all. When those are added, they are going to use normal name order, as is the custom on Wikidata. I personally think that is the right way to do it, and that the English label should use normal name order too, as it has since 2009 (which, of course, means that the autocomplete widget can't be restricted to prefix searches, as it currently appears to be).
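A rough sketch of why prefix-only matching is a problem for normal name order. The label list and both matcher functions are illustrative assumptions, not the autocomplete widget's real implementation.

```python
# Illustrative labels only; not the real Open Library language list.
LANGUAGE_LABELS = ["Ancient Greek", "Modern Greek", "Greek", "German"]


def prefix_match(query, labels):
    """Match only labels that start with the query (prefix search)."""
    q = query.casefold()
    return [label for label in labels if label.casefold().startswith(q)]


def word_prefix_match(query, labels):
    """Match labels in which any word starts with the query."""
    q = query.casefold()
    return [
        label
        for label in labels
        if any(word.startswith(q) for word in label.casefold().split())
    ]


print(prefix_match("greek", LANGUAGE_LABELS))       # ['Greek']
print(word_prefix_match("greek", LANGUAGE_LABELS))  # ['Ancient Greek', 'Modern Greek', 'Greek']
```

With prefix search, typing "greek" never surfaces "Ancient Greek"; matching on any word in the label is one possible way around that.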

> Additionally, many titles that were probably imported as Ancient Greek, have Modern Greek identified as the language. There may have been some import issues early on that mixed up the codes.

Do you have some examples of these import errors? I'd be happy to investigate. The first potential example I checked was imported from a Scriblio MARC record (mis)coded as gre and had grc added later by "MARC Bot" (without any provenance) which is the opposite of the problem you're describing. https://openlibrary.org/books/OL6984687M/Magni_Hippocratis_Coi_opera_omnia.?b=8&a=7&_compare=Comparer&m=diff

seabelis commented 1 year ago

Thanks, @tfmorris. I'll post the next example I run across.

I do not object to listing Ancient Greek in normal order, but as you mention, the auto-complete should then be able to find it as an option when someone types "greek".

tfmorris commented 1 year ago

@seabelis I've created #8146 to cover fixing this.

seabelis commented 1 year ago

@tfmorris https://openlibrary.org/books/OL24990264M/Republic_of_Plato?v=1 Created from an import from archive.org where the book is cataloged as just Greek.

tfmorris commented 1 year ago

@seabelis Thanks for the example. Sorry to say that there's not enough provenance to figure out what code imported that record. I had a look at the import code as it existed around that date and didn't see anything obvious which would have caused the problem. I can only assume that something intentionally remapped gre to grc as part of the import.

This is another thing that would presumably get fixed if all MARC records were re-imported with the current, improved MARC parser and import pipeline.

cdrini commented 2 weeks ago

Fixed by #8160.