internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0

Imports via `load()` appear to accept only one language #9490

Open scottbarnes opened 2 months ago

scottbarnes commented 2 months ago

Problem

Imports via load() that specify multiple languages seem to accept only one language. It is unclear whether this applies only to the /api/import/ia endpoint.

It should be tested with other endpoints, such as /api/import. See the Developer's Guide to Data Importing for the many fun ways to import via the various endpoints with various languages and tools.
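A test against /api/import could look roughly like the sketch below. It assumes the endpoint accepts a JSON record in the POST body with `languages` given as a list of MARC language codes, that a dev instance is running at http://localhost:8080, and that a session cookie is available. The record values, the `source_records` identifier, and the helper names `build_record` and `post_record` are made up for illustration.

```python
import json
import urllib.request

IMPORT_URL = "http://localhost:8080/api/import"  # assumed local dev instance


def build_record(languages):
    """Build a minimal, made-up import record with the given language codes."""
    return {
        "title": "Multi-language import test",
        "authors": [{"name": "Test Author"}],
        "publishers": ["Test Publisher"],
        "publish_date": "2024",
        "source_records": ["test:multi-language-9490"],  # hypothetical identifier
        "languages": list(languages),
    }


def post_record(record, cookie_header):
    """POST the record as JSON; cookie_header is an assumed 'name=value' session cookie."""
    request = urllib.request.Request(
        IMPORT_URL,
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json", "Cookie": cookie_header},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    record = build_record(["eng", "spa"])
    print(json.dumps(record, indent=2))
    # Call post_record(record, "<name>=<value>") once a dev instance is running
    # and you have copied a valid session cookie from a logged-in browser.
```

If the edition created from this record also ends up with a single language, the problem is not specific to /api/import/ia.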

Evidence / Screenshot

Consider the IA item 2e013a64-935f-4be4-9c51-3eac22929627, which has two languages.

Importing it, with #9480 applied (and #9480 MUST be applied or already in master), works:

❯ curl -X POST "http://localhost:8080/api/import/ia" \
     -b ~/cookies.txt \
     -d "identifier=2e013a64-935f-4be4-9c51-3eac22929627&require_marc=false"
{"authors": [{"key": "/authors/OL12A", "name": "Ingrid Robeyns", "status": "created"}], "success": true, "edition": {"key": "/books/OL22M", "status": "created"}, "work": {"key": "/works/OL8W", "status": "created"}}

However, once imported, it only has English as a language: Spanish has disappeared into the ether.

Reproducing the bug

  1. Try to import something via /api/import/ia that has multiple languages, such as ocaid 2e013a64-935f-4be4-9c51-3eac22929627.
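To confirm the result after the import, the languages on the created edition can be read back from its JSON. The sketch below assumes the edition JSON stores languages as a list of `{"key": "/languages/<code>"}` entries and that the dev instance serves `<edition key>.json` at http://localhost:8080. The helper names are hypothetical and the sample dict is hand-written, not real data.

```python
import json
import urllib.request


def language_codes(edition):
    """Return language codes from an edition dict, e.g. '/languages/eng' -> 'eng'."""
    return [lang["key"].rsplit("/", 1)[-1] for lang in edition.get("languages", [])]


def fetch_edition(key, base_url="http://localhost:8080"):
    """Fetch an edition's JSON, e.g. key='/books/OL22M' from the import response."""
    with urllib.request.urlopen(f"{base_url}{key}.json") as response:
        return json.load(response)


if __name__ == "__main__":
    # Offline example with a hand-written edition dict (not real data).
    sample = {"key": "/books/OL22M", "languages": [{"key": "/languages/eng"}]}
    print(language_codes(sample))  # ['eng']
```

With a running instance, `language_codes(fetch_edition("/books/OL22M"))` would be expected to list both languages once the bug is fixed.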

Context

Notes from this Issue's Lead

Proposal & constraints

Related files

This will take more research, but by the time the rec gets to build_query in openlibrary/catalog/add_book/load_book.py, the only language code is ENG: https://github.com/internetarchive/openlibrary/blob/c546c6ab62c6435796109ca6a8169a673b842c75/openlibrary/catalog/add_book/load_book.py#L288-L324

You'll have to trust me, but this is from a print(f"{rec = }", flush=True) called just after the docstring in build_query:

rec = {'title': 'Tener Demasiado', 'authors': [{'name': 'Ingrid Robeyns'}], 'publish_date': '2024-02-19', 'description': "'Tener demasiado' es el primer volumen académico dedicado al limitarismo: la idea de que el uso de los recursos económicos o de los ecosistemas no sobrepasen ciertos límites. \n\nSe trata de un concepto profundamente arraigado en el pensamiento económico y político, por lo que es posible encontrar premisas similares en pensadores como Platón, Aquino o Spinoza. No obstante, 'Tener demasiado' es el primer ejemplar en el campo de la filosofía política contemporánea en el que el limitarismo se explora en profundidad y con detalle.\n\nAsimismo, este estudio reúne por primera vez los mejores escritos de los principales teóricos del limitarismo, lo que le convierte en una contribución esencial al campo de la filosofía política, en general, y de las teorías sobre la justicia distributiva, en particular. Incluye tanto artículos seminales ya publicados como nuevos capítulos y se presenta como lectura indispensable para académicos y estudiantes de teoría política y filosofía, así como para todos aquellos interesados en cuestiones relacionadas con la justicia distributiva.", 'isbn_13': ['9781805110804', '9781805110811', '9781805110828', '9781805110866', '9781805110835'], 'languages': ['ENG'], 'subjects': ['HP', 'HPCF', 'HPS', 'KCA', 'RNA', 'PHI000000', 'PHI019000', 'PHI034000', 'POL023000', 'POL044000', 'KCP', 'QD', 'QDTS', 'RND', 'Economics, Politics and Sociology', 'Other languages', 'Philosophy', 'Generaciones futuras', 'Justicia distributiva', 'Justicia intergeneracional', 'Limitarismo', 'Limitarismo ecológico', 'Limitarismo económico', 'Recursos materiales'], 'oclc': ['1422929642'], 'number_of_pages': 458, 'publishers': ['Open Book Publishers'], 'ocaid': '2e013a64-935f-4be4-9c51-3eac22929627', 'source_records': ['ia:2e013a64-935f-4be4-9c51-3eac22929627'], 'subtitle': 'Ensayos Filosóficos sobre el Limitarismo'}
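When tracing where the second language is dropped, a small check like the one below could be called at each stage of the import instead of reading the printed rec by eye. This is only a debugging sketch: `missing_languages` is a hypothetical helper, not part of the codebase, and the expected codes 'eng' and 'spa' are assumed from the item being in English and Spanish.

```python
def missing_languages(expected, rec):
    """Return expected language codes absent from rec['languages'], ignoring case."""
    found = {code.lower() for code in rec.get("languages", [])}
    return [code for code in expected if code.lower() not in found]


if __name__ == "__main__":
    # Trimmed copy of the rec printed above; only the relevant keys are kept.
    rec = {"title": "Tener Demasiado", "languages": ["ENG"]}
    # 'eng' and 'spa' are assumed to be the two codes on the IA item.
    print(missing_languages(["eng", "spa"], rec))  # ['spa']
```

The first stage at which this returns a non-empty list is where the language is being lost.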


Naresh-kumar-Thodupunoori commented 1 month ago

@mekarpeles May I work on this?

scottbarnes commented 1 month ago

I've assigned this to you, @Naresh-kumar-Thodupunoori. Please ask any questions you may have, and when you submit your PR, please include before-and-after input and output so it's easy to verify, as a threshold matter, that the PR resolves the issue.

hornc commented 4 weeks ago

FWIW, a recent bulk MARC import has imported an item with multiple language codes: https://openlibrary.org/books/OL53206387M/%E6%B3%95%E6%80%9D%E6%83%B3

It was imported using the /api/import/ia endpoint.

Naresh-kumar-Thodupunoori commented 4 weeks ago

Hey @scottbarnes, I am unlikely to be able to finish this issue, so I have unassigned myself.

scottbarnes commented 4 weeks ago

Thank you for the heads up, @Naresh-kumar-Thodupunoori, and thank you for the tip about a book being imported via /api/import/ia that includes multiple languages, @hornc.

For anyone looking to work on this, it may be the case that bulk_marc imports are different from the one-off import via an 'individual' OCAID/Internet Archive identifier.