internetarchive / openlibrary

One webpage for every book ever published!
https://openlibrary.org
GNU Affero General Public License v3.0
5.2k stars 1.36k forks source link

Editing/adding a field causes unchanged /type/text fields to convert to /type/string #610

Open hornc opened 7 years ago

hornc commented 7 years ago

Editing/adding a field on a work or edition causes un-changed /type/text fields (e.g. description, notes, first_sentence) to convert to /type/string

Actively adding or editing a /type/text field causes it to be saved correctly as /type/text

Author bios may be affected in the same way as they are also /type/text

This means we have inconsistent description formats in the data:

grep -c '"description": "' ol_dump_editions_2017-09-30.txt
21,112
grep -c '"description": {' ol_dump_editions_2017-09-30.txt
742,868
hornc commented 7 years ago

@mekarpeles Here is code using the openlibrary-client to reproduce the issue at the lowest level I could, using the API. Hopefully it illustrates the strangeness. Originally I thought there was a bug in the web form, but it the form seems to rely on passing the description as a simple string and having it consistently convert to the correct /type/text on save, which does happen sometimes, but not if other fields are changed.

I can't figure out where the conversion from a string description to /type/text object occurs, I think there is something about Infogami I am missing. The conversion is either happening in infogami somehow, or there is some part of the infogmai integration in OL that is doing it... any pointers to where I should be looking?

# Code to reproduce https://github.com/internetarchive/openlibrary/issues/610
# Editing/adding a field causes unchanged /type/text fields to convert to /type/string
# 
# Uses openlibrary-client to reproduce the issue on a local dev instance of Open Library

from olclient.openlibrary import OpenLibrary
ol = OpenLibrary(base_url="http://localhost:8080")

# 1) Save a simple string to an edition
edition = ol.get("OL4M")
edition.description = "This is a string"
edition.save("send a simple string for description")

# 2) re-load the edition and check the description format
#   it has been converted to a dict with 'type' and 'value', this is good, but 'magic'
check_edition = ol.get("OL4M")
assert(isinstance(check_edition.description, dict))
assert("type" in check_edition.description)
assert("value" in check_edition.description)
assert(check_edition.description['type'] == "/type/text")
assert(check_edition.description['value'] == "This is a string")

# 3) save description again as in 1, but also modify another property
check_edition.description = "This is a string"
# need to change an existing property to exhibit the bug behaviour, add 1 to publish_date
check_edition.publish_date = str(int(check_edition.publish_date) + 1)
check_edition.save("send simple string for description and another change")

# 4) re-check
# description is now a plain string, not a dict with type and value.... ???
recheck_edition = ol.get("OL4M")
assert(isinstance(recheck_edition.description, basestring))
assert(recheck_edition.description == "This is a string")
xayhewalo commented 5 years ago

@hornc Did you determine if the bug is occurring in Infogami or OL'is integration of Infogami?

psyphernetik commented 4 years ago

I would like to know if this is still an issue? if not then I request that the issue is closed.

hornc commented 2 months ago

This is still an issue in 2024, the code above to reproduce using the OL client no longer works.

In a local dev environment OL4M.json has a notes field that is /type/text ,

image

editing the record to add, for example, a subject, will convert the field to image

Currently new books are imported with notes fields as /type/text, example from minutes ago: https://openlibrary.org/books/OL53160052M.json but on unrelated user edits they will be converted to plain strings. This presumably impacts our bulk export data format consistency. I'm not sure how the UI copes with both forms.