sul-dlss / cocina-models

Cocina repository data model (implemented in Ruby)
https://sul-dlss.github.io/cocina-models/

Map literal <i> and <cite> elements in note fields to be wrapped in CDATA #482

Closed mjgiarlo closed 2 years ago

mjgiarlo commented 2 years ago

The Martin Wong collection (among others) requires that <i></i> and <cite></cite> HTML elements not be stripped out, and that they instead be wrapped in CDATA, particularly within note fields. The current behavior escapes these elements, e.g., &lt;i&gt;, which does work in PURL but produces invalid MODS and is not the specified approach.
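To make the requested behavior concrete, here is a minimal sketch of the difference between escaping and CDATA wrapping. It is not the cocina-models implementation: `note_xml`, `ALLOWED_INLINE_TAGS`, and `XML_ESCAPES` are hypothetical names, the `<note>` element is built by plain string interpolation for illustration only, and the sample note text is made up.

```ruby
# Hypothetical helper for illustration; not the cocina-models mapping code.
# Matches opening or closing <i> and <cite> tags without attributes.
ALLOWED_INLINE_TAGS = %r{</?(?:i|cite)>}.freeze

XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

def note_xml(value)
  body =
    if value.match?(ALLOWED_INLINE_TAGS)
      # Split any literal "]]>" so it cannot end the CDATA section early.
      "<![CDATA[#{value.gsub(']]>', ']]]]><![CDATA[>')}]]>"
    else
      # No inline markup to preserve, so ordinary XML escaping is enough.
      value.gsub(/[&<>]/, XML_ESCAPES)
    end
  "<note>#{body}</note>"
end

puts note_xml('Shown in <i>Painting is Forbidden</i>.')
puts note_xml('Gift of A & B')
```

With this sketch, a note containing `<i>` keeps its literal tags inside a CDATA section, while a note without the allowed tags is escaped as usual.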

To make testing easier, we've already created a CSV that is a subset of actual Martin Wong data:

Martin_Wong_sample.csv

You can test that here: https://argo-stage.stanford.edu/apos/druid:bc036dg9936/bulk_jobs (choose Spreadsheet input; load into objects after selecting said CSV file.)

Here is the test object in Argo: https://argo-stage.stanford.edu/view/druid:hb400xn4539. And here's its PURL page to ensure italics (around Painting is Forbidden in the exhibition history note) are present: https://sul-purl-stage.stanford.edu/hb400xn4539

In addition to PURL, we should also ensure that the italics make it through to Exhibits (both on the item show page and in the "More details" view). If you need to consult the Access team for help with either or both of these, they will have time for this between 6/6 and 6/24.

Connects to sul-dlss/exhibits#1981

jcoyne commented 2 years ago

@mjgiarlo is there anything left to do on this?

mjgiarlo commented 2 years ago

@jcoyne I think what's left on italics is to put this code in place and verify it has the desired effect on the Access side.

mjgiarlo commented 2 years ago

@jcoyne Even though all our apps are using cocina-models 0.82.0, it appears as though i/cite tags are not being wrapped in CDATA. Here's the object I've been testing with: https://argo-stage.stanford.edu/view/druid:kd791zq6661

Here's the upload spreadsheet I've been tweaking: descriptive-kd791zq6661.csv

If you go to the Cocina JSON view in Argo, the angle brackets look like they are being Unicode-escaped, which may be why this cocina-models note mapping is being defeated: https://argo-stage.stanford.edu/items/druid:kd791zq6661.json. I'm not sure where the Unicode escaping is happening, and when I load the DRO in the Rails console, I don't see it. So it looks like there's a bit more work to do on this. Ideas are welcome!
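One way to check whether the escaping matters, sketched under an assumption that is not confirmed in this thread: the JSON view may simply be serializing `<` and `>` as `\u003c` and `\u003e` (ActiveSupport's JSON encoder does this when its `escape_html_entities_in_json` setting is on). If so, the escapes exist only in the serialized text and decode back to literal angle brackets, which would be consistent with not seeing them in the console. The JSON fragment below is made up for illustration.

```ruby
require 'json'

# Escaped form, as it might appear in a raw JSON response body.
# Single quotes keep the backslash sequences literal in the Ruby source.
raw_json = '{"value":"Shown in \u003ci\u003ePainting\u003c/i\u003e"}'

parsed = JSON.parse(raw_json)

# After parsing, the value holds literal angle brackets again.
puts parsed['value']
puts parsed['value'].include?('<i>')
```

If the parsed value contains a literal `<i>`, the escaping is a display detail of the JSON serialization and the cause of the missing CDATA wrapping would have to be elsewhere.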

andrewjbtw commented 2 years ago

The cocina version on kd791zq6661 is showing as 0.80.0, not 0.82.0.

andrewjbtw commented 2 years ago

Although in the public XML it's showing as 0.82.0

mjgiarlo commented 2 years ago

Closed by https://github.com/sul-dlss/cocina-models/pull/481