inveniosoftware / invenio-records-resources

Records REST APIs for Invenio.
https://invenio-records-resources.readthedocs.io
MIT License

API response returns HTML encoding in description field #244

Open andrewbattista opened 3 years ago

andrewbattista commented 3 years ago

API response returns HTML encoding for description field

When users view the JSON record from the item show page, or request the record through the api/records/ endpoint by URL, the "description" field in the response contains HTML markup. This should not happen.

Expected behavior

The API should return a plain-text JSON response that does not inject HTML markup, especially when none existed previously.

Example

Here is an excerpt of the output for a sample record:

        }
      }
    ],
    "description": "<p>This dataset collection contains two multi-band raster data layers, which represent estimates of carbon opportunity cost of animal-sourced food production on land. The data is captured at a resolution of 5 arcminutes over the global domain and is derived from data collected approximately over the past two decades (2000-2020). The pixel values measure estimates in tonnes of potential vegetation per hectare that are suppressed by pasturelands and present-day feed crops. In the carbon opportunity cost layer, the bands represent three estimates of carbon in potential vegetation: median, low (5th percentile), and high (95th percentile). In the animal hectacres layer, the pixel band ranges represent two estimated values: areas sourced from the lowest carbon areas and areas sourced from the highest carbon areas. This data is released with an Attribution 4.0 International (CC BY 4.0) license. Users may cite this collection with https://doi.org/10.17609/q5pe-7r68/. Refer to the external for geospatial data preview and download.</p>",
    "publication_date": "2021-05-12",

lnielsen commented 2 years ago

Hi @andrewbattista, I'm just seeing this one now.

Did you use the deposit form for creating the record? If so, it's the JavaScript WYSIWYG editor that injects the <p> tags.

The description field is a rich text field so that users can use bullets, bold, etc. in the description of their record. The field is sanitised on storage (via whitelisting) to avoid XSS injections. The API delivers the full description so that a frontend consuming the API can also correctly render the description with bold, italics, etc.
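To illustrate what sanitising via whitelisting means here, below is a minimal sketch using only the Python standard library. It is not the Invenio implementation: the `ALLOWED_TAGS` set, the `AllowlistSanitizer` class, and the `sanitize` function are hypothetical names chosen for this example, and the actual allowed tags are whatever the instance configures.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist for illustration only; not Invenio's configured list.
ALLOWED_TAGS = {"p", "b", "strong", "i", "em", "ul", "ol", "li"}


class AllowlistSanitizer(HTMLParser):
    """Keep allowlisted tags (without attributes) and drop all other markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ALLOWED_TAGS and not self._skip_depth:
            # Attributes such as onclick are discarded on purpose.
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self._skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            # Re-escape text so stray < and & cannot form new markup.
            self.parts.append(escape(data, quote=False))


def sanitize(html_text):
    """Return html_text reduced to allowlisted tags and escaped text."""
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)
```

With this sketch, the `<p>` wrapper added by the editor survives storage, which matches the behaviour reported in this issue, while event-handler attributes and script content are dropped.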

We could create another JSON format that strips all tags from the description field on output. We already do that in, e.g., the application/vnd.inveniordm.v1+json format.
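A rough sketch of what such an output step could look like, again with the standard library only. The names `strip_tags` and `serialize_plaintext` are hypothetical and are not part of invenio-records-resources, and the sketch assumes the description sits under a top-level "description" key, as in the excerpt above.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_tags(html_text):
    """Return the text content of html_text with all tags removed."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.chunks)


def serialize_plaintext(record):
    """Return a shallow copy of record whose description has no markup."""
    out = dict(record)
    if isinstance(out.get("description"), str):
        out["description"] = strip_tags(out["description"])
    return out
```

The stored record keeps its rich-text description; only the serialized copy is flattened, so a frontend that wants the formatting can still request the full version.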

andrewbattista commented 2 years ago

@lnielsen - Yes, I did use the deposit form, and yes, it would be great to create a format that strips all tags. However, I think this issue may have been fixed with the version 4 release (or maybe version 5), because it's not injecting those tags anymore. If not, it would be better to keep this on the radar.