nextcloud / cookbook

🍲 A library for all your recipes
https://apps.nextcloud.com/apps/cookbook
GNU Affero General Public License v3.0

HTML Character Entities Do Not Display Correctly #466

Open paulcalabro opened 3 years ago

paulcalabro commented 3 years ago

Description HTML character entities are not properly displayed within the recipe.

Reproduction Steps to reproduce the behavior:

  1. Import the following recipe: https://www.culinaryhill.com/mojito-mocktail-recipe/
  2. Look at the description.
  3. Notice that the HTML character entity &#039; is not converted to an apostrophe.

Expected behavior HTML character entities are replaced with the correct characters.

Screenshots Screen Shot 2020-12-31 at 12 23 27 PM

Browser FF 84

christianlupus commented 3 years ago

Hello. Could you please verify whether the JSON file contains valid text? In your recipe folder in the NC data storage there is a folder named after the recipe (mojito-mocktail-recipe). Look in it for a recipe.json. Either paste this file here or open it in a text editor and look for the same text. I assume the text &#039; is within the JSON file.
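For anyone who wants to run the check described above without reading the file by eye, here is a minimal sketch in Python (chosen purely for illustration; it is not part of the cookbook app). The helper name `find_entities` and the inline sample are hypothetical. In practice you would load the contents of your own recipe.json instead of the sample string.

```python
import json
import re

# Matches decimal (&#039;), hexadecimal (&#x27;) and named (&amp;) character references.
ENTITY_RE = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def find_entities(node, path="$"):
    """Yield (path, entity) pairs for every entity found in a string value."""
    if isinstance(node, str):
        for match in ENTITY_RE.findall(node):
            yield path, match
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_entities(item, f"{path}[{index}]")
    elif isinstance(node, dict):
        for key, item in node.items():
            yield from find_entities(item, f"{path}.{key}")


# Hypothetical stand-in for the contents of recipe.json.
sample_text = (
    '{"name": "Mojito Mocktail Recipe", '
    '"description": "You&#039;ll never even miss the rum."}'
)
hits = list(find_entities(json.loads(sample_text)))
print(hits)
```

An empty list means no entity-like sequences were found in any string value of the file.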

paulcalabro commented 3 years ago

Sure! Here's the corresponding JSON file. It also contains the character entity:

{
  "@context": "http://schema.org",
  "@type": "Recipe",
  "name": "Mojito Mocktail Recipe",
  "author": {
    "@type": "Person",
    "name": "Meggan Hill"
  },
  "description": "This refreshing minty Mojito Mocktail has all the flavor and none of the booze! Just muddle fresh mint, lime, and sugar (or agave) and top with ice and club soda. You&#039;ll never even miss the rum.",
  "datePublished": "2020-04-20T01:00:56+00:00",
  "image": "https://www.culinaryhill.com/wp-content/uploads/2020/04/Mojito-Mocktail-Recipe-Culinary-Hill-LR-square-480x270.jpg",
  "recipeYield": 1,
  "cookTime": "PT0H5M",
  "totalTime": "PT0H5M",
  "recipeIngredient": [
    "10 fresh mint leaves (plus more for garnish)",
    "1/2 lime (cut into 4 wedges, divided)",
    "2 tablespoons granulated sugar (or to taste)",
    "1 cup ice cubes",
    "1/2 cup club soda"
  ],
  "recipeInstructions": [
    "In a medium sturdy glass, add mint leaves and 1 lime wedge. Use a muddler to crush the mint and lime, releasing the mint oils and lime juice.",
    "Add 2 more lime wedges and the sugar, and muddle again to release the lime juice. Do not strain the mixture.",
    "Fill the glass almost to the top with ice. Add club soda and more sugar to taste if desired. Garnish with mint leaves and remaining lime wedge."
  ],
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "5",
    "ratingCount": "2"
  },
  "recipeCategory": "Drinks",
  "recipeCuisine": [
    "Cuban"
  ],
  "keywords": "lime,mint",
  "nutrition": {
    "@type": "NutritionInformation",
    "calories": "128 kcal",
    "servingSize": "1 serving"
  },
  "@id": "https://www.culinaryhill.com/mojito-mocktail-recipe/#recipe",
  "isPartOf": {
    "@id": "https://www.culinaryhill.com/mojito-mocktail-recipe/#article"
  },
  "mainEntityOfPage": "https://www.culinaryhill.com/mojito-mocktail-recipe/#webpage",
  "tool": [],
  "url": "https://www.culinaryhill.com/mojito-mocktail-recipe/"
}

christianlupus commented 3 years ago

I checked things out. The issue arises because the webpage does not publish well-formed recipe text in its JSON. The problem is introduced during the parsing of the HTML page. In the original HTML page the text is You&#039;ll, where 39 = 0x27 is the Unicode code point of the apostrophe. A browser renders this correctly as an apostrophe, but the JSON does not contain the apostrophe itself, only an encoded representation of it. The cookbook app does not expect this, so the entity is not decoded when the recipe is parsed.
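To illustrate the encoding involved, the short Python sketch below decodes the numeric entity the way a browser would, using the standard library's `html.unescape`. Python is used here only as an illustration; this is not the app's import code.

```python
import html

# The string as it appears in the site's JSON: the apostrophe is entity-encoded.
raw = "You&#039;ll never even miss the rum."

# Decimal 39 and hexadecimal 0x27 are the same code point: the apostrophe.
apostrophe_code = ord("'")

# html.unescape resolves numeric and named character references.
decoded = html.unescape(raw)
print(apostrophe_code, decoded)
```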

We could add a filter to the import routine, but the site owner should really be made aware that the generated JSON, while valid, does not represent the values as a human would read them.
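One possible shape for such a filter, sketched in Python under stated assumptions: the function name `decode_entities` is hypothetical, the sample dictionary is a trimmed stand-in for the imported recipe, and the real import routine may need a different approach. The idea is simply to walk the parsed JSON and decode entities in every string value, leaving numbers and structure untouched.

```python
import html
from typing import Any


def decode_entities(value: Any) -> Any:
    """Return a copy of a parsed JSON value with HTML entities decoded in all strings."""
    if isinstance(value, str):
        return html.unescape(value)
    if isinstance(value, list):
        return [decode_entities(item) for item in value]
    if isinstance(value, dict):
        # Keys are left as-is; only values are decoded.
        return {key: decode_entities(item) for key, item in value.items()}
    return value  # numbers, booleans and null pass through unchanged


# Trimmed, hypothetical stand-in for the imported recipe.
recipe = {
    "name": "Mojito Mocktail Recipe",
    "description": "You&#039;ll never even miss the rum.",
    "recipeYield": 1,
    "recipeCuisine": ["Cuban"],
}
cleaned = decode_entities(recipe)
print(cleaned["description"])
```

A caveat with this design: decoding is applied unconditionally, so a field that legitimately contains a literal sequence such as `&amp;` would also be changed. Whether that is acceptable is a judgment call for the import code.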

paulcalabro commented 3 years ago

Yeah, I agree. Especially with the "it is not assumed to happen" part. They should really fix that on their end. That being said, it might be useful to better handle such scenarios where the site owner isn't/doesn't want to clean up the JSON. If you point me to the relevant parts of the code, I can take a stab at it. 😊

christianlupus commented 3 years ago

That part of the code is a mess at the moment. I'd rather avoid further hacks here in favor of a more elegant and permanent solution. Unfortunately, I have not done these preparations yet.