nextcloud / cookbook

🍲 A library for all your recipes
https://apps.nextcloud.com/apps/cookbook
GNU Affero General Public License v3.0

Import recipes with a dedicated JSON graph #1675

Open christianlupus opened 1 year ago

christianlupus commented 1 year ago

In general, it is possible to use a @graph entry to define a schema.org object in JSON-LD with references and more complex structures.

Our current importer cannot cope with this situation. Simply put, no recipe is found. This should be solved by creating a more stable and more robust import parser.
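To make the structure concrete, here is a minimal PHP sketch of a @graph document in which the recipe points to its author and image by @id, plus one way such references could be resolved. The sample document, the example.com URLs, and the helper names indexGraphById and resolveReferences are made up for illustration; this is not the app's importer code, and it only resolves references one level deep.

```php
<?php

// Hypothetical sample document; all names and URLs are illustrative.
$jsonLd = <<<'JSON'
{
  "@context": "https://schema.org",
  "@graph": [
    {"@type": "Person", "@id": "https://example.com/#author", "name": "Jane Doe"},
    {"@type": "ImageObject", "@id": "https://example.com/#image", "url": "https://example.com/soup.jpg"},
    {
      "@type": "Recipe",
      "name": "Example Soup",
      "author": {"@id": "https://example.com/#author"},
      "image": {"@id": "https://example.com/#image"},
      "recipeIngredient": ["1 onion", "500 ml stock"]
    }
  ]
}
JSON;

/** Build a lookup table from @id to node for all nodes in the graph. */
function indexGraphById(array $graph): array
{
    $index = [];
    foreach ($graph as $node) {
        if (is_array($node) && is_string($node['@id'] ?? null)) {
            $index[$node['@id']] = $node;
        }
    }
    return $index;
}

/** Replace pure references (an object holding only @id) by the node they point to. */
function resolveReferences(array $node, array $index): array
{
    foreach ($node as $key => $value) {
        if (is_array($value) && count($value) === 1 && is_string($value['@id'] ?? null)
            && isset($index[$value['@id']])) {
            $node[$key] = $index[$value['@id']];
        }
    }
    return $node;
}

$data = json_decode($jsonLd, true, 512, JSON_THROW_ON_ERROR);
$index = indexGraphById($data['@graph']);

$recipe = null;
foreach ($data['@graph'] as $node) {
    if (($node['@type'] ?? null) === 'Recipe') {
        $recipe = resolveReferences($node, $index);
        break;
    }
}

echo $recipe['name'], ' by ', $recipe['author']['name'], PHP_EOL;
```

A parser that only looks for a top-level object with "@type": "Recipe" sees neither the recipe nor the referenced author here, which matches the "no recipe found" symptom.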

There are a few issues on the tracker that all boil down to this root cause. This one should be considered the central issue to tackle all these pages.

christianlupus commented 1 year ago

All these issues can be used as sources for test cases in the development process

vengefulpunk commented 9 months ago

I ran into this issue today. It seems that recipes from any WordPress installation running Yoast cannot be imported, because these sites use the newer, more advanced Schema standard. This applied to all the recipes I checked, as well as the links above, and possibly affects other search optimization plugins, too.

This does not help with a solution, as it is already established that the importer needs to be modified, but I wanted to add what I found out.

teledyn commented 7 months ago

Yoast will be a popular offender: in all cases I have investigated, the pages with @graph embed it in <script type="application/ld+json" class="yoast-schema-graph">. Sadly, it is very widely used. So far, I have only found one recipe in my recipe bookmarks where the current parser works.

For example, this Wafu Dressing validates perfectly and would be wonderful to have if we could parse it.

I don't know much about Nextcloud apps yet, but I do know some RDF, JSON-LD and schema.org, and I'm guessing the parser is JsonService?
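As a rough illustration of where this data lives in a page, the following PHP sketch pulls every application/ld+json script block out of an HTML string and decodes it. It assumes PHP's DOM extension is available; the function name extractJsonLdBlocks and the sample page are hypothetical and are not taken from the cookbook code base.

```php
<?php

/**
 * Collect the decoded contents of all JSON-LD script tags in an HTML page.
 * Blocks that do not decode to a JSON object or array are skipped.
 */
function extractJsonLdBlocks(string $html): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $blocks = [];
    foreach ($xpath->query('//script[@type="application/ld+json"]') as $script) {
        $decoded = json_decode($script->textContent, true);
        if (is_array($decoded)) {
            $blocks[] = $decoded;
        }
    }
    return $blocks;
}

// Hypothetical page, shaped like the Yoast output described above.
$html = <<<'HTML'
<html><head>
<script type="application/ld+json" class="yoast-schema-graph">
{"@context": "https://schema.org", "@graph": [{"@type": "Recipe", "name": "Example Dressing"}]}
</script>
</head><body><p>Recipe page</p></body></html>
HTML;

$blocks = extractJsonLdBlocks($html);
echo count($blocks), ' JSON-LD block(s) found', PHP_EOL;
```

Selecting on the type attribute rather than on the yoast-schema-graph class keeps the sketch independent of any particular plugin.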

seyfeb commented 7 months ago

@teledyn The parsing of the websites containing the recipe information is done in the backend. I think the parsing logic should be located in lib/Helper/HTMLParser/HttpJsonLdParser.php, which extends an abstract parser class. The idea was/is to support more parsers in the future, but they would need to be written first ;)

teledyn commented 2 months ago

It does look like the code attempts to correct for @graph:

        // Look through @graph field for recipe
        $this->mapGraphField($json);

        // Look for an array of recipes
        $this->mapArray($json);

a575606 commented 2 months ago

Just to add my voice to the chorus: as much as I like using the app, probably only about 1 in 10 recipes that I try importing do so successfully. The rest fail with the parser error. Manually adding recipes, on the other hand, is laborious enough that I rarely take the time to add recipes anymore. It involves lots of switching back and forth to copy and paste line by line, downloading images and uploading them to Nextcloud, remembering URLs, etc. Perhaps if a more robust parser is too large an undertaking, how about improving the add recipe user interface?

christianlupus commented 2 months ago

Perhaps if a more robust parser is too large an undertaking, how about improving the add recipe user interface?

What exactly do you have in mind? There is a major UI rework currently under way. If there is a good suggestion, this might (I cannot guarantee implementation, though) be added. Maybe you could open a discussion (or a new issue) to discuss this, to avoid cluttering this issue here?


For the original problem: We are hearing your issues. The problem is that the schema.org standard allows for a zillion different variants of how metadata (like the recipes in the imported pages) can be represented. Also, not all pages handle this in conformance with the standard. We have to take all of that into account and write a generic parser.
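To give a feeling for just a few of these variants, here is a minimal PHP sketch that looks for recipe nodes whether the decoded JSON-LD is a single object, a list of objects, or an object wrapping its nodes in @graph, and whether @type is a string or a list. The function names isRecipeNode and findRecipeNodes are hypothetical, and the sketch deliberately ignores many cases the standard allows (prefixed or full-IRI type names, for instance).

```php
<?php

/** True if the node's @type is "Recipe", either as a string or as one entry of a list. */
function isRecipeNode(array $node): bool
{
    $types = (array) ($node['@type'] ?? []);
    return in_array('Recipe', $types, true);
}

/**
 * Collect recipe nodes from a decoded JSON-LD value (as returned by
 * json_decode($text, true)), covering three common layouts.
 */
function findRecipeNodes($json): array
{
    if (!is_array($json)) {
        return [];
    }
    if (isRecipeNode($json)) {
        return [$json];
    }
    if (isset($json['@graph'])) {
        // Nodes wrapped in @graph: search the wrapped value instead.
        return findRecipeNodes($json['@graph']);
    }
    if (isset($json['@type'])) {
        // A typed node that is not a recipe, e.g. WebPage or Person.
        return [];
    }
    // Untyped value: treat its entries as candidate nodes.
    $found = [];
    foreach ($json as $child) {
        $found = array_merge($found, findRecipeNodes($child));
    }
    return $found;
}

// Three hypothetical inputs describing a recipe in different layouts.
$single = ['@context' => 'https://schema.org', '@type' => 'Recipe', 'name' => 'A'];
$list = [['@type' => 'WebSite'], ['@type' => ['Recipe', 'NewsArticle'], 'name' => 'B']];
$graph = ['@context' => 'https://schema.org', '@graph' => [['@type' => 'WebPage'], ['@type' => 'Recipe', 'name' => 'C']]];

foreach ([$single, $list, $graph] as $variant) {
    foreach (findRecipeNodes($variant) as $recipe) {
        echo $recipe['name'], PHP_EOL;
    }
}
```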

However, we want to write it such that it can be extended and augmented as the need arises. The current implementation is not really built with these constraints in mind. Thus, a complete restructuring of the parser needs to be carried out.

A prototype is already under way to test whether the architectural assumptions hold and lead to a good design. We then need to implement this in the cookbook itself. I guess we will push out a version 0.11.1 before that, as there are some urgent things to handle with the release of NC29.
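One possible shape for such an extensible design, sketched in PHP, is a chain of small extraction strategies that are tried in order, so a new page layout only needs one new class. This is purely an illustration of the idea: the interface and class names below are invented, and nothing here reflects the actual prototype or the planned architecture.

```php
<?php

/** Hypothetical contract: a strategy returns a recipe array, or null if it finds none. */
interface RecipeExtractor
{
    public function extract(array $json): ?array;
}

/** Handles a document that is itself a Recipe node. */
final class TopLevelRecipeExtractor implements RecipeExtractor
{
    public function extract(array $json): ?array
    {
        return ($json['@type'] ?? null) === 'Recipe' ? $json : null;
    }
}

/** Handles documents that wrap their nodes in @graph. */
final class GraphRecipeExtractor implements RecipeExtractor
{
    public function extract(array $json): ?array
    {
        $graph = $json['@graph'] ?? [];
        if (!is_array($graph)) {
            return null;
        }
        foreach ($graph as $node) {
            if (is_array($node) && ($node['@type'] ?? null) === 'Recipe') {
                return $node;
            }
        }
        return null;
    }
}

/** Tries each registered strategy in order and returns the first hit. */
final class ExtractorChain implements RecipeExtractor
{
    /** @var RecipeExtractor[] */
    private array $extractors;

    public function __construct(RecipeExtractor ...$extractors)
    {
        $this->extractors = $extractors;
    }

    public function extract(array $json): ?array
    {
        foreach ($this->extractors as $extractor) {
            $recipe = $extractor->extract($json);
            if ($recipe !== null) {
                return $recipe;
            }
        }
        return null;
    }
}

$chain = new ExtractorChain(new TopLevelRecipeExtractor(), new GraphRecipeExtractor());
$recipe = $chain->extract([
    '@graph' => [['@type' => 'WebPage'], ['@type' => 'Recipe', 'name' => 'Example Soup']],
]);
echo $recipe['name'] ?? 'no recipe found', PHP_EOL;
```

Because the chain itself implements the same interface, chains can be nested, and a site-specific workaround could be added as one more strategy without touching the others.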