seethroughdev / obsidian-recipe-grabber

Scrape Microdata, Fallback to HTML if Necessary #23

Closed: laneparton closed this 9 months ago

laneparton commented 9 months ago

First, this is a really great plugin! I had the desire a week or so ago to hook an LLM up to my Obsidian notes but I realized one of the main things I wanted to ask it was to build a weekly meal plan off of my everyday recipes - recipes I didn't have saved :)

Anyway, long story short, I stumbled onto this plugin and after recently reading this blog post: https://www.benawad.com/scraping-recipe-websites/ - I was inspired to take a shot at scraping more information from recipe websites.

A couple notes:

  1. I'm open to any and all feedback - even down to the smallest detail of naming things.
  2. I'm still testing this in my own everyday workflow - I am sure there are changes to be made (especially around scoring). I just wanted to start a dialogue on how we could merge this eventually.
  3. I'm open to ditching the Share Menu extension - I thought I might find it handy when I find recipes on my phone. I haven't tested it thoroughly yet.

And lastly an overview:

  1. (Existing) If the website has the JSON schema we're looking for, extract it. If that schema has missing ingredients/instructions - we'll find them. I found this case on a Sam The Cooking Guy recipe I was testing: https://www.thecookingguy.com/recipes/cherry-bourbon-glazed-ham.
  2. If the microdata is on HTML tags - we'll scrape them and build a JSON schema from it. I found this recipe/HTML just googling around: https://acadianatable.com/2024/01/15/boudin-king-cake/
  3. If there is no JSON Schema or Microdata at all - let's parse/scrape the HTML for a recipe.
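
For anyone following along, the three tiers above could be sketched roughly like this. It is a minimal TypeScript sketch under stated assumptions, not the code from this PR: every name (`Recipe`, `fromJsonLd`, `fromMicrodata`, `extractRecipe`) is hypothetical, regexes stand in for a real DOM parser only to keep the example dependency-free, gaps in the schema are backfilled from microdata as a stand-in for whatever the PR actually does, and tier 3 (scored HTML scraping) is left as a stub.

```typescript
// Hypothetical sketch; none of these names come from the plugin or the PR.
interface Recipe {
  name: string;
  ingredients: string[];
  instructions: string[];
}

interface Extraction {
  source: "json-ld" | "microdata";
  recipe: Recipe;
}

// Tier 1: find a schema.org Recipe in <script type="application/ld+json"> blocks.
function fromJsonLd(html: string): Recipe | null {
  const scriptRe = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  let match: RegExpExecArray | null;
  while ((match = scriptRe.exec(html)) !== null) {
    let data: any;
    try {
      data = JSON.parse(match[1]);
    } catch {
      continue; // ignore malformed JSON blocks
    }
    // A block may hold one object, an array, or an object with an @graph array.
    const top: any[] = Array.isArray(data) ? data : [data];
    const nodes: any[] = [];
    for (const item of top) {
      if (item && Array.isArray(item["@graph"])) nodes.push(...item["@graph"]);
      else nodes.push(item);
    }
    for (const node of nodes) {
      if (!node || typeof node !== "object") continue;
      const types: unknown[] = Array.isArray(node["@type"]) ? node["@type"] : [node["@type"]];
      if (types.indexOf("Recipe") === -1) continue;
      const steps: any[] = Array.isArray(node.recipeInstructions) ? node.recipeInstructions : [];
      return {
        name: String(node.name ?? ""),
        ingredients: (Array.isArray(node.recipeIngredient) ? node.recipeIngredient : []).map(String),
        // Steps may be plain strings or HowToStep objects with a `text` field.
        instructions: steps.map((s) => (typeof s === "string" ? s : String(s?.text ?? ""))),
      };
    }
  }
  return null;
}

// Tier 2 helper: collect text of elements carrying a given itemprop.
// Naive on purpose: it only reads text up to the next tag, so nested markup is ignored.
function itemprops(html: string, prop: string): string[] {
  const re = new RegExp(`<[^>]*itemprop=["']${prop}["'][^>]*>([^<]*)<`, "gi");
  const values: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(html)) !== null) {
    const text = match[1].trim();
    if (text) values.push(text);
  }
  return values;
}

// Tier 2: build a recipe object from microdata attributes on HTML tags.
function fromMicrodata(html: string): Recipe | null {
  const ingredients = itemprops(html, "recipeIngredient");
  const instructions = itemprops(html, "recipeInstructions");
  if (ingredients.length === 0 && instructions.length === 0) return null;
  return { name: itemprops(html, "name")[0] ?? "", ingredients, instructions };
}

// Try each tier in order and report which one produced the result.
function extractRecipe(html: string): Extraction | null {
  const micro = fromMicrodata(html);
  const jsonLd = fromJsonLd(html);
  if (jsonLd) {
    // Assumption: backfill empty schema fields from microdata when it exists.
    if (jsonLd.ingredients.length === 0 && micro) jsonLd.ingredients = micro.ingredients;
    if (jsonLd.instructions.length === 0 && micro) jsonLd.instructions = micro.instructions;
    return { source: "json-ld", recipe: jsonLd };
  }
  if (micro) return { source: "microdata", recipe: micro };
  return null; // tier 3 (scoring plain HTML candidates) would go here
}
```

Returning the `source` alongside the recipe makes it easy to see which tier fired for a given page, which might help when tuning the fallback order.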

seethroughdev commented 9 months ago

First off, wow. Thank you @laneparton for putting the time into trying to make this plugin better.

I really like your idea of using the LLM to make meal plans. That sounds like an extremely useful plugin by itself, or even a product. I'd love to see it when you have something to share.

As for the PR: first, I'd love to get @Flxp49's opinion on all this as well. But here are a couple of thoughts for now.

Code-wise. Excellent work. Super clear. Wow, tests. Easy to follow. 10/10.

As far as features, this is my concern, and again, I'm open to any kind of discussion here.

  1. The share menu feels like a separate PR at least. I like it, but like you, I'm not sure if it's something we really need. If people do want it, I'm open to adding it.

  2. Now for the big one: supporting non-JSON schemas in general sounds like a TON of overhead and future support, most of which I'm not capable of offering. So far, I've had one issue filed about a non-JSON schema site. I know there are more out there (I like Sam the Cooking Guy too), but are there enough to warrant more code than the original plugin?

I can tell how much thought and work you had to put into the initial parsing, and there are others doing similar things with NLP and ingredient/instruction parsing. But this will be ever-changing, and right now we actually get to support multiple languages out of the box since the plugin uses schemas.

I'm not saying no to this PR. I'm just saying I'm not sure if it's the right fit for the plugin. I'd love to hear your thoughts on this, as well as @Flxp49's.

Another idea would be to fork this plugin, or create your own with the non-JSON schema support, and add the LLM. If you put it out there and it's better, I'd be more than happy to take mine down in favor of it. With the vision API and ChatGPT, you could probably skip the DOM parsing altogether!

Please let me know what you all think, and thanks again.

laneparton commented 9 months ago

Those are very valid points! I totally agree that maintaining the expectation of HTML scraping could be cumbersome.

I had started baking keywords (for localization) into settings but I turned my attention to the issues I was hitting with scoring/extraction 😄

I'm more than happy to close this, keep using/refining the fork for my own use, and see where that goes!

seethroughdev commented 9 months ago

That sounds perfect. Please keep us updated on your progress, and let us know how the LLM integration works out!

Flxp49 commented 8 months ago

Hi there, sorry for the late reply.

I definitely think the LLM integration deserves its own plugin, and I have seen people on the Obsidian subreddit look for something like it. Regarding the recipe parsing, it's very unpredictable and would require a lot of support to keep up with, as mentioned by @seethroughdev. There's actually an existing project that supports some of the non-schema websites.

Anyway, I love your take on using the LLM for meal planning. It's something a lot of people would use, @laneparton.