josefhelie opened this issue 6 months ago
Hi @josefhelie - thanks for the question / feature request.
In theory, yes, this is possible - the webpage is public and represents a recipe. However, some potentially important pieces of information are absent from the page: in particular, its origin (from another website? self-authored?) and the instructions.
Do you know whether those details can be included when sharing a recipe like this from the app? It's difficult to develop and test without a few complete samples.
I'm sorry, I shared a recipe that doesn't reflect all the requested fields. Here is a better example: https://dashboard.bergamot.app/shared/mIB4jYQtZU1A97 - is it better?
Yep, that initially looks good to me @josefhelie - it's difficult to say for certain without coding it up, but it seems to have most/all of the information we'd need. Thanks!
Thanks a lot @jayaddison :)
May I ask any update on this request @jayaddison? thanks :)
Hi @josefhelie - apologies for my delayed reply. No further updates on this at the moment I'm afraid. Do you have any interest in learning some Python coding?
@jayaddison I took a look - it looks like it is fairly easy to call the API endpoint, which can be derived from the URL of the recipe. For https://dashboard.bergamot.app/shared/mIB4jYQtZU1A97, the associated API endpoint is https://api.bergamot.app/recipes/shared?r=mIB4jYQtZU1A97.
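As a sketch of that derivation (the helper name is hypothetical; only the URL pattern comes from the examples above):

```python
from urllib.parse import urlparse

def api_endpoint_for(shared_url: str) -> str:
    """Derive the Bergamot API endpoint from a shared-recipe URL.

    Assumes URLs of the form https://dashboard.bergamot.app/shared/<id>,
    as seen in the examples in this thread.
    """
    path = urlparse(shared_url).path            # e.g. "/shared/mIB4jYQtZU1A97"
    recipe_id = path.rstrip("/").rsplit("/", 1)[-1]
    return f"https://api.bergamot.app/recipes/shared?r={recipe_id}"

endpoint = api_endpoint_for("https://dashboard.bergamot.app/shared/mIB4jYQtZU1A97")
```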
I'm not sure how the library normally supports the case of recipes being loaded via an API call after the original page load - I can see a few examples (`goustojson.py`, `monsieurcuisine.py`) that seem to do this. I would be happy to tackle this if you are happy for me to do so?
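To illustrate the API-after-page-load pattern in a self-contained way, here is a minimal sketch; the payload field names (`name`, `ingredients`, `steps`) are guesses for demonstration only, not Bergamot's actual response schema:

```python
import json

# Canned API payload - the field names here are illustrative guesses,
# not Bergamot's actual response schema.
API_RESPONSE = json.dumps({
    "name": "Peanut Butter Cookies",
    "ingredients": ["250g peanut butter", "100g sugar", "1 egg"],
    "steps": ["Mix everything.", "Bake for 12 minutes."],
})

def parse_recipe(api_body: str) -> dict:
    """Map an API response body onto the fields a scraper typically exposes."""
    data = json.loads(api_body)
    return {
        "title": data["name"],
        "ingredients": data["ingredients"],
        "instructions": "\n".join(data["steps"]),
    }

recipe = parse_recipe(API_RESPONSE)
```

In the real scraper, `API_RESPONSE` would instead be the body of the second HTTP request, made to the endpoint derived from the shared URL.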
Thanks @mlduff!
> I'm not sure how the library normally supports the case of recipes being loaded via an API call after the original page load - I can see a few examples (`goustojson.py`, `monsieurcuisine.py`) that seem to do this. I would be happy to tackle this if you are happy for me to do so?
About the handling of APIs: yep, well discovered - we do have a few scrapers that retrieve data using APIs at the moment. A potential design/architecture problem with that is that it (currently) tightly couples the scraper to an HTTP client - namely `requests` at the moment; nearly a de-facto client for Python, but even so, it may not be ideal to depend entirely on it.
Meanwhile we have a `v15` development branch that can optionally use `requests`, but that otherwise requires callers to retrieve the HTML and pass it to the scraper themselves. Marginally less convenient, but it allows callers to use whatever HTTP client(s) they prefer (anything from the built-in `urlopen`, low-level `urllib3`, `requests`, `httpx`, etc).
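To illustrate that decoupling idea (this is not recipe-scrapers' actual API; the class and parser below are hypothetical stand-ins, written with only the standard library):

```python
from html.parser import HTMLParser

class _TitleParser(HTMLParser):
    # Minimal stdlib parsing, purely for demonstration.
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

class HtmlRecipeScraper:
    """Accepts pre-fetched HTML; performs no network I/O of its own."""

    def __init__(self, html: str, url: str):
        self.url = url
        parser = _TitleParser()
        parser.feed(html)
        self._title = parser.title.strip()

    def title(self) -> str:
        return self._title

# The caller fetches the page with whichever client they prefer
# (urlopen, urllib3, requests, httpx, ...) and passes the text in:
scraper = HtmlRecipeScraper(
    "<html><head><title>Lemon Tart</title></head></html>",
    url="https://example.com/lemon-tart",
)
```

Because the scraper never performs I/O itself, it works identically under any HTTP client the caller chooses, which is the trade-off the `v15` design makes.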
A long explanation, but the short answer is: yep, please go ahead, but be aware that this would currently only be supported in the v14 / mainline branch.
@mlduff also a design / implementation question for your consideration: those recipes sometimes contain a link to the original source of the recipe. Should we return that as the canonical URL for recipes when possible?
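One way to frame that question in code: a small sketch (the function name is hypothetical) that prefers the original source link and falls back to the shared URL only when no such link is present:

```python
def canonical_url(shared_url: str, source_url: "str | None") -> str:
    """Prefer the recipe's original source link, when the shared page
    provides one, over the Bergamot sharing URL."""
    return source_url or shared_url

# With an original-source link present, it wins:
with_source = canonical_url(
    "https://dashboard.bergamot.app/shared/REbGkQaNoVJ5kM",
    "https://www.bestrecipes.com.au/recipes/peanut-butter-cookies-recipe/fowk6kuy",
)

# Without one, fall back to the shared URL itself:
without_source = canonical_url(
    "https://dashboard.bergamot.app/shared/mIB4jYQtZU1A97", None)
```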
> Meanwhile we have a `v15` development branch that can optionally use `requests`, but that otherwise requires callers to retrieve the HTML and pass it to the scraper themselves. Marginally less convenient, but it allows callers to use whatever HTTP client(s) they prefer (anything from the built-in `urlopen`, low-level `urllib3`, `requests`, `httpx`, etc).
@jayaddison is your preference for me to develop this in the v15 branch? If I implement it in v14 (which seems easier), will it then need rewriting at some point? (Are the other scrapers, like the examples I found, also going to need similar rewriting?)
> @mlduff also a design / implementation question for your consideration: those recipes sometimes contain a link to the original source of the recipe. Should we return that as the canonical URL for recipes when possible?
Good point, will try to do that.
> > Meanwhile we have a `v15` development branch that can optionally use `requests`, but that otherwise requires callers to retrieve the HTML and pass it to the scraper themselves. Marginally less convenient, but it allows callers to use whatever HTTP client(s) they prefer (anything from the built-in `urlopen`, low-level `urllib3`, `requests`, `httpx`, etc).
>
> @jayaddison is your preference for me to develop this in the v15 branch? If I implement it in v14 (which seems easier), will it then need rewriting at some point? (Are the other scrapers, like the examples I found, also going to need similar rewriting?)
I'd recommend implementing it for `v14`, yep.
> Do you have any interest in learning some Python coding?

Thanks @jayaddison, but I don't have enough free time to do that, even though I would like to!! 😢 Thanks @mlduff too :)
@jayaddison I noticed that the tests for the two scrapers I mentioned above are located under the legacy section - do I add my tests under there as well?
@josefhelie are you able to provide a couple more recipe URLs please so I can test?
> @jayaddison I noticed that the tests for the two scrapers I mentioned above are located under the legacy section - do I add my tests under there as well?
@mlduff yep, that's the correct place for those; thanks for checking :+1: You should be able to configure the `expected_requests` property in the tests to return example results for both the initial HTTP `GET` response (the HTML page) and the subsequent (probably also HTTP `GET`) API request.
@josefhelie have you found any pages shared on Bergamot where the original author is credited? I've seen a few pages that include the domain name of the source URL; I'm wondering whether there are any that list names/usernames.
@jayaddison I'm not sure I have. Would it help if you provided me a recipe that I could import into Bergamot, so I can then give you the link to the imported recipe?
@josefhelie Here is one that has an author: https://www.bestrecipes.com.au/recipes/peanut-butter-cookies-recipe/fowk6kuy
I imported it in my Bergamot, here it is: https://dashboard.bergamot.app/shared/REbGkQaNoVJ5kM
Thanks @josefhelie - so roughly speaking, it seems like some source recipes may include author info, and the Bergamot page includes a link back to the original, but our scraper can't directly retrieve the author details at the moment (they're not in the Bergamot page, so it seems like we'd have to ask Bergamot to add those, or to retrieve them ourselves from the original URL).
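If the fetch-from-the-original-URL option were pursued, author details are often present in the source page's schema.org JSON-LD. A simplistic sketch, run against a canned sample page rather than a real fetch (the regex approach is deliberately naive; a production version should use a proper HTML parser):

```python
import json
import re

# Illustrative sample page, not Bergamot's or bestrecipes.com.au's markup.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Recipe", "name": "Peanut Butter Cookies",
 "author": {"@type": "Person", "name": "Best Recipes Team"}}
</script>
</head></html>
"""

def author_from_jsonld(html: str) -> "str | None":
    """Pull the schema.org author name out of an embedded JSON-LD block."""
    match = re.search(
        r'<script type="application/ld\+json">(.*?)</script>',
        html, re.DOTALL)
    if not match:
        return None
    data = json.loads(match.group(1))
    author = data.get("author")
    if isinstance(author, dict):
        return author.get("name")
    return author  # schema.org also allows a plain string here

found_author = author_from_jsonld(SAMPLE_HTML)
```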
I'm not completely sure what to do here; I personally place quite a lot of importance on retaining the author name/info (even though it's challenging sometimes), because my assumption is that many recipe authors would want that to be included when people view their recipes.
I haven't contacted Bergamot to ask whether they'd consider attempting to include that info themselves, so that's one option I'm considering. Is there a support/feedback option in the app itself?
I'm currently using the free app Bergamot (which is closed source) to store my recipes, but I'd like to move to Mealie. I've encountered an error message that says, 'recipe_scrapers was unable to scrape this URL.' Is it possible to get a scraper, please? 😇 Thanks for your help. A link to a shared recipe: https://dashboard.bergamot.app/shared/T8IJLjbtHdh2pj