Closed synergiator closed 4 years ago
Indeed, the given recipes do not have ingredients listed on the site. The scrapers are functioning as intended, so I'll close the issue. Feel free to reopen if I have missed your point 🙂
This package is intended to be a super simple tool that handles parsing the HTML. If no data is found in the HTML, the scrapers won't assume anything. They will return default values and that's it.
Depending on your use case and aim you can:
However, that decision is beyond the package's responsibilities.
One can ask for advice on how to normalize recipe data, speed up scraping, elude bot protection mechanisms, and whatever else comes up when building a scraping-related data project, but these things are not the job of recipe-scrapers.
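As a rough illustration of handling this on the caller's side, here is a minimal sketch that separates recipes with usable ingredients from those without. It assumes each scraped recipe has already been converted to a plain dict with an `ingredients` key; that shape and the name `split_by_ingredients` are hypothetical and not part of the recipe-scrapers API.

```python
from typing import Any, Dict, Iterable, List, Tuple

Recipe = Dict[str, Any]  # assumed shape: a plain dict per scraped recipe


def split_by_ingredients(recipes: Iterable[Recipe]) -> Tuple[List[Recipe], List[Recipe]]:
    """Partition recipes into (complete, missing_ingredients).

    Hypothetical helper: a missing key, None, an empty list, or a list
    containing only blank strings all count as "no ingredients".
    """
    complete: List[Recipe] = []
    missing: List[Recipe] = []
    for recipe in recipes:
        raw = recipe.get("ingredients") or []
        # Keep only non-blank string entries, with surrounding whitespace removed.
        cleaned = [item.strip() for item in raw if isinstance(item, str) and item.strip()]
        if cleaned:
            complete.append({**recipe, "ingredients": cleaned})
        else:
            missing.append(recipe)
    return complete, missing


# Made-up sample data for illustration only.
sample = [
    {"title": "Pancakes", "ingredients": ["flour ", "milk", ""]},
    {"title": "Placeholder page", "ingredients": []},
    {"title": "No key at all"},
]
complete, missing = split_by_ingredients(sample)
```

Whether the entries in `missing` are dropped, logged, or retried later is left to the project that consumes the data.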
It seems that, at least as of 2017, one of the scrapers (epicurious) did not discard URLs for some reason. This could be either an accepted weakness resulting from a design decision or a missing feature in the design.
Actual problem: some of the parsed Epicurious recipes do not contain the element "ingredients". It is just not there.
either/recipes/food/views/reserve-this-recipe-id-for-future-use-51234840
). I do understand that this is a problem with data outliers rather than with the scrapers, so maybe there is a need to clarify how much scraping intelligence and data model sensitivity is required at this level and, if none, how best to implement or integrate it, as it seems to be a generally relevant use case in this context (i.e. one outlier is a marginal problem, but across many large datasets this can add up to a bigger data quality issue).
Of course, the issue needs to be validated against an up-to-date version.
URLs of recipes producing recipe data without ingredients: