Closed smilerz closed 2 months ago
I have three questions/requests for feedback on the latest commit.
Is there a way to use HTMLTagStripperPlugin here?
headnote = f"Note: {normalize_string(re.sub(r'<.*?>', '', headnote))}\n"
Is there a way to avoid duplicating the ingredient formatting code between ingredients() and ingredient_groups()?
And finally - in ingredients, sometimes they preface the post_text with ", ...." and sometimes they don't. Is there an elegant way to preface the post text with a space if there isn't some sort of delimiter?
One more question: ATK also has a format for instruction_groups()
, but they aren't correlated to the ingredient_groups()
. As an example their recipe for lasagna has 3 groups of ingredients named 'meat sauce', 'bechamel, 'noodles and cheese'.
There are 5 instructions groups with no names assigned, but can be eyeballed to include the above 3 assembly steps and 2 intermediate steps on the actual baking.
I'm not sure if it's worth updating the instruction_groups()
or not.
Is there a way to use HTMLTagStripperPlugin here?
I'm not too familiar with that feature to be honest - but I'll investigate that soon and then will let you know (unless someone else gets to that before me).
Is there a way to avoid duplicating the ingredient formatting code between ingredients() and ingredient_groups()?
Yes, for this there is a helper utility - I'd recommend referring to the _grouping_utils.py
documentation for that.
And finally - in ingredients, sometimes they preface the post_text with ", ...." and sometimes they don't. Is there an elegant way to preface the post text with a space if there isn't some sort of delimiter?
Hrm, that's annoying. Maybe we could trim the left of the postText
until we find an alphanumeric character; there's no common functionality in the library to do that, though, as far as I'm aware.
I looked at the _grouping_utils.py
it is using soup and HTML elements, but the grouping for ATK is already structured json data so is parsed differently. Or maybe I don't fully understand how so use the util?
One more question: ATK also has a format for
instruction_groups()
, but they aren't correlated to theingredient_groups()
. As an example their recipe for lasagna has 3 groups of ingredients named 'meat sauce', 'bechamel, 'noodles and cheese'.There are 5 instructions groups with no names assigned, but can be eyeballed to include the above 3 assembly steps and 2 intermediate steps on the actual baking.
I'm not sure if it's worth updating the
instruction_groups()
or not.
Interesting!
We don't support groups of instructions at the moment - only groups of ingredients. However, both do make sense in recipes, and it's good to have found an example of that -- and to note the possibility for them to be linked in some cases.
I'm thinking about what to do here, too.
Oh, thanks for the clarification - I misread what instructions_list()
was all about (and misrememberd the name)
That's OK, they are similar names! :)
I looked at the
_grouping_utils.py
it is using soup and HTML elements, but the grouping for ATK is already structured json data so is parsed differently. Or maybe I don't fully understand how so use the util?
Of course; I forgot that we're using JSON input here. No, I think you've got it - the grouping utils are intended for use with HTML, and they use CSS-based selector queries.
I'll try experimenting with the scraper code here to explore other possibilities.
Thanks - I appreciate the collaboration.
You're welcome - thanks for the pull request. I'll provide some more review comments within the next day or two.
I saw the tests failed - my mistake, I forgot to update expected values after stripping the leading zeros from commas.
@smilerz this has been included and released in v14.56.0
(and also the pre-release v15.0.0-rc3
) of recipe-scrapers
on PyPi.
@smilerz this has been included and released in
v14.56.0
(and also the pre-releasev15.0.0-rc3
) ofrecipe-scrapers
on PyPi.
It doesn't look like it's in 14.56.0
- which is what I expected since it's behind a paywall.
Ah, oops - yep, my mistake (that notification was copy-pasted; I forgot about the different circumstance here :/ ). Thanks @smilerz.
ATK is behind a paywall - so this only makes sense to add as part of v15