Closed robmc-itpro closed 1 year ago
Ah.. ok, I see what's going on here.
Basically: the instruction data from Gousto that we're calling `normalize_string` on may contain HTML, such as paragraph elements (`<p>`) in this case.
A quickish fix would probably be to look for paragraph elements specifically within the instructions and add newlines between paragraphs.
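To make the idea concrete, here is a rough sketch of that quick fix. It is not the scraper's actual code: `ParagraphJoiner` and `instructions_to_text` are hypothetical names, and it uses the standard library's `html.parser` only to stay self-contained. It assumes the instruction data arrives as an HTML fragment string, treats each `<p>` element as one step, and joins the steps with newlines.

```python
from html.parser import HTMLParser


class ParagraphJoiner(HTMLParser):
    """Collect the text of an HTML fragment, one entry per <p> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []  # finished paragraphs, in document order
        self._buffer = []     # text pieces of the paragraph being read

    def flush(self):
        # Collapse runs of whitespace and keep only non-empty paragraphs.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.flush()

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush()

    def handle_data(self, data):
        self._buffer.append(data)


def instructions_to_text(html_fragment):
    """Hypothetical helper: return plain text with a newline between paragraphs."""
    parser = ParagraphJoiner()
    parser.feed(html_fragment)
    parser.close()   # make the parser emit any text it is still holding
    parser.flush()   # keep trailing text that was not wrapped in <p>
    return "\n".join(parser.paragraphs)


if __name__ == "__main__":
    sample = "<p>Boil the kettle</p><p>Chop the onion <b>finely</b></p>"
    print(instructions_to_text(sample))
```

Inline tags such as `<b>` are ignored, so their text stays within the surrounding paragraph. Input with no `<p>` elements at all comes back as a single line, so plain-text instructions from other recipes should be unaffected.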
Pre-filing checks
The URL of the recipe(s) that are not being scraped correctly
...
The results you expect to see
The scraper doesn't seem to recognize the line breaks in the instructions. In Tandoor and Mealie this means the line breaks have to be fixed manually on each recipe. I'm not sure how fixable this is, as I have no Python or scraping experience.
Step 1, for example, looks like the following on the original web page.
The results (including any Python error messages) that you are seeing