jadkins89 / Recipe-Scraper

A JS package for scraping recipes from the web.
https://www.npmjs.com/package/recipe-scraper
MIT License
97 stars 52 forks source link

support jamie oliver scraping (and fix the entire project) #44

Closed shaniqwa closed 3 years ago

shaniqwa commented 3 years ago

Hi there, very cool project. I wanted to explore scraping and adding a new website, did Jamie Oliver as suggested here: ​https://github.com/jadkins89/Recipe-Scraper/issues/32

but when I opened a PR I have found out that many of the tests fail... for a good reason - there have been quite a few changes out there. Hopefully, things will work better now..!

In this PR:

For some reason, I'm still getting 1 fail test on Travis. it failed to fetchDOMModel: response status 503

 177 passing (3m)
  1 failing
  1) closetCooking
       should fetch the expected recipe:
     Error: No recipe found on page
     at ClosetCookingScraper.defaultError (helpers/BaseScraper.js:7:164)
      at ClosetCookingScraper.fetchDOMModel (helpers/PuppeteerScraper.js:9:1619)
      at processTicksAndRejections (node:internal/process/task_queues:96:5)
      at async ClosetCookingScraper.fetchRecipe (helpers/BaseScraper.js:19:169)
      at async Context.<anonymous> (test/helpers/commonRecipeTest.js:15:26)

but on my local machine all test pass: 178 passing (4m)

would appreciate help getting this last one to pass as well :) cheers

UPDATE: I have added a validation, skipping the test in case the server is not responding. let me know what you think

jadkins89 commented 3 years ago

Hey there! Thanks for all this work in getting the package up to date, adding a new prop, and scrape. I'll take a look at it ASAP, hopefully by this weekend. I see you're still debugging, so let me know if you make any headway.

shaniqwa commented 3 years ago

@jadkins89 Hi 🙋‍♀️ I still don't know why the request to closetCooking is getting 503 when coming from the Travis server, but I have added some workaround - to pass tests if the server is not responding. (should be skipped properly and not passed... I can continue working on it if you think this is a good direction).

jadkins89 commented 3 years ago

@shaniqwa That solution will work for now. Certain websites seem to have occasional issues with being scraped from TravisCI. We may get some false positives but I don't know if there is a perfect solution here.

One thing I'd like removed before merging is the dependency to moment. It is only being used in one scrape, so the added bloat doesn't seem worth it.

Thank you so much your time and effort bringing the scrapes back up to par!

shaniqwa commented 3 years ago

@jadkins89 done :)