hhursev / recipe-scrapers

Python package for scraping recipes data
MIT License
1.68k stars 519 forks source link

Canonical url intergration batch 3 #905

Closed jknndy closed 10 months ago

jknndy commented 11 months ago

Batch 3, scrapers M - R, few reworks but nothing major in this one!

Non-scraper changes

jknndy commented 10 months ago

@jayaddison, I think this branch is ready for merge as the open reviews are going to be handled in a separate PR. I'm not sure how to resolve the conflicting files though, it looks like i running the recommended commands will merge this directly into main. Are you able to resolve them as part of the merge?

jayaddison commented 10 months ago

Thanks for the heads-up @jknndy - I'd prefer if you could resolve the conflicts; they're usually best resolved by the developer of the branch, and although it's sometimes an annoying and complicated process, it's beneficial to become familiar with. It's preferable to work with people who cause the fewest merge conflicts in the first place - so my apologies about this one - but it can and does happen in many projects.

jayaddison commented 10 months ago

@jknndy ah, I think I see what you mean about the GitHub-suggested recommended commands.

I think what to do instead is to ensure that you have an up-to-date copy of the main branch (git checkout main; git pull origin main) and then merge that into this branch (git checkout canonical_url_intergration_batch_3; git merge main).

Then you'll find the conflicts locally. A shortcut for two of them -- the HTML files, where we don't want to merge two files but instead simply take the one from this branch -- is to reset the file to the version stored in a specific branch: git reset canonical_url_intergration_batch_3 -- tests/test_data/maangchi.testhtml). The conflicts in tests/test_matprat_1.py will require manual resolution, though.

jknndy commented 10 months ago

Thanks very much! got it & i think we're in good shape for now. Once #915 is merged ill run through this again and then we should be good to go.

edit: not sure whats causing this failure, the files listed in the linter error weren't changed in this PR. & the coverage failures are related to #909


generate.py:211:96: E231 missing whitespace after ','
generate.py:222:85: E231 missing whitespace after ','
recipe_scrapers/goustojson.py:20:20: E231 missing whitespace after ':'
recipe_scrapers/grandfrais.py:57:40: E231 missing whitespace after ':'
recipe_scrapers/kptncook.py:32:26: E231 missing whitespace after ':'
recipe_scrapers/kuchniadomowa.py:15:23: E231 missing whitespace after ':'
recipe_scrapers/mindmegette.py:26:41: E231 missing whitespace after ':'
recipe_scrapers/nihhealthyeating.py:62:37: E231 missing whitespace after ':'
recipe_scrapers/weightwatchers.py:118:55: E702 multiple statements on one line (semicolon)
recipe_scrapers/woolworths.py:14:27: E231 missing whitespace after ':'
jayaddison commented 10 months ago

That is strange indeed about the lint failures. The list of files there looks similar to the differences between the main and v15 branches - but nothing about this pull request should involve the v15 branch. Odd. I'll try to take more of a look soon.

jayaddison commented 10 months ago

Ah: whatever the cause of the lint failures, they're happening in main too -- so it's nothing to do with your changes.

Interestingly the coverage workflow is failing too. It specifies python version 3.x -- and the failure looks like #909.. so I guess the GitHub setup-python action is defaulting to Python 3.12 there (nice, in a way - except that we don't support that here yet).

jayaddison commented 10 months ago

That is strange indeed about the lint failures. The list of files there looks similar to the differences between the main and v15 branches - but nothing about this pull request should involve the v15 branch. Odd. I'll try to take more of a look soon.

Nope, any similarity with v15 there is coincidence, by the looks of it.

I think that the common thread between the additional lines being reported is that they are Python f-strings. Perhaps the linters are now being fed with parsed f-string contents since they are now included as part of the Python 3.12 grammar.

I do think that that could be the case, based on this very unscientific testing with a sample size of one:

$ cat foo 
mailto:testing@example.org
$ black foo 
reformatted foo

All done! ✨ 🍰 ✨
1 file reformatted.
$ cat foo 
mailto: testing @ example.org
jayaddison commented 10 months ago

Nope, ignore me - that example wasn't particularly relevant. But I do think that the f-string grammar theory could somehow be related, despite that.

jayaddison commented 10 months ago

Trying to disprove my own naive theory there:

$ python --version
Python 3.12.0
$ cat example.py
string = f"mailto:better-testing@example.org"
$ black example.py
All done! ✨ 🍰 ✨
1 file left unchanged.
jayaddison commented 10 months ago

@jknndy it seems that the coverage and lint workflows don't work yet with Python 3.12 - and that became the default for the setup-python action recently. I've pinned those workflows to use lower versions of Python short-term (#917, #918) until we can upgrade.

If you merge the latest changes from main into this branch, tests should pass again :crossed_fingers: