scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License
51.16k stars 10.35k forks source link

fix: LxmlLinkExtractor unique_list missing key #6221

Closed jxlil closed 4 weeks ago

jxlil commented 3 months ago

Added key argument to unique_list in LxmlLinkExtractor

Before:

seenkey: Link(url='http://example.com/page1', text='Página 1', fragment='', nofollow=False)

After:

seenkey: http://example.com/page1

Closes https://github.com/scrapy/scrapy/issues/3273

codecov[bot] commented 3 months ago

Codecov Report

Merging #6221 (5e51417) into master (1c9d308) will increase coverage by 0.04%. Report is 68 commits behind head on master. The diff coverage is 100.00%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #6221 +/- ## ========================================== + Coverage 88.55% 88.60% +0.04% ========================================== Files 160 161 +1 Lines 11607 11775 +168 Branches 1883 1908 +25 ========================================== + Hits 10279 10433 +154 - Misses 1003 1011 +8 - Partials 325 331 +6 ``` | [Files](https://app.codecov.io/gh/scrapy/scrapy/pull/6221?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scrapy) | Coverage Δ | | |---|---|---| | [scrapy/linkextractors/lxmlhtml.py](https://app.codecov.io/gh/scrapy/scrapy/pull/6221?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scrapy#diff-c2NyYXB5L2xpbmtleHRyYWN0b3JzL2x4bWxodG1sLnB5) | `95.55% <100.00%> (ø)` | | ... and [9 files with indirect coverage changes](https://app.codecov.io/gh/scrapy/scrapy/pull/6221/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scrapy)
Gallaecio commented 3 months ago

Can you add a test for it?

jxlil commented 3 months ago

I think that in tests/test_linkextractors.py all cases are already being covered. On codecov.io it also seems that this change is covered: https://app.codecov.io/gh/scrapy/scrapy/pull/6221

Gallaecio commented 3 months ago

If this was properly covered, the tests would have failed before the change. Let me give it a go later on.

jxlil commented 2 months ago

Hi @Gallaecio I see that you added some tests (some were also added here: https://github.com/scrapy/scrapy/pull/6232)

Is there something missing that I can help with?

jxlil commented 2 months ago

Oh ok, no problem. Thanks!

jxlil commented 1 month ago

Hi @wRAR ,

Hope you're doing well. Are any additional changes needed in this PR? Thank you.

wRAR commented 1 month ago

@jxlil this is still in my backlog, I hope to revisit it soon :)