scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License

Bandit: allow-list lxml usages #6265

Closed. Gallaecio closed this 2 months ago

Gallaecio commented 2 months ago

In most cases, the actual loading of the document is done by parsel, not Scrapy.

form.py was one exception, but I have refactored it to use the response selector instead. The use of get_base_url in unified.py is a fix for a bug detected in the process.

Another exception is iterparse_lxml, which now uses resolve_entities=False as parsel does.

And then there was the sitemap code, which was already disabling entity resolution.
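For readers unfamiliar with the setting discussed above, here is a minimal sketch of what disabling entity resolution does in lxml. It is not the code from this PR: it uses `etree.XMLParser` rather than the iterparse path, and the sample document and the `parse` helper are made up for illustration.

```python
from lxml import etree

# Hypothetical document: an internal entity declared in an inline DTD.
DOC = b'<?xml version="1.0"?><!DOCTYPE r [<!ENTITY e "expanded">]><r>&e;</r>'


def parse(data: bytes, resolve: bool) -> bytes:
    """Parse data with entity resolution on or off, then re-serialize the root."""
    parser = etree.XMLParser(resolve_entities=resolve)
    root = etree.fromstring(data, parser)
    return etree.tostring(root)


# With resolution enabled, the entity is replaced by its declared value.
resolved = parse(DOC, resolve=True)
# With resolution disabled, the reference is kept as an unexpanded entity node.
unresolved = parse(DOC, resolve=False)

print(resolved)
print(unresolved)
```

With resolution off, the declared value never reaches the parsed tree, which is the property that matters when the input comes from an untrusted site.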

codecov[bot] commented 2 months ago

Codecov Report

Merging #6265 (6cbf33d) into master (aa1bf69) will increase coverage by 0.00%. The diff coverage is 92.85%.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master    #6265   +/-   ##
=======================================
  Coverage   88.90%   88.91%
=======================================
  Files         161      161
  Lines       11793    11796    +3
  Branches     1914     1914
=======================================
+ Hits        10485    10488    +3
  Misses        964      964
  Partials      344      344
```

| [Files](https://app.codecov.io/gh/scrapy/scrapy/pull/6265?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scrapy) | Coverage Δ | |
|---|---|---|
| [scrapy/http/request/form.py](https://app.codecov.io/gh/scrapy/scrapy/pull/6265?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scrapy#diff-c2NyYXB5L2h0dHAvcmVxdWVzdC9mb3JtLnB5) | `97.76% <100.00%> (+0.03%)` | :arrow_up: |
| [scrapy/linkextractors/lxmlhtml.py](https://app.codecov.io/gh/scrapy/scrapy/pull/6265?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scrapy#diff-c2NyYXB5L2xpbmtleHRyYWN0b3JzL2x4bWxodG1sLnB5) | `97.03% <100.00%> (ø)` | |
| [scrapy/selector/unified.py](https://app.codecov.io/gh/scrapy/scrapy/pull/6265?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scrapy#diff-c2NyYXB5L3NlbGVjdG9yL3VuaWZpZWQucHk=) | `100.00% <100.00%> (ø)` | |
| [scrapy/utils/\_compression.py](https://app.codecov.io/gh/scrapy/scrapy/pull/6265?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scrapy#diff-c2NyYXB5L3V0aWxzL19jb21wcmVzc2lvbi5weQ==) | `91.89% <ø> (ø)` | |
| [scrapy/utils/sitemap.py](https://app.codecov.io/gh/scrapy/scrapy/pull/6265?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scrapy#diff-c2NyYXB5L3V0aWxzL3NpdGVtYXAucHk=) | `96.15% <100.00%> (ø)` | |
| [scrapy/utils/versions.py](https://app.codecov.io/gh/scrapy/scrapy/pull/6265?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scrapy#diff-c2NyYXB5L3V0aWxzL3ZlcnNpb25zLnB5) | `100.00% <100.00%> (ø)` | |
| [scrapy/utils/iterators.py](https://app.codecov.io/gh/scrapy/scrapy/pull/6265?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=scrapy#diff-c2NyYXB5L3V0aWxzL2l0ZXJhdG9ycy5weQ==) | `91.97% <50.00%> (ø)` | |