zytedata / zyte-common-items

Contains the common item definitions used in Zyte.
BSD 3-Clause "New" or "Revised" License

Add new pipeline DropLowProbabilityItemPipeline #91

Closed. PyExplorer closed this 6 months ago

PyExplorer commented 6 months ago

This pipeline was implemented during the work on zyte-spider-templates. It was decided to move it to this repository as a general solution for dropping items with low probability.
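
For readers unfamiliar with the idea, a minimal sketch of such a pipeline follows. It is not the code from this PR: the class name `DropLowProbabilitySketch`, the simplified `Product` and `Metadata` dataclasses, the default threshold, and the choice to keep items that carry no probability are all assumptions made for illustration. `DropItem` is imported from Scrapy when available, with a local stand-in otherwise so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Optional

try:
    # Real Scrapy exception, used when Scrapy is installed.
    from scrapy.exceptions import DropItem
except ImportError:
    class DropItem(Exception):
        """Local stand-in so the sketch runs without Scrapy."""


@dataclass
class Metadata:
    # Hypothetical, simplified metadata holder (assumption).
    probability: Optional[float] = None


@dataclass
class Product:
    # Hypothetical, simplified item (assumption).
    name: str
    metadata: Optional[Metadata] = None


class DropLowProbabilitySketch:
    """Drop items whose metadata.probability is below a threshold."""

    def __init__(self, threshold: float = 0.1):
        # The default value is an illustrative assumption.
        self.threshold = threshold

    def process_item(self, item, spider=None):
        metadata = getattr(item, "metadata", None)
        probability = getattr(metadata, "probability", None)
        # Items without a probability are kept (assumption).
        if probability is not None and probability < self.threshold:
            raise DropItem(
                f"probability {probability} is below threshold {self.threshold}"
            )
        return item
```

With `threshold=0.5`, `Product("x", Metadata(0.9))` passes through unchanged, while `Product("y", Metadata(0.2))` raises `DropItem`.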

kmike commented 6 months ago

@PyExplorer make sure to set up pre-commit, as described in https://zyte-common-items.readthedocs.io/en/latest/contributing.html

Gallaecio commented 6 months ago

@kmike How do you feel about depending on scrapy? I know we have tried to avoid that in the past. Should we keep it as an optional dependency? (and copy the load_object code)
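
To make the "copy the load_object code" option concrete, here is a rough sketch of a standalone helper that resolves a dotted path to an object using only the standard library. It is a simplified stand-in written for illustration, not Scrapy's actual `load_object` implementation, and its error handling is an assumption.

```python
from importlib import import_module


def load_object(path):
    """Return the object named by a dotted path such as 'package.module.Name'.

    Simplified illustration; not a copy of Scrapy's implementation.
    """
    if not isinstance(path, str):
        # Already an object (for example a class): return it unchanged.
        if callable(path):
            return path
        raise TypeError(f"Unexpected argument type: {type(path)!r}")

    if "." not in path:
        raise ValueError(f"Error loading object {path!r}: not a full path")

    # Split 'package.module.Name' into the module path and the attribute name.
    module_name, _, name = path.rpartition(".")
    module = import_module(module_name)
    try:
        return getattr(module, name)
    except AttributeError:
        raise NameError(
            f"Module {module_name!r} doesn't define any object named {name!r}"
        )
```

A helper like this would let the package accept a setting such as a dotted class path without importing Scrapy at module import time.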

codecov-commenter commented 6 months ago

Codecov Report

Merging #91 (e3e90b7) into main (8c37eed) will not change coverage. Report is 31 commits behind head on main. The diff coverage is 0.00%.

:exclamation: Current head e3e90b7 differs from pull request most recent head cee7c86. Consider uploading reports for the commit cee7c86 to get more accurate results


Additional details and impacted files

```diff
@@          Coverage Diff           @@
##            main     #91    +/-   ##
======================================
  Coverage   0.00%   0.00%
======================================
  Files         11      54    +43
  Lines       1795    2107   +312
======================================
- Misses      1795    2107   +312
```

| [Files](https://app.codecov.io/gh/zytedata/zyte-common-items/pull/91?dropdown=coverage&src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None) | Coverage Δ | |
|---|---|---|
| [zyte\_common\_items/pipelines.py](https://app.codecov.io/gh/zytedata/zyte-common-items/pull/91?src=pr&el=tree&filepath=zyte_common_items%2Fpipelines.py&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=None#diff-enl0ZV9jb21tb25faXRlbXMvcGlwZWxpbmVzLnB5) | `0.00% <0.00%> (ø)` | |

PyExplorer commented 6 months ago

Thanks a mil, @kmike, for the final detailed reviews and valuable suggestions. I agree that a mock-based approach is probably not the best choice for projects like this, and I will definitely include rewriting the tests for this pipeline in my plans.