scrapy / itemadapter

Common interface for data container classes
BSD 3-Clause "New" or "Revised" License
61 stars 13 forks source link

Improve performance #62

Closed elacuesta closed 2 years ago

elacuesta commented 2 years ago

Try to keep improving performance after #60.

It's my understanding that any is short-circuit, however there seems to be some penalty. Using a modified version of this benchmarking script:

from timeit import timeit
from itemadapter import is_item
from scrapy.item import Item

def test_dict():
    return is_item({})

def test_item():
    return is_item(Item())

print("test_dict:", timeit(test_dict, number=10**6))
print("test_item:", timeit(test_item, number=10**6))

1203b5e54ad75c50279f8bbee05837a6f0c1d63c (master branch)

test_dict: 2.421997083
test_item: 3.901377961

873301421945cc80ee6d5fd95969203b5b2cae2e (this branch)

test_dict: 1.6679764910000001
test_item: 3.0513542609999997

The above numbers are from a 2020 Macbook Pro, Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz

codecov[bot] commented 2 years ago

Codecov Report

Merging #62 (8733014) into master (1203b5e) will not change coverage. The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master       #62   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            4         4           
  Lines          285       291    +6     
=========================================
+ Hits           285       291    +6     
Impacted Files Coverage Δ
itemadapter/adapter.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 1203b5e...8733014. Read the comment docs.

kmike commented 2 years ago

@elacuesta what motivates this change? Was it a real-world performance issue which is solved in this PR? I'm not sure I understand the implications of this PR - would it make itemadapter faster in a real world? The benchmark you posted is for Item vs dict creation, not for itemadapter before & after the change, probably I'm missing something :)

elacuesta commented 2 years ago

@Gallaecio reported a performance decrease in #59 (based on some research he did with itemloaders), I tried to improve that with #60, but there are still a few things to try in order to keep improving. The benchmark tests both dict and Item creation, I executed it before and after the changes of this PR, showing a ~30% time decrease for dict and a ~20% for Item.

elacuesta commented 2 years ago

Ping @kmike 😄