scrapy / itemadapter

Common interface for data container classes
BSD 3-Clause "New" or "Revised" License
61 stars 13 forks source link

Avoid imports inside functions #60

Closed elacuesta closed 2 years ago

elacuesta commented 2 years ago

Related to #59 (not sure if it fixes it completely)

Tasks:

codecov[bot] commented 2 years ago

Codecov Report

Merging #60 (fdf1339) into master (817054d) will increase coverage by 0.37%. The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           master       #60      +/-   ##
===========================================
+ Coverage   99.62%   100.00%   +0.37%     
===========================================
  Files           3         4       +1     
  Lines         266       285      +19     
===========================================
+ Hits          265       285      +20     
+ Misses          1         0       -1     
Impacted Files Coverage Δ
itemadapter/_imports.py 100.00% <100.00%> (ø)
itemadapter/adapter.py 100.00% <100.00%> (ø)
itemadapter/utils.py 100.00% <100.00%> (+1.53%) :arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 817054d...fdf1339. Read the comment docs.

Gallaecio commented 2 years ago

(did not check the performance change, though)

Gallaecio commented 2 years ago

Performance-wise, it is 16.5 times slower than it used to be in 3bc56f2f5cd43a90bbf4f65faee951b2f8b58382 (the commit I happen to have in the main branch of my fork), but 29 times faster than it currently is.

The speed up is amazing, although I wonder whether or not it warrants closing #59, given the performance difference with 3bc56f2f5cd43a90bbf4f65faee951b2f8b58382 (which is probably explained by support for new item types, I did not check what that commit supported and did not support).

from timeit import timeit

from itemadapter import is_item

def a():
    return is_item({})

print(timeit(a, number=1000000))

3bc56f2f5cd43a90bbf4f65faee951b2f8b58382: 0.08102401599990117 f9852195203dbea2262a2486ecf5724520b05ee9: 1.3361113370001476 817054d6f80a4704c84630a97eec337c8dd9cd66: 38.81687346000035

elacuesta commented 2 years ago

That's awesome, thanks!

I just pushed eee23cf1c586f7ded27b49a538ad7cb63223f8a3, which improves performance greatly by removing unnecessary duplicated calls to get Scrapy item classes, that was also being done on each item check.

Two additional things that are improving performance for me:

elacuesta commented 2 years ago

Re fdf1339, seems to me like the extra access to the __class__ attribute slows things down a little