scrapy / itemadapter

Common interface for data container classes

BSD 3-Clause "New" or "Revised" License

61 stars 13 forks source link

Avoid imports inside functions #60

Closed elacuesta closed 2 years ago

elacuesta commented 2 years ago

Related to #59 (not sure if it fixes it completely)

Tasks:

[x] Restore test coverage (the solution I found is a bit hacky: blocking some imports, also deleting them from sys.modules if they were imported already)
[x] Remove additional non top-level imports in itemadapter.adapter

codecov[bot] commented 2 years ago

Codecov Report

Merging #60 (fdf1339) into master (817054d) will increase coverage by 0.37%. The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           master       #60      +/-   ##
===========================================
+ Coverage   99.62%   100.00%   +0.37%     
===========================================
  Files           3         4       +1     
  Lines         266       285      +19     
===========================================
+ Hits          265       285      +20     
+ Misses          1         0       -1

Impacted Files	Coverage Δ
itemadapter/_imports.py	`100.00% <100.00%> (ø)`
itemadapter/adapter.py	`100.00% <100.00%> (ø)`
itemadapter/utils.py	`100.00% <100.00%> (+1.53%)`	:arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 817054d...fdf1339. Read the comment docs.

Gallaecio commented 2 years ago

(did not check the performance change, though)

Gallaecio commented 2 years ago

Performance-wise, it is 16.5 times slower than it used to be in 3bc56f2f5cd43a90bbf4f65faee951b2f8b58382 (the commit I happen to have in the main branch of my fork), but 29 times faster than it currently is.

The speed up is amazing, although I wonder whether or not it warrants closing #59, given the performance difference with 3bc56f2f5cd43a90bbf4f65faee951b2f8b58382 (which is probably explained by support for new item types, I did not check what that commit supported and did not support).

from timeit import timeit

from itemadapter import is_item

def a():
    return is_item({})

print(timeit(a, number=1000000))

3bc56f2f5cd43a90bbf4f65faee951b2f8b58382: 0.08102401599990117 f9852195203dbea2262a2486ecf5724520b05ee9: 1.3361113370001476 817054d6f80a4704c84630a97eec337c8dd9cd66: 38.81687346000035

elacuesta commented 2 years ago

That's awesome, thanks!

I just pushed eee23cf1c586f7ded27b49a538ad7cb63223f8a3, which improves performance greatly by removing unnecessary duplicated calls to get Scrapy item classes, that was also being done on each item check.

Two additional things that are improving performance for me:

Use ItemAdapter.is_item instead of itemadapter.is_item as the former actually imports the latter upon being called
One thing I found when comparing master with the commit you shared is that we used to check for dict first, so in this particular case it works faster if you put the dict adapter first: ItemAdapter.ADAPTER_CLASSES = deque([DictAdapter, ScrapyItemAdapter]). I probably changed that to optimize for Scrapy items over dicts, thinking that they were used more often; we could of course revisit that decision.

elacuesta commented 2 years ago

Re fdf1339, seems to me like the extra access to the __class__ attribute slows things down a little