Closed elacuesta closed 2 years ago
Merging #60 (fdf1339) into master (817054d) will increase coverage by 0.37%. The diff coverage is 100.00%.
@@ Coverage Diff @@
## master #60 +/- ##
===========================================
+ Coverage 99.62% 100.00% +0.37%
===========================================
Files 3 4 +1
Lines 266 285 +19
===========================================
+ Hits 265 285 +20
+ Misses 1 0 -1
Impacted Files | Coverage Δ | |
---|---|---|
itemadapter/_imports.py | 100.00% <100.00%> (ø) | |
itemadapter/adapter.py | 100.00% <100.00%> (ø) | |
itemadapter/utils.py | 100.00% <100.00%> (+1.53%) | :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
(did not check the performance change, though)
Performance-wise, this branch is 16.5 times slower than 3bc56f2f5cd43a90bbf4f65faee951b2f8b58382 (the commit I happen to have in the main branch of my fork), but 29 times faster than the current master.
The speedup is amazing, although I wonder whether it warrants closing #59, given the performance difference with 3bc56f2f5cd43a90bbf4f65faee951b2f8b58382 (which is probably explained by support for new item types; I did not check what that commit did and did not support).
from timeit import timeit

from itemadapter import is_item

def a():
    return is_item({})

print(timeit(a, number=1000000))
3bc56f2f5cd43a90bbf4f65faee951b2f8b58382: 0.08102401599990117
f9852195203dbea2262a2486ecf5724520b05ee9: 1.3361113370001476
817054d6f80a4704c84630a97eec337c8dd9cd66: 38.81687346000035
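For reference, the "16.5 times slower" and "29 times faster" figures can be reproduced from the three timings above. This is only a small arithmetic sketch; the shortened hash keys and the variable names are mine, and I am assuming f9852195 is the state of this PR that was measured.

```python
# Timings (seconds for 1,000,000 calls) copied from the benchmark results above.
timings = {
    "3bc56f2f": 0.08102401599990117,  # older commit from the fork
    "f9852195": 1.3361113370001476,   # assumed: this PR when measured
    "817054d6": 38.81687346000035,    # master at the time
}

# How much slower the PR is than the older commit.
slowdown_vs_old = timings["f9852195"] / timings["3bc56f2f"]
# How much faster the PR is than master.
speedup_vs_master = timings["817054d6"] / timings["f9852195"]

print(f"{slowdown_vs_old:.1f}x slower than 3bc56f2f")
print(f"{speedup_vs_master:.1f}x faster than 817054d6")
```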
That's awesome, thanks!
I just pushed eee23cf1c586f7ded27b49a538ad7cb63223f8a3, which improves performance greatly by removing unnecessary duplicated calls to get the Scrapy item classes, which were also being made on each item check.
Two additional things that are improving performance for me:
- `ItemAdapter.is_item` instead of `itemadapter.is_item`, as the former actually imports the latter upon being called
- [...] `dict` first, so in this particular case it works faster if you put the dict adapter first: `ItemAdapter.ADAPTER_CLASSES = deque([DictAdapter, ScrapyItemAdapter])`. I probably changed that to optimize for Scrapy items over dicts, thinking that they were used more often; we could of course revisit that decision.

Re fdf1339, seems to me like the extra access to the `__class__` attribute slows things down a little
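To illustrate why the adapter order matters for a plain `dict`, here is a minimal sketch that does not use itemadapter at all. `FakeScrapyAdapter`, `FakeDictAdapter`, `is_item` and `count_checks` are hypothetical stand-ins, and the sketch assumes the check walks the adapter classes in order and stops at the first match.

```python
from collections import deque


class FakeScrapyAdapter:
    """Hypothetical stand-in for an adapter that never matches a dict."""
    calls = 0

    @classmethod
    def is_item(cls, obj):
        cls.calls += 1
        return False


class FakeDictAdapter:
    """Hypothetical stand-in for an adapter that matches dicts."""
    calls = 0

    @classmethod
    def is_item(cls, obj):
        cls.calls += 1
        return isinstance(obj, dict)


def is_item(obj, adapter_classes):
    # Assumption: classes are tried in order; any() stops at the first match.
    return any(adapter.is_item(obj) for adapter in adapter_classes)


def count_checks(order):
    """Return how many adapter checks are needed to recognize an empty dict."""
    for adapter in order:
        adapter.calls = 0
    is_item({}, deque(order))
    return sum(adapter.calls for adapter in order)


scrapy_first = count_checks([FakeScrapyAdapter, FakeDictAdapter])
dict_first = count_checks([FakeDictAdapter, FakeScrapyAdapter])
print(scrapy_first, dict_first)
```

With the dict adapter first, a dict is recognized after a single check; the trade-off is that Scrapy items would then pay for the extra dict check instead.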
Related to #59 (not sure if it fixes it completely)
Tasks:
- [...] `sys.modules` if they were imported already)
- [...] `itemadapter.adapter`
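The `sys.modules` idea from the task list could look roughly like the sketch below. This is not the code from this PR; `get_loaded_module` and `is_instance_of_loaded_class` are hypothetical helper names. The reasoning is that if a module was never imported, no object can be an instance of one of its classes, so the import can be skipped entirely.

```python
import sys


def get_loaded_module(name):
    """Return the module if it is already in sys.modules, else None.

    No import is triggered by this lookup.
    """
    return sys.modules.get(name)


def is_instance_of_loaded_class(obj, module_name, class_name):
    """Check obj against a class without importing the class's module."""
    module = get_loaded_module(module_name)
    if module is None:
        # Module never imported, so obj cannot be an instance of its classes.
        return False
    cls = getattr(module, class_name, None)
    return isinstance(cls, type) and isinstance(obj, cls)
```

For example, after `import collections`, `is_instance_of_loaded_class(collections.OrderedDict(), "collections", "OrderedDict")` returns `True`, while a module name that was never imported always yields `False`.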