Closed vbanos closed 5 years ago
I like the simplicity of the change, but this function is a very cheap calculation; on my laptop it takes on the order of 1 microsecond. The cache-management overhead incurred by `lru_cache` could easily outweigh the improvement.
```python
>>> timeit.timeit('''hash.hexdigest().encode('ascii')''', number=1000000, globals=globals())
0.7279171659999975
```
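The tradeoff being described can be measured directly. Below is a minimal sketch (not the actual `warcprox.digest_str` implementation, whose signature may differ) comparing a plain digest call against an `lru_cache`-wrapped version; it illustrates how the cache's hashing and bookkeeping can rival the cost of a microsecond-scale computation:

```python
import hashlib
import timeit
from functools import lru_cache

def digest_str(payload: bytes) -> bytes:
    # Direct computation: a stand-in for the cheap function under discussion.
    return hashlib.sha1(payload).hexdigest().encode('ascii')

@lru_cache(maxsize=256)
def cached_digest_str(payload: bytes) -> bytes:
    # Same computation, but memoized; lru_cache must hash the argument
    # and manage its internal dict on every call.
    return hashlib.sha1(payload).hexdigest().encode('ascii')

payload = b'example payload'
direct = timeit.timeit(lambda: digest_str(payload), number=100_000)
cached = timeit.timeit(lambda: cached_digest_str(payload), number=100_000)
print(f'direct: {direct:.4f}s  cached: {cached:.4f}s')
```

Whether the cached version wins depends on hit rate and payload size; for tiny inputs the two times are typically of the same order, which is the maintainer's point.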
I didn't time that function. I just presumed caching would benefit us. Thank you for taking the time to check this.
We use `warcprox.digest_str` in 2 places during the course of a single HTTP request: 1) in all `warcprox.dedup` methods to compute the key and look up duplicates, 2) in `warcprox.warc` to produce the WARC record. We use `lru_cache` to avoid recalculating it. We also reuse the cached result if the same URL is requested again.