vEpiphyte closed this issue 3 years ago
I see the problem. We decompress server-descriptors-2020-08.tar.xz when it is first downloaded, then compare the decompressed tarball's checksum with the index's. Note the two hash values in the exception...
OSError: /tmp/stem_bg60/server-descriptors-2020-08.tar already exists but mismatches CollecTor's checksum (
expected: 5f5c62fa5691d520017ef107c1d6ea4f29af2e5aabf959373da31755c30d21d8,
actual: 352b10fae3e221fb3287d8e1dfd754eb43f3058d94ee8940d090f34971b01f70
)
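A minimal sketch of the flow described above (hypothetical names; the real logic lives in stem's CollecTor module): the archive is decompressed before being cached, but the index digest was computed over the compressed bytes, so the comparison can never succeed.

```python
import hashlib
import lzma

def cache_and_verify(xz_bytes: bytes, index_sha256_hex: str) -> bytes:
    # Hypothetical sketch of the buggy flow: the download is decompressed
    # before being written to the cache...
    cached = lzma.decompress(xz_bytes)

    # ...but the index checksum was computed over the compressed .tar.xz,
    # so hashing the cached .tar always mismatches.
    actual = hashlib.sha256(cached).hexdigest()

    if actual != index_sha256_hex:
        raise OSError(
            "cached file mismatches CollecTor's checksum (expected: %s, actual: %s)"
            % (index_sha256_hex, actual)
        )

    return cached
```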
... and how they compare with the following...
{
"path": "server-descriptors-2020-08.tar.xz",
"size": 228750972,
"last_modified": "2020-09-07 11:59",
"types": ["server-descriptor 1.0"],
"first_published": "2020-08-01 00:00",
"last_published": "2020-08-31 23:59",
"sha256": "X1xi+laR1SABfvEHwdbqTymvLlqr+Vk3PaMXVcMNIdg="
}
>>> import base64, binascii, hashlib
>>> index_checksum = 'X1xi+laR1SABfvEHwdbqTymvLlqr+Vk3PaMXVcMNIdg='
>>> binascii.hexlify(base64.b64decode(index_checksum)).decode('utf-8')
'5f5c62fa5691d520017ef107c1d6ea4f29af2e5aabf959373da31755c30d21d8'
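The same comparison can also be done in the other direction with a small helper (hypothetical name): encode our computed digest as base64 rather than decoding the index's value to hex.

```python
import base64
import hashlib

def matches_index_checksum(data: bytes, index_sha256_b64: str) -> bool:
    # CollecTor's index stores sha256 digests base64-encoded, so encode
    # our computed digest before comparing (hypothetical helper).
    digest_b64 = base64.b64encode(hashlib.sha256(data).digest()).decode('ascii')
    return digest_b64 == index_sha256_b64
```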
>>> with open('/home/atagar/Desktop/server-descriptors-2020-08.tar.xz', 'rb') as collector_file:
... hashlib.sha256(collector_file.read()).hexdigest()
...
'5f5c62fa5691d520017ef107c1d6ea4f29af2e5aabf959373da31755c30d21d8'
>>> with open('/home/atagar/Desktop/server-descriptors-2020-08.tar', 'rb') as collector_file:
... hashlib.sha256(collector_file.read()).hexdigest()
...
'352b10fae3e221fb3287d8e1dfd754eb43f3058d94ee8940d090f34971b01f70'
We can fix this in a couple ways...
1. Cache the compressed file. This retains our integrity check and reduces disk usage, but greatly increases the time it takes to read cached files.
2. Simply skip the integrity check if the cached file has been decompressed.
I'm leaning toward the latter because a sluggish cache is rather unhelpful.
Thanks for catching this! Would you care to fix this or shall I?
Unfortunately, I'm not familiar enough with the internals of STEM to fix this in a timely fashion. If you've got the time to fix it, that would be great! I don't have a strong preference about the two options presented. Everyone likes a fast cache though, which is what makes them useful :)
In the end I decided to opt for the former (cache compressed files). Fix pushed...
This is an independent repro of a problem related to #60 that was not tested previously.
Using the current master, when caching descriptor files, Stem fails to validate the cached files against CollecTor's checksum:
Here is my repro code:
This fails as soon as I try to use the cached data with the following errors:
Running it again fails right away:
And environment information (using Ubuntu 18.04):