Open tomohiro-n opened 4 months ago
More simply, changing to const bytes = await randomBytes(2_000_000) is enough for the test to fail.
Hi! I'm not too familiar with your code base, but I took a look at this and found a couple of things:
TLDR:
Problem: The actual code and the test code create differently sized data streams, which then become CARs with different IDs.
Fix (PR 2532): Use the UnixFS module's createFileEncoderStream in the test helper, so that the actual code and the test code create same-sized data streams, resulting in the same CAR IDs on large files.
Details:
The actual code calls CarWriter.create while the test helper calls toCAR. For small byte sizes, like 128, the expected CID and the actual CID are the same, as desired. Moreover, the number of bytes in the expected CAR instance and in the actual CAR instance is the same.
If we use more bytes, the expected CAR instance (2000098 bytes) is smaller than the actual one (2000283 bytes).
The test helper's toCAR method does not chunk the bytes, while the UnixFS module underlying the actual code, with a max chunk size of 1024 * 1024, splits 2_000_000 bytes into three chunks. So, in the real code, additional data is added to each chunk.
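To make the chunk arithmetic concrete, here is a minimal sketch of fixed-size chunking. The names splitIntoChunks and MAX_CHUNK_SIZE are hypothetical and only model the 1024 * 1024 limit mentioned above; this is not the UnixFS module's own code.

```typescript
// Hypothetical model of fixed-size chunking; not the UnixFS module's actual code.
const MAX_CHUNK_SIZE = 1024 * 1024 // max chunk size quoted in the comment above

/** Split bytes into consecutive slices of at most `size` bytes. */
function splitIntoChunks(
  bytes: Uint8Array,
  size: number = MAX_CHUNK_SIZE
): Uint8Array[] {
  const chunks: Uint8Array[] = []
  for (let offset = 0; offset < bytes.length; offset += size) {
    // subarray clamps the end index, so the last slice may be shorter
    chunks.push(bytes.subarray(offset, offset + size))
  }
  return chunks
}

const smallChunks = splitIntoChunks(new Uint8Array(128))
const largeChunks = splitIntoChunks(new Uint8Array(2_000_000))
console.log(smallChunks.length, largeChunks.length)
```

In this sketch, 128 bytes fit in a single slice, while 2_000_000 bytes need two data slices (1_048_576 and 951_424 bytes). The count of three reported above may additionally include a root node that links the data chunks; that is an assumption, and the sketch does not model it.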
By using the UnixFS module's createFileEncoderStream method to make a chunked stream before making a CAR object, the same headers get added to each chunk and the same CAR gets created (see PR 2532). The tests then pass at both 128 bytes and 2_000_000 bytes.
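The effect on IDs can be illustrated with a deliberately simplified model. The sketch below is not the real UnixFS or CAR encoding: it uses plain SHA-256 hex digests as stand-in IDs, and the names idWithoutChunking, idWithChunking and CHUNK_SIZE are hypothetical. It only shows why an unchunked helper and a chunked encoder agree on small inputs and diverge once the input exceeds one chunk.

```typescript
import { createHash } from "node:crypto"

// Simplified stand-in for content addressing; NOT the real UnixFS/CAR encoding.
const CHUNK_SIZE = 1024 * 1024

const sha256 = (data: Uint8Array): string =>
  createHash("sha256").update(data).digest("hex")

/** Unchunked model: the ID is the hash of the whole payload. */
function idWithoutChunking(bytes: Uint8Array): string {
  return sha256(bytes)
}

/** Chunked model: hash each chunk, then hash the list of chunk hashes. */
function idWithChunking(bytes: Uint8Array, size: number = CHUNK_SIZE): string {
  // A payload that fits in one chunk needs no parent node.
  if (bytes.length <= size) return sha256(bytes)
  const leafIds: string[] = []
  for (let offset = 0; offset < bytes.length; offset += size) {
    leafIds.push(sha256(bytes.subarray(offset, offset + size)))
  }
  // The parent ID depends on the leaf IDs, not directly on the raw bytes.
  return sha256(new TextEncoder().encode(leafIds.join(",")))
}

const small = new Uint8Array(128)
const large = new Uint8Array(2_000_000)
console.log(idWithoutChunking(small) === idWithChunking(small))
console.log(idWithoutChunking(large) === idWithChunking(large))
```

Under this model the two functions agree at 128 bytes and disagree at 2_000_000 bytes, which mirrors the failing test. Calling the same chunked function on both the expected and the actual side gives matching IDs at either size, which is the idea behind the fix described above.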
Hopefully, this is helpful!
I've noticed that the CID we pre-calculate for a file and the one after it's uploaded to your service can be different. Then I was able to reproduce the exact same mismatch (the expected value was what we pre-calculated, the actual value was the one after upload) with one of your test cases. Most likely, it depends on the file size. As far as we've checked, the mismatch is produced when the size is > 1.9 MB or so.
The "uploads a file to the service" test in the upload-client package fails when changed as follows. I've confirmed that the test passes with a 400 KB file.