Closed: gfinol closed this issue 10 months ago.
Note: when reading data using obj.data_stream.read(), each function reads the correct number of bytes. The problem only occurs when reading chunks from the data_stream.
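For illustration, a minimal sketch of the two read patterns inside a map function (the function names and the 64 KiB buffer size are placeholders, not the original code):

```python
def read_whole(obj):
    # Single read(): returns exactly the bytes of the assigned partition
    return len(obj.data_stream.read())

def read_in_chunks(obj):
    # Chunked reads: the pattern where the extra bytes show up
    total = 0
    while True:
        data = obj.data_stream.read(64 * 1024)
        if not data:
            break
        total += len(data)
    return total
```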
I understand the bug appears when you use a custom chunk_size, right?
Yes, that's right. A custom chunk_size when reading from the obj.data_stream created by the data partitioner.
@gfinol I added a potential fix for this in #1215, can you try with the master branch?
@JosepSampe Looks like #1215 fixed this. I'll close the issue.
Thank you very much!
When using the data partitioner with the obj_chunk_number or obj_chunk_size parameters and reading data from the obj.data_stream, I found that the total data read is bigger than the size of the file. Here is a code snippet to reproduce the bug. I tested it with a file of 5,000,000 bytes, where each line of the file is 100 bytes long.
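A minimal sketch along these lines (the object URI, the 1 MiB obj_chunk_size value, the 64 KiB read size, and the count_bytes helper name are placeholders, not the exact original snippet):

```python
import lithops

def count_bytes(obj):
    # Read the assigned partition from obj.data_stream in 64 KiB chunks
    total = 0
    while True:
        data = obj.data_stream.read(64 * 1024)
        if not data:
            break
        total += len(data)
    return total

if __name__ == '__main__':
    fexec = lithops.FunctionExecutor()
    # 's3://my-bucket/lines.txt' stands in for the 5,000,000-byte test file
    fexec.map(count_bytes, 's3://my-bucket/lines.txt', obj_chunk_size=1024 * 1024)
    sizes = fexec.get_result()
    # Expected: sum(sizes) == 5,000,000; observed: the first N-1 workers
    # each report extra bytes, so the sum is larger than the file size
    print(sizes, sum(sizes))
```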
The bug is also present when using aws_lambda and aws_s3 as backends.

I've tried different values for both obj_chunk_number and obj_chunk_size. The current behavior is that the first N-1 mapper functions read an extra chunk_size bytes; the last function reads the correct number of bytes.

According to the Lithops documentation, each obj_chunk_size must be at least 1 MiB. This condition is fulfilled in this case. Is there any other requirement that I may be missing?