What happened + What you expected to happen
Size estimation is used to decide task size and when to output a block to the object store. The estimate appears to be off from the actual size by as much as 2x.
Versions / Dependencies
3.0dev
Reproduction script
The reproduction script runs a .range and then the same Dataset with an identity map function applied. In theory, both should produce the same number of output blocks, but the latter currently produces twice as many blocks because of a bug in size estimation.
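
To illustrate why a 2x error in the size estimate would double the block count, here is a toy model in plain Python. It is not the reproduction script and does not call Ray. The function name, the `estimate_factor` parameter, and the sizes are hypothetical, and the model simply assumes a block is emitted each time the estimated size reaches a target block size.

```python
import math


def num_output_blocks(actual_bytes: int, target_block_bytes: int,
                      estimate_factor: float = 1.0) -> int:
    """Toy model (assumption): blocks are cut on *estimated* size, not actual size.

    estimate_factor is the ratio of estimated size to actual size;
    1.0 means the estimate is exact, 2.0 means it is 2x too large.
    """
    estimated_bytes = actual_bytes * estimate_factor
    return max(1, math.ceil(estimated_bytes / target_block_bytes))


MiB = 1024 * 1024
actual = 1024 * MiB   # hypothetical dataset size
target = 128 * MiB    # hypothetical target block size

accurate = num_output_blocks(actual, target)
overestimated = num_output_blocks(actual, target, estimate_factor=2.0)
print(accurate, overestimated)
```

Under these assumptions, the over-estimated case yields twice as many blocks for the same data, which matches the symptom described above.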
Issue Severity
None