onthegomap / planetiler

Flexible tool to build planet-scale vector tilesets from OpenStreetMap data fast
Apache License 2.0

[BUG] Parsing Exception for FIXED_LEN_BYTE_ARRAY Data in Parquet File #1107

Open CrazyBug-11 opened 4 days ago

CrazyBug-11 commented 4 days ago

In my dataset, there is a shape_area field defined as follows: [image]

During parsing, I found that the data values become excessively large. For example, the original value 173.24927660400 is parsed as 1.73249276604E24.

After investigating the code, I found an issue in the ParquetPrimitiveConverter class on line 102, where the scale is negated:

int scale = -decimal.getScale();

When I changed this line to int scale = decimal.getScale(); the parsed data values were correct.

Is there a specific reason for negating the scale (-decimal.getScale())? Does it serve a special purpose, or is it a mistake?
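The effect of the sign can be reproduced with plain java.math.BigDecimal, independent of Planetiler. This is only a sketch: the unscaled value 17324927660400 and the scale of 11 are assumptions inferred from the 11 fractional digits of the reported value, since the actual field definition is in the screenshot. The class name DecimalScaleDemo is hypothetical.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class DecimalScaleDemo {
  // BigDecimal defines its value as unscaled * 10^(-scale),
  // so a positive scale moves the decimal point to the left.
  static double toDouble(BigInteger unscaled, int scale) {
    return new BigDecimal(unscaled, scale).doubleValue();
  }

  public static void main(String[] args) {
    // Assumed stored digits for the reported value 173.24927660400
    BigInteger unscaled = new BigInteger("17324927660400");
    // Assumed scale: 11 digits after the decimal point
    int scale = 11;

    // Scale used as-is: close to the original value
    System.out.println(toDouble(unscaled, scale));
    // Scale negated: the value is multiplied by 10^11 instead of divided
    System.out.println(toDouble(unscaled, -scale));
  }
}
```

Under these assumptions, the negated scale gives a result on the order of 1e24, which is the magnitude reported above, while the positive scale recovers a value near 173.249.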

msbarry commented 18 hours ago
int scale = -decimal.getScale();

Comes from this section of the spec:

https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal

It might be reversed, though. It would be good to confirm how another tool interprets that field to be sure.
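For reference, the linked spec section defines a DECIMAL value as unscaledValue * 10^(-scale), and for FIXED_LEN_BYTE_ARRAY it stores the unscaled value as a big-endian two's complement integer. A minimal sketch of that rule in plain Java follows. It is not Planetiler's code: the class FixedLenDecimalDecoder, the 8-byte width, and the scale of 11 are all assumptions made for illustration.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

public class FixedLenDecimalDecoder {
  // Decodes big-endian two's complement bytes as unscaled * 10^(-scale).
  // new BigInteger(byte[]) already expects big-endian two's complement input.
  static BigDecimal decode(byte[] bytes, int scale) {
    return new BigDecimal(new BigInteger(bytes), scale);
  }

  public static void main(String[] args) {
    // Assumed 8-byte fixed-length encoding; ByteBuffer is big-endian by default
    byte[] raw = ByteBuffer.allocate(8).putLong(17324927660400L).array();
    System.out.println(decode(raw, 11).toPlainString());

    // Negative values work the same way because of sign extension
    byte[] negative = ByteBuffer.allocate(8).putLong(-12345L).array();
    System.out.println(decode(negative, 2).toPlainString());
  }
}
```

Comparing this output with what another Parquet reader returns for the same column would be one way to check which sign convention the file actually uses.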