mjakubowski84 / parquet4s

Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
https://mjakubowski84.github.io/parquet4s/
MIT License
283 stars 65 forks

Incorrect value after reading parquet #338

Closed. JstFlip closed this issue 9 months ago.

JstFlip commented 9 months ago

I have a Parquet file with a decimal-type column that looks like this:

root
 |-- decimalValue: decimal(12,6) (nullable = true)

+------------+
|decimalValue|
+------------+
|   71.940000|
|   50.000000|
|        null|
+------------+

When I read the file using ParquetReader with the case class below, I get typed output, but the value no longer looks like a decimal, even though it still has the BigDecimal type.

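import com.github.mjakubowski84.parquet4s.{ParquetReader, Path}

// The field is optional because the column is nullable
case class Row(decimalValue: Option[BigDecimal])
val parquetIterable = ParquetReader.as[Row].read(Path("src/main/resources/data.parquet"))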

This returns the following output. The values are missing the "." and look like plain Long values:

Some(71940000)
Some(50000000)
None

Am I just missing something or is this a bug? Thanks

mjakubowski84 commented 9 months ago

Hi @JstFlip! Could you plz attach such a sample file to make the testing easier?

JstFlip commented 9 months ago

Yes of course. There you go (I was unable to upload parquet directly, had to zip it). example.zip

mjakubowski84 commented 9 months ago

Thanks. Yes, it looks like a bug. I am going to have a look at it.

mjakubowski84 commented 9 months ago

#339 should solve your problem. Until now, Parquet4s supported only decimal values encoded as Binary, not as Long. This fix adds support for Long. However, please keep in mind that a decimal read from a file won't have exactly the same scale and precision as the one that was written. It will be the default DECIMAL(38,18). It will still be an equal value, just with a different representation.

Making it exact would require a bigger change in the library which would go beyond the scope of a bugfix release.
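
To illustrate the point about representation, here is a minimal sketch that uses only the Scala standard library and says nothing about Parquet4s internals. It assumes that a Long-encoded decimal stores the unscaled integer and that the scale comes from the column's schema (6 for the decimal(12,6) column above). The names `DecimalSketch`, `fromUnscaled` and `rescale` are hypothetical.

```scala
object DecimalSketch {
  // Hypothetical helper: combine an unscaled Long with the scale declared in the schema.
  def fromUnscaled(unscaled: Long, scale: Int): BigDecimal =
    BigDecimal(BigInt(unscaled), scale)

  // Hypothetical helper: change only the number of fractional digits.
  def rescale(value: BigDecimal, scale: Int): BigDecimal =
    value.setScale(scale, BigDecimal.RoundingMode.HALF_UP)

  def main(args: Array[String]): Unit = {
    val original = fromUnscaled(71940000L, 6) // raw value from the example file
    val widened  = rescale(original, 18)      // scale of the default DECIMAL(38,18)
    println(original)
    println(widened)
    println(original == widened) // Scala's BigDecimal compares by numeric value
  }
}
```

If the original scale matters downstream, the same `setScale` call can presumably be applied to the values after reading to bring them back to 6 fractional digits.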

mjakubowski84 commented 9 months ago

Fix is released in 2.15.1
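
For sbt users, picking up the fix would look roughly like the line below. This is a sketch that assumes the `parquet4s-core` module is the one in use; other modules of the project would need the same version bump.

```scala
// build.sbt: assumes the core module; adjust the artifact name if you use another one
libraryDependencies += "com.github.mjakubowski84" %% "parquet4s-core" % "2.15.1"
```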

JstFlip commented 9 months ago

Thank you @mjakubowski84

I also have some other (newbie) questions, and I don't want to spam this thread or open another issue, since they aren't really issues. Is there another way to contact you?

mjakubowski84 commented 9 months ago

@JstFlip Before asking any questions, please check:

It is quite probable that someone has already answered the question you want to ask. Otherwise, you may also start a discussion in the GitHub project (that might be the first one :) ). I prefer to keep communication public because the outcome might be useful for others.