Closed: tyoras closed this 1 year ago
Thank you for reporting an issue. We can add this particular use case to tests if it doesn't exist yet.
However, what troubles me is IncrementalParquetWriter... Which version of Parquet4s are you using? We have not had anything like IncrementalParquetWriter here, at least for a couple of years now.
We are on version 2.13.0, and indeed IncrementalParquetWriter is not part of Parquet4s but a utility that we implemented. It is only used to create the parquet file that needs to be read in the example. Sorry for the confusion.
I've simplified the example and removed our custom IncrementalParquetWriter:
import com.github.mjakubowski84.parquet4s.*
import org.apache.parquet.hadoop.ParquetFileWriter

case class Foo(id: Int, bar: Bar)
case class Bar(a: String, b: String, c: String)
case class Baz(a: String, c: String)

val foos = List(
  Foo(1, Bar("a1", "b1", "c1")),
  Foo(2, Bar("a2", "b2", "c2"))
)

val path = Path("test.parquet")

val options = ParquetWriter.Options(
  writeMode = ParquetFileWriter.Mode.OVERWRITE
)

def readBaz(path: Path): Iterable[Baz] =
  ParquetReader
    .projectedGeneric(
      Col("id").as[Int].alias("id"),
      Col("bar.a").as[String].alias("a"),
      Col("bar.c").as[String].alias("c")
    )
    .read(path)
    .map(_.as[Baz](ValueCodecConfiguration.Default))

@main
def main(): Unit =
  ParquetWriter.of[Foo].options(options).writeAndClose(path, foos)
  readBaz(path).foreach(println)
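For reference, the result this reproducer is presumably meant to produce can be written against the in-memory data alone, with no Parquet involved. This is only a sketch of the expected projection (keep bar.a and bar.c, drop everything else); the object name ExpectedProjection is made up for illustration.

```scala
// Plain-Scala sketch of the intended projection; no Parquet4s calls are used here.
object ExpectedProjection {
  case class Foo(id: Int, bar: Bar)
  case class Bar(a: String, b: String, c: String)
  case class Baz(a: String, c: String)

  val foos: List[Foo] = List(
    Foo(1, Bar("a1", "b1", "c1")),
    Foo(2, Bar("a2", "b2", "c2"))
  )

  // Same selection as the projected read: take a and c from the bar group.
  val expected: List[Baz] = foos.map(foo => Baz(foo.bar.a, foo.bar.c))
}
```

A working projected read should yield the same two Baz values as ExpectedProjection.expected.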
Thanks! All is clear now. You've spotted a nasty bug.
The culprit is a mutation in the end() function of RootRowParquetRecordConverter.
It should be quite easy to fix it. However, I won't be able to have a look at it before the end of this week.
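To illustrate the general class of bug (not the actual Parquet4s code, which is not shown in this thread), here is a minimal, hypothetical sketch of how mutating the source record while resolving projected columns can lose the second column taken from the same group. All names below (MutationSketch, projectBuggy, projectFixed) are assumptions made up for the example.

```scala
// Hypothetical illustration only; none of these names come from Parquet4s.
object MutationSketch {
  type Record = Map[String, Any]

  // Resolves a dotted path such as "bar.a" inside a nested record.
  def lookup(record: Record, path: List[String]): Option[Any] = path match {
    case Nil        => None
    case key :: Nil => record.get(key)
    case key :: rest =>
      record.get(key) match {
        case Some(nested: Map[_, _]) => lookup(nested.asInstanceOf[Record], rest)
        case _                       => None
      }
  }

  // Buggy variant: the source record is mutated between columns, so the whole
  // group disappears after its first field has been read.
  def projectBuggy(record: Record, columns: List[(String, String)]): Record = {
    var remaining = record
    var out: Record = Map.empty
    columns.foreach { case (path, alias) =>
      val parts = path.split('.').toList
      lookup(remaining, parts).foreach(value => out = out.updated(alias, value))
      remaining = remaining - parts.head // the mutation that breaks later lookups
    }
    out
  }

  // Fixed variant: every column is resolved against the untouched record.
  def projectFixed(record: Record, columns: List[(String, String)]): Record =
    columns.flatMap { case (path, alias) =>
      lookup(record, path.split('.').toList).map(value => alias -> value)
    }.toMap

  val record: Record = Map("id" -> 1, "bar" -> Map("a" -> "a1", "b" -> "b1", "c" -> "c1"))
  val columns = List("id" -> "id", "bar.a" -> "a", "bar.c" -> "c")
}
```

With the sample record, projectBuggy returns only id and a, while projectFixed also returns c. Whether the real converter fails in this exact way is an assumption; the sketch only shows why state mutated during projection is dangerous when two columns share a group.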
Thanks a lot for your quick response 👍
Hello, reading a parquet file with a group is failing when we try to select more than one column from the same group using ParquetReader.projectedGeneric. In this example, we try to only read the id, bar.a and bar.c fields:
And it fails to read with this error: