Closed by egorsmth 6 months ago
I am not sure why iterator + slice gets slower over time, especially since I have not seen the rest of your code. Maybe you are loading the whole file into memory.
The second option can be quite slow in general, because you are opening the file each time.
To avoid memory issues and keep performance high, I recommend using one of the reactive solutions that Parquet4S supports: Akka, Pekko, or FS2.
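To illustrate the difference, here is a minimal sketch in plain Scala (standard library only, no Parquet4S calls). Everything in it is hypothetical: `CountingSource` is an in-memory stand-in for the file that counts how many records are produced, and `pageBySlice` / `pagesInOnePass` are made-up helper names. It assumes a fresh iterator is created for every page, which may or may not match your code. Under that assumption, `slice` has to step over all records before the offset, so later pages cost more, while a single streaming pass touches each record once.

```scala
object PagingSketch {
  // Hypothetical stand-in for a file of records: each produced element
  // increments `reads`, so we can see how much each strategy really reads.
  final class CountingSource(total: Int) {
    var reads: Long = 0L
    def iterator: Iterator[Int] = Iterator.range(0, total).map { i =>
      reads += 1
      i
    }
  }

  // Strategy A: new iterator per page, then slice(from, until).
  // Iterator.slice must advance past the first `offset` elements.
  def pageBySlice(src: CountingSource, offset: Int, limit: Int): Vector[Int] =
    src.iterator.slice(offset, offset + limit).toVector

  // Strategy B: one pass over the data, emitting pages of `limit` records.
  def pagesInOnePass(src: CountingSource, limit: Int): Iterator[Seq[Int]] =
    src.iterator.grouped(limit)

  def main(args: Array[String]): Unit = {
    val total = 1000
    val limit = 100

    val a = new CountingSource(total)
    (0 until total by limit).foreach(offset => pageBySlice(a, offset, limit))

    val b = new CountingSource(total)
    pagesInOnePass(b, limit).foreach(_ => ()) // consume every page

    println(s"slice per page: ${a.reads} reads; single pass: ${b.reads} reads")
  }
}
```

A stream from Akka, Pekko, or FS2 plays the role of strategy B here: the file is opened once and pages are taken from the stream as it flows, instead of restarting from the beginning for each page.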
Yep, I guess my problem is that the whole file gets loaded each time. I will try FS2, thanks.
I need to read files in a paginated way. I tried 2 options:
1) parquetReader.iterator.slice(limit, limit + offset)
2) RecordFilter(index => index >= offset && index < offset + limit)
The first option is pretty fast at the beginning of the file and slows down as we move toward the end of the file. Overall it is rather slow in my case. The second option reads each "page" in a consistent time, but each read is rather slow compared with the first option's reads at the beginning of the file.
What is the right way to read big files?