Open lgray opened 2 weeks ago
`pa.parquet.read_table` is a high-level convenience method that reads everything from the file. We use `pa.parquet.ParquetFile` to optionally select columns and row groups, but after that, we just feed the result to `ak.from_arrow`. That's the only difference between the two procedures.
But is the time difference really in `pa.parquet.read_table` → `ak.from_arrow` versus `ak.from_parquet` itself? These Jupyter cells also include copies to and from the GPU, JIT compilation of the `**2` function, and other things that might not be the same. In fact, if these two cells are from the same process and were executed in the order shown above, then `**2` gets compiled in the first one and not the second, which could easily account for a few seconds (especially if it's the first thing to be compiled, since it has to warm up the compilation machinery).
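One way to see how much one-time JIT warm-up can skew a cell-level timing is to time the same call twice: the first measurement includes compilation, the second does not. This sketch simulates the compile cost with a `sleep` so it runs anywhere (it is an illustration of the pitfall, not Awkward's actual compiler):

```python
import time

def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0

# Stand-in for a JIT-compiled kernel like `x**2`: the first call pays a
# one-time "compilation" cost (simulated here), later calls do not.
_cache = {}
def kernel(x):
    if "f" not in _cache:
        time.sleep(0.05)          # simulated compilation on first call
        _cache["f"] = lambda v: v ** 2
    return _cache["f"](x)

_, cold = timed(kernel, 3.0)      # includes the one-time compile cost
out, warm = timed(kernel, 3.0)    # steady-state cost only

assert out == 9.0
assert cold > warm                # the cold call is dominated by warm-up
```

The same discipline applies to a real benchmark: run the kernel once before starting the clock, so compilation time is excluded from the comparison.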
The device is not occupied, but to sate your skepticism:
The `**2` was compiled much earlier, FWIW.
Okay. If the speed difference persists after replacing `pa.parquet.read_table` with `pa.parquet.ParquetFile.read_row_groups`, then there is something in the Awkward code that's impeding performance, because the Awkward code is supposed to be just `pa.parquet.ParquetFile.read_row_groups` followed by `ak.from_arrow`.
A sizeable fraction of the time, but not all of it.
Version of Awkward Array
2.6.5
Description and code to reproduce
While benchmarking GPU resources, I ran into a curious performance difference when comparing CPU-based reads with Arrow against GPU-DMA reads via cudf.
Is this expected? A factor of two, coming only from reading (all other parts of the code are the same), seems like performance left on the floor.