Open lgray opened 2 weeks ago
`pa.parquet.read_table` is a high-level convenience method that reads everything from the file. We use `pa.parquet.ParquetFile` to optionally select columns and row groups, but after that, we just feed the result to `ak.from_arrow`. That's the only difference between the two procedures.
But is the time difference really in `pa.parquet.read_table` → `ak.from_arrow` versus `ak.from_parquet` itself? These Jupyter cells also include copies to and from the GPU, JIT compilation of the `**2` function, and other things that might not be the same. In fact, if these two cells are from the same process and were executed in the order shown above, then `**2` gets compiled in the first one and not the second, which could easily account for a few seconds (especially if it's the first thing to be compiled, since it has to warm up the compilation machinery).
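One way to see how much one-time JIT warm-up can skew a cell-level timing is to time the same call twice: the first measurement includes compilation, the second does not. This sketch simulates the compile cost with a `sleep` so it runs anywhere (it is an illustration of the pitfall, not Awkward's actual compiler):

```python
import time

def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0

# Stand-in for a JIT-compiled kernel like `x**2`: the first call pays a
# one-time "compilation" cost (simulated here), later calls do not.
_cache = {}
def kernel(x):
    if "f" not in _cache:
        time.sleep(0.05)          # simulated compilation on first call
        _cache["f"] = lambda v: v ** 2
    return _cache["f"](x)

_, cold = timed(kernel, 3.0)      # includes the one-time compile cost
out, warm = timed(kernel, 3.0)    # steady-state cost only

assert out == 9.0
assert cold > warm                # the cold call is dominated by warm-up
```

The same discipline applies to a real benchmark: run the kernel once before starting the clock, so compilation time is excluded from the comparison.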
The device is not occupied, but to sate your skepticism:
The `**2` was compiled much earlier, FWIW.
Okay. If the speed difference persists after replacing `pa.parquet.read_table` with `pa.parquet.ParquetFile.read_row_groups`, then there is something in the Awkward code that's impeding performance, because the Awkward code is supposed to be just `pa.parquet.ParquetFile.read_row_groups` followed by `ak.from_arrow`.
A sizeable fraction of the time, but not all of it.
Version of Awkward Array
2.6.5
Description and code to reproduce
While benchmarking GPU resources, I ran into a curious performance difference when comparing CPU-based reads with Arrow against GPU-DMA reads via cudf.
Is this expected? A factor of two, coming only from reading (all other parts of the code are the same), seems like performance left on the floor.