googleapis / java-bigquery

Apache License 2.0

fix(test): Update schema for broken ConnImplBenchmark test #3574

Open o-shevchenko opened 2 weeks ago

o-shevchenko commented 2 weeks ago

I'm trying to use the executeSelect API and am seeing extremely slow reads. I tried to run ConnImplBenchmark, but noticed that the schema of the table below had changed and the test no longer worked.

bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2017


Summary of Changes
Added Fields: airport_fee, data_file_year, data_file_month.
Removed Fields: dropoff_longitude, dropoff_latitude, pickup_longitude, pickup_latitude.
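A minimal sketch of how such a summary can be derived from two lists of column names. The class name, the method names, and the `unchanged_field` entry are hypothetical and used only for illustration; the real table has many more columns than are listed here, and the sketch does not call the BigQuery client.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SchemaDiffSketch {

    // Fields present in the new schema but not in the old one.
    static Set<String> added(Collection<String> oldFields, Collection<String> newFields) {
        Set<String> result = new TreeSet<>(newFields);
        result.removeAll(oldFields);
        return result;
    }

    // Fields present in the old schema but not in the new one.
    static Set<String> removed(Collection<String> oldFields, Collection<String> newFields) {
        Set<String> result = new TreeSet<>(oldFields);
        result.removeAll(newFields);
        return result;
    }

    public static void main(String[] args) {
        // "unchanged_field" is a placeholder standing in for every column
        // that exists in both versions of the schema.
        List<String> oldFields = List.of(
                "unchanged_field",
                "pickup_longitude", "pickup_latitude",
                "dropoff_longitude", "dropoff_latitude");
        List<String> newFields = List.of(
                "unchanged_field",
                "airport_fee", "data_file_year", "data_file_month");

        System.out.println("Added Fields:   " + added(oldFields, newFields));
        System.out.println("Removed Fields: " + removed(oldFields, newFields));
    }
}
```

Sorted sets keep the output stable between runs, which makes the diff easier to paste into a PR description.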

After fixing the test, I can confirm that the benchmark shows speeds similar to what we see in our own use cases. Reading 100_000 rows takes ~15-20 seconds, which is extremely slow.

 Running
ROW 100000 Time: 14978 ms
ROW 200000 Time: 16409 ms
ROW 300000 Time: 16966 ms
ROW 400000 Time: 15963 ms
ROW 500000 Time: 17480 ms

I'm not sure whether there has been a recent performance degradation, since I can't find any expected numbers. The benchmark in this post is hard to read: https://cloud.google.com/blog/topics/developers-practitioners/introducing-executeselect-client-library-method-and-how-use-it/ According to the chart there, reading 1_000_000 rows should take ~1 sec.

That's what I've got on my machine:

Benchmark                                            (rowLimit)  Mode  Cnt       Score       Error  Units
ConnImplBenchmark.iterateRecordsUsingReadAPI             500000  avgt    3   76549.893 ± 14496.839  ms/op
ConnImplBenchmark.iterateRecordsUsingReadAPI            1000000  avgt    3  154957.127 ± 25916.110  ms/op
ConnImplBenchmark.iterateRecordsWithBigQuery_Query       500000  avgt    3   82508.807 ± 17930.275  ms/op
ConnImplBenchmark.iterateRecordsWithBigQuery_Query      1000000  avgt    3  165717.219 ± 86960.648  ms/op
ConnImplBenchmark.iterateRecordsWithoutUsingReadAPI      500000  avgt    3   84504.175 ± 36823.590  ms/op
ConnImplBenchmark.iterateRecordsWithoutUsingReadAPI     1000000  avgt    3  165142.367 ± 99899.991  ms/op
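To make these scores easier to compare with the blog post, they can be converted into rows per second. The sketch below assumes that one JMH operation iterates exactly `rowLimit` rows and takes the ~1 sec per 1_000_000 rows figure as read from the blog's chart; the class and method names are hypothetical.

```java
public class ThroughputSketch {

    // Converts a JMH avgt score (ms per operation) into rows per second,
    // assuming a single operation iterates exactly rowLimit rows.
    static double rowsPerSecond(long rowLimit, double msPerOp) {
        if (msPerOp <= 0) {
            throw new IllegalArgumentException("msPerOp must be positive");
        }
        return rowLimit / (msPerOp / 1000.0);
    }

    // How many times slower the measured time is than the expected time.
    static double slowdownFactor(double measuredMs, double expectedMs) {
        if (expectedMs <= 0) {
            throw new IllegalArgumentException("expectedMs must be positive");
        }
        return measuredMs / expectedMs;
    }

    public static void main(String[] args) {
        // iterateRecordsUsingReadAPI scores copied from the table above.
        long[] rowLimits = {500_000L, 1_000_000L};
        double[] readApiMs = {76549.893, 154957.127};

        for (int i = 0; i < rowLimits.length; i++) {
            System.out.printf("ReadAPI rowLimit=%d: %.0f rows/s%n",
                    rowLimits[i], rowsPerSecond(rowLimits[i], readApiMs[i]));
        }

        // Assumed expectation: ~1000 ms for 1_000_000 rows, per the blog chart.
        System.out.printf("Slowdown vs. assumed ~1 s: %.0fx%n",
                slowdownFactor(readApiMs[1], 1000.0));
    }
}
```

Under these assumptions, the Read API path works out to roughly 6,500 rows per second at both row limits, which is in line with the ~15-20 seconds per 100_000 rows reported earlier and about 150 times slower than the ~1 sec figure.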

I've opened an issue: https://github.com/googleapis/java-bigquerystorage/issues/2764

google-cla[bot] commented 2 weeks ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

o-shevchenko commented 1 week ago

@alvarowolfx Could you please help with the review and performance evaluation? Thanks!

o-shevchenko commented 3 days ago

@alvarowolfx, did you have a chance to look into it?

alvarowolfx commented 3 days ago

@PhongChuong can you take a look at this one?