pangeo-data / storage-benchmarks

testing performance of different storage layers
Apache License 2.0

IORaw updates & test config yaml #18

Closed jreadey closed 6 years ago

jreadey commented 6 years ago

I've updated the IO_raw tests for the target_hdf5 and target_hsds backends. This PR also adds a test.conf.yaml to store configuration related to the actual test cases.

The test.conf.yaml can be used to turn individual tests on or off. Here's an example run with hs_username and hs_password set in the yaml:

[  0.00%] ·· Building for conda-py3.6-dask-distributed-gcsfs-h5netcdf-h5py-netcdf4-numpy-pip+h5pyd-pyyaml-rasterio-xarray-zarr.
[  0.00%] ·· Benchmarking conda-py3.6-dask-distributed-gcsfs-h5netcdf-h5py-netcdf4-numpy-pip+h5pyd-pyyaml-rasterio-xarray-zarr
[  7.14%] ··· Running IO_dask.ComputeSum_h5netcdf_POSIX_local.time_sum                                                                                              85.0±0.7ms;...
[ 14.29%] ··· Running IO_raw.IORead_h5netcdf_HSDS.time_fancycalculation                                                                                                      102ns
[ 21.43%] ··· Running IO_raw.IORead_h5netcdf_HSDS.time_readtest                                                                                                              643ms
[ 28.57%] ··· Running IO_raw.IORead_h5netcdf_POSIX_local.time_fancycalculation                                                                                             104±1ns
[ 35.71%] ··· Running IO_raw.IORead_h5netcdf_POSIX_local.time_readtest                                                                                                  45.6±0.7ms
[ 42.86%] ··· Running IO_raw.IOWrite_h5netcdf_HSDS.time_writetest                                                                                                            591ms
[ 50.00%] ··· Running IO_raw.IOWrite_h5netcdf_POSIX_local.time_writetest                                                                                                  87.3±2ms
[ 57.14%] ··· Running IO_xarray.ComputeZarrGCS.time_computemean                                                                                                             failed
[ 64.29%] ··· Running IO_xarray.ComputeZarrPOSIXLocal.time_computemean                                                                                                      6.50ms
[ 71.43%] ··· Running IO_xarray.IOReadZarrGCS.time_SyntheticRead                                                                                                            failed
[ 78.57%] ··· Running IO_xarray.IOReadZarrGCS_FUSE.time_SyntheticRead                                                                                                        207ms
[ 85.71%] ··· Running IO_xarray.IOReadZarrPOSIXLocal.time_SyntheticRead                                                                                                      196ms
[ 92.86%] ··· Running IO_xarray.IOWriteZarrGCS.time_SyntheticWrite                                                                                                          failed
[100.00%] ··· Running IO_xarray.IOWriteZarrPOSIXLocal.time_SyntheticWrite

And a run with the default null for username and password:

[  0.00%] ·· Building for conda-py3.6-dask-distributed-gcsfs-h5netcdf-h5py-netcdf4-numpy-pip+h5pyd-pyyaml-rasterio-xarray-zarr.
[  0.00%] ·· Benchmarking conda-py3.6-dask-distributed-gcsfs-h5netcdf-h5py-netcdf4-numpy-pip+h5pyd-pyyaml-rasterio-xarray-zarr
[  7.14%] ··· Running IO_dask.ComputeSum_h5netcdf_POSIX_local.time_sum                                                                                              87.7±0.5ms;...
[ 14.29%] ··· Running IO_raw.IORead_h5netcdf_HSDS.time_fancycalculation                                                                                                        n/a
[ 21.43%] ··· Running IO_raw.IORead_h5netcdf_HSDS.time_readtest                                                                                                                n/a
[ 28.57%] ··· Running IO_raw.IORead_h5netcdf_POSIX_local.time_fancycalculation                                                                                            96.3±1ns
[ 35.71%] ··· Running IO_raw.IORead_h5netcdf_POSIX_local.time_readtest                                                                                                  45.7±0.9ms
[ 42.86%] ··· Running IO_raw.IOWrite_h5netcdf_HSDS.time_writetest                                                                                                              n/a
[ 50.00%] ··· Running IO_raw.IOWrite_h5netcdf_POSIX_local.time_writetest                                                                                                86.3±0.9ms
[ 57.14%] ··· Running IO_xarray.ComputeZarrGCS.time_computemean                                                                                                             failed
[ 64.29%] ··· Running IO_xarray.ComputeZarrPOSIXLocal.time_computemean                                                                                                      6.49ms
[ 71.43%] ··· Running IO_xarray.IOReadZarrGCS.time_SyntheticRead                                                                                                            failed
[ 78.57%] ··· Running IO_xarray.IOReadZarrGCS_FUSE.time_SyntheticRead                                                                                                        212ms
[ 85.71%] ··· Running IO_xarray.IOReadZarrPOSIXLocal.time_SyntheticRead                                                                                                      205ms
[ 92.86%] ··· Running IO_xarray.IOWriteZarrGCS.time_SyntheticWrite                                                                                                          failed
[100.00%] ··· Running IO_xarray.IOWriteZarrPOSIXLocal.time_SyntheticWrite                                                                                                    192ms

Note that the hsds tests now show "n/a" for results.
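
For readers unfamiliar with how asv produces "n/a": it reports a benchmark as skipped, rather than failed, when its setup method raises NotImplementedError. The following is only a minimal sketch of that pattern, not the code in this PR. It assumes the yaml has already been parsed into a dict (YAML null becomes None); the class name, the CONFIG dict, and the placeholder data are hypothetical.

```python
# Hypothetical parsed contents of test.conf.yaml; YAML null becomes None.
CONFIG = {"hs_username": None, "hs_password": None}


class IORead_HSDS_Sketch:
    """Illustrative asv-style benchmark class, not the repository's own."""

    config = CONFIG

    def setup(self):
        # asv marks a benchmark as skipped ("n/a") when setup raises
        # NotImplementedError, so missing credentials disable the test.
        if not self.config.get("hs_username") or not self.config.get("hs_password"):
            raise NotImplementedError("HSDS credentials not configured")
        # Placeholder for opening the real HSDS target.
        self.data = list(range(1000))

    def time_readtest(self):
        # Placeholder workload standing in for the actual read.
        sum(self.data)
```

With both values left as null the benchmark is skipped; filling them in lets setup complete and the timing run.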

I'd suggest updating the zarr benchmarks similarly. Putting "GCS_Bucket" and similar settings in a config would make it easy to disable those tests when running in environments without access to GCS.
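
One possible shape for this, as a rough sketch only: a small helper that checks for required config keys and raises NotImplementedError (which asv reports as a skipped benchmark) when any are null. The helper name require_config, the class name, and the inline config dict are hypothetical; only the "GCS_Bucket" key comes from the suggestion above.

```python
def require_config(config, *keys):
    """Return the values for keys, or raise NotImplementedError if any is unset.

    asv treats NotImplementedError raised from setup as "skip this benchmark".
    """
    missing = [k for k in keys if config.get(k) is None]
    if missing:
        raise NotImplementedError("not configured: " + ", ".join(missing))
    return [config[k] for k in keys]


class IOReadZarrGCS_Sketch:
    # Hypothetical stand-in for the parsed test.conf.yaml;
    # a null (None) bucket disables the GCS benchmarks.
    config = {"GCS_Bucket": None}

    def setup(self):
        (self.bucket,) = require_config(self.config, "GCS_Bucket")

    def time_SyntheticRead(self):
        # The real benchmark would read the zarr store from self.bucket.
        pass
```

Keeping the check in one helper means each GCS-backed benchmark only needs a single line in setup to opt in.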