pangeo-data / cog-best-practices

Best practices with cloud-optimized-geotiffs (COGs)

benchmarking read and write performance #10

Open scottyhq opened 3 years ago

scottyhq commented 3 years ago

@rmg55 put together a nice notebook comparing S3 and HTTP access (https://github.com/rmg55/CloudDAAC_Binders). It would be great to include more benchmarking information and suggestions in this repository.

I also wanted to link to https://github.com/pangeo-data/pangeo-integration-tests/issues/1

vincentsarago commented 3 years ago

@scottyhq I haven't run your notebook, but a couple of weeks ago we had a discussion in the cogeotiff Slack about the difference between S3 and HTTPS URLs. We found that using an S3 URL saves one request (GetFileSize):

$ tilebench profile s3://rio-tiler-dev/data/eu_webAligned_256pxWEBP.tif --tile 5-10-9 | jq
{
  "LIST": {
    "count": 0
  },
  "HEAD": {
    "count": 0
  },
  "GET": {
    "count": 2,
    "bytes": 32768,
    "ranges": [
      "0-16383",
      "229376-245759"
    ]
  },
  "Timing": 0.4968528747558594
}
$ tilebench profile https://rio-tiler-dev.s3.amazonaws.com/data/eu_webAligned_256pxWEBP.tif --tile 5-10-9 | jq
{
  "LIST": {
    "count": 0
  },
  "HEAD": {
    "count": 1
  },
  "GET": {
    "count": 2,
    "bytes": 32768,
    "ranges": [
      "0-16383",
      "229376-245759"
    ]
  },
  "Timing": 0.4858889579772949
}
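
To make the difference between the two profiles explicit, here is a minimal Python sketch that loads both JSON summaries (copied from the output above) and compares request counts per method. The helper names `request_count`, `range_bytes`, and `diff_counts` are hypothetical and are not part of tilebench.

```python
import json

# Request summaries copied verbatim from the two tilebench profiles above.
S3_PROFILE = json.loads("""
{"LIST": {"count": 0}, "HEAD": {"count": 0},
 "GET": {"count": 2, "bytes": 32768, "ranges": ["0-16383", "229376-245759"]},
 "Timing": 0.4968528747558594}
""")
HTTPS_PROFILE = json.loads("""
{"LIST": {"count": 0}, "HEAD": {"count": 1},
 "GET": {"count": 2, "bytes": 32768, "ranges": ["0-16383", "229376-245759"]},
 "Timing": 0.4858889579772949}
""")

METHODS = ("LIST", "HEAD", "GET")


def request_count(profile):
    """Total number of requests across LIST, HEAD and GET."""
    return sum(profile[method]["count"] for method in METHODS)


def range_bytes(ranges):
    """Bytes covered by inclusive 'start-end' range strings."""
    total = 0
    for item in ranges:
        start, end = (int(part) for part in item.split("-"))
        total += end - start + 1
    return total


def diff_counts(a, b):
    """Per-method difference in request counts (b minus a)."""
    return {method: b[method]["count"] - a[method]["count"] for method in METHODS}


if __name__ == "__main__":
    print("s3 requests:   ", request_count(S3_PROFILE))
    print("https requests:", request_count(HTTPS_PROFILE))
    print("extra over https:", diff_counts(S3_PROFILE, HTTPS_PROFILE))
```

Both runs fetch the same two 16 KiB byte ranges; the only difference in the request counts is the single HEAD request on the HTTPS URL. The two timings come from one run each, so they say little about relative speed.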

Ahhhhh, so GetFileSize is used on HTTP sources; if you use S3, GDAL gets the file size from the response to the first GET request.
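
For context on how a file size can be recovered from a ranged GET at all: a `206 Partial Content` response carries a `Content-Range` header of the form `bytes first-last/total`. The sketch below only illustrates that HTTP mechanism in Python. It is not GDAL's code, and whether GDAL reads exactly this header is an assumption. The function name `parse_content_range` and the example total of 1234567 bytes are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ContentRange(NamedTuple):
    first: int            # first byte position returned (inclusive)
    last: int             # last byte position returned (inclusive)
    total: Optional[int]  # full object size, or None when the server sends "*"


_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value: str) -> ContentRange:
    """Parse a Content-Range header value such as 'bytes 0-16383/1234567'."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range value: {value!r}")
    first, last, total = match.groups()
    return ContentRange(int(first), int(last), None if total == "*" else int(total))


if __name__ == "__main__":
    # Hypothetical header for the first GET (bytes 0-16383) of a 1234567-byte file.
    parsed = parse_content_range("bytes 0-16383/1234567")
    print(parsed.total)
```

A client that reads the total from the first ranged response has no need for a separate size lookup, which would be consistent with the missing HEAD request in the S3 profile.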

ref: https://cogeotiff.slack.com/archives/C01DE57GLHE/p1613141057016300