ucd-cws / ca-naip

Indexes and Additional Information for California's National Agriculture Imagery Program (NAIP)

Seeding a Tile Cache #12

Open wildintellect opened 9 years ago

wildintellect commented 9 years ago

I just ran some calculations to help decide which zooms to preseed vs. generating tiles on the fly (frequently accessed tiles will be cached for later use). This is just for the BBOX of California, at ~16 KB per 256x256 tile (PNG or JPG):

| Zoom | NumTile | Size (~GB) |
|------|---------|------------|
| 9 | 200 | |
| 10 | 1000 | |
| 11 | 4000 | 0 |
| 12 | 16000 | 2 |
| 13 | 64000 | 10 |
| 14 | 256000 | 40 |
| 15 | 1024000 | 163 |
| 16 | 4096000 | 655 |
| 17 | 16384000 | 2621 |
| 18 | 65536000 | 10485 |
| 19 | 262144000 | 41943 |
| 20 | 1048576000 | 167772 |
| 21 | 4194304000 | 671088 |
| 22 | 16777216000 | 2684354 |
| Total | | 375000000 GB |

Zoom level 17 already gets into the 2.6 TB range. At this time there are enough SSDs for zoom 15. Based on this, pre-seeding the overview levels and then caching everything else when first requested will probably be necessary. This table from OSM is highly informative: http://wiki.openstreetmap.org/wiki/Tile_disk_usage
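
For reference, a minimal sketch of how counts like these can be reproduced, assuming the standard web-mercator XYZ tiling scheme (a later comment mentions EPSG:3310 as the preferred projection, which would give a different grid) and approximate bbox values:

```python
import math

# Approximate California bounding box in WGS84 (illustrative values)
MIN_LON, MIN_LAT, MAX_LON, MAX_LAT = -124.5, 32.5, -114.1, 42.0
TILE_BYTES = 16 * 1024  # the ~16 KB per-tile figure above; this just scales the GB column

def lonlat_to_tile(lon, lat, zoom):
    """WGS84 coordinate -> XYZ tile indices in the standard web-mercator scheme."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.log(math.tan(lat_rad) + 1.0 / math.cos(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

for zoom in range(9, 23):
    x0, y_max = lonlat_to_tile(MIN_LON, MIN_LAT, zoom)  # west edge, south edge
    x1, y_min = lonlat_to_tile(MAX_LON, MAX_LAT, zoom)  # east edge, north edge
    ntiles = (x1 - x0 + 1) * (y_max - y_min + 1)
    print(f"zoom {zoom:2d}: {ntiles:>14,} tiles, ~{ntiles * TILE_BYTES / 1e9:,.1f} GB")
```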

ghost commented 9 years ago

@qjhart @wildintellect By the way, I assume we are planning to base the tiles on the DOQQ TIFFs, not the CCMs (county compressed mosaics) in MrSID.

wildintellect commented 9 years ago

Yes, GeoTIFFs are easier to work with when using Linux servers + MapServer/GeoServer. It's also my understanding that higher compression often means longer read access times: http://linfiniti.com/2011/05/gdal-efficiency-of-various-compression-algorithms/
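
A rough way to check that trade-off on our own DOQQ files, in the spirit of the linked benchmark. A sketch only: the file names are placeholders, and it assumes the GDAL Python bindings (osgeo) and numpy are installed.

```python
import time
from osgeo import gdal

def full_read_seconds(path):
    """Time a full read of band 1, which forces decompression."""
    ds = gdal.Open(path)
    t0 = time.time()
    ds.GetRasterBand(1).ReadAsArray()
    return time.time() - t0

# Hypothetical copies of one DOQQ written with different compression
for path in ("doqq_none.tif", "doqq_lzw.tif", "doqq_deflate.tif"):
    print(path, f"{full_read_seconds(path):.2f}s")
```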

qjhart commented 9 years ago

@wildintellect, Gail and I discussed some strategies for building this cache yesterday. Gail said your first choice would be EPSG:3310 as the projection. I shared with Gail what CERES used previously as a tile cache setup for California: https://docs.google.com/document/d/1rBBbDtUWSh0TGqdMSYsgNiawDC5atN82AMqJ3YQ30Nw/edit?usp=sharing
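
If EPSG:3310 (California Albers) is the choice, the cache grid extent would be defined in that CRS. A hypothetical sketch using pyproj (not mentioned in this thread, just one way to do it), reusing the approximate bbox from above:

```python
from pyproj import Transformer

# WGS84 lon/lat -> EPSG:3310 meters
to_albers = Transformer.from_crs("EPSG:4326", "EPSG:3310", always_xy=True)
west, south = to_albers.transform(-124.5, 32.5)
east, north = to_albers.transform(-114.1, 42.0)
# A real grid would pad this: straight bbox edges curve under reprojection
print(f"grid extent (m): {west:.0f} {south:.0f} {east:.0f} {north:.0f}")
```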

qjhart commented 9 years ago

@wildintellect, I'm not sure about your numbers, but regardless I like your notion of not pre-seeding everything (depending on how fast we can build a tile, I suppose). I had a strange thought about that: could we use a FUSE filesystem that builds missing files on the fly? Since GDAL is Python-savvy, it seems like this could be pretty straightforward. http://www.stavros.io/posts/python-fuse-filesystem/
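
A minimal sketch of that idea using fusepy (the library from the linked post). `render_tile()` is a hypothetical stand-in for a GDAL-backed renderer; readdir, locking, and error handling are omitted.

```python
import errno
import os
from fuse import FUSE, FuseOSError, Operations

CACHE = "/var/cache/tiles"  # hypothetical on-disk cache root

def render_tile(path):
    """Placeholder: render the /z/x/y.png tile from source imagery via GDAL."""
    raise NotImplementedError

class TileFS(Operations):
    def _realpath(self, path):
        real = CACHE + path
        # Build the tile on first touch; afterwards it is served from disk
        if path.endswith(".png") and not os.path.exists(real):
            render_tile(path)
        return real

    def getattr(self, path, fh=None):
        real = self._realpath(path)
        if not os.path.exists(real):
            raise FuseOSError(errno.ENOENT)
        st = os.lstat(real)
        keys = ("st_mode", "st_size", "st_atime", "st_mtime", "st_ctime")
        return {k: getattr(st, k) for k in keys}

    def read(self, path, size, offset, fh):
        with open(self._realpath(path), "rb") as f:
            f.seek(offset)
            return f.read(size)

if __name__ == "__main__":
    FUSE(TileFS(), "/mnt/tiles", foreground=True)
```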

qjhart commented 9 years ago

I've been thinking about this. One difference between OSM and an image server is that usually the larger (lower-zoom) tiles are made from the smaller ones. As a result, by level 5 you have a good average of the values from the level 0 tiles. If we make tiles on the fly, then level 5 will at best (without complication) be a cubic convolution of the level 0 files, which will be different; whether it's noticeably bad, I'm not sure. Worth testing.

wildintellect commented 9 years ago

What you describe is how tile servers already work: any tiles that don't already exist are created and cached on first request. The debate was about how much to preseed to improve the user experience.
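
For illustration, pre-seeding can be as simple as walking the tile pyramid over HTTP, since requesting a tile through the server is enough to render and cache it. The URL pattern and host below are placeholders; MapCache also ships a dedicated `mapcache_seed` utility, which would be the usual tool in production.

```python
import urllib.request

URL = "http://tiles.example.org/tms/1.0.0/naip/{z}/{x}/{y}.png"  # placeholder

def warm(z, xs, ys):
    """Fetch every tile in the given x/y ranges so the server caches them."""
    for x in xs:
        for y in ys:
            urllib.request.urlopen(URL.format(z=z, x=x, y=y)).read()

# e.g. warm(12, range(x0, x1 + 1), range(y0, y1 + 1)), with the tile ranges
# computed from the bbox as in the earlier sketch
```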

As for your other comment, I think you are describing metatiles, which is also something tile servers already do: http://geowebcache.org/docs/current/concepts/metatiles.html If you are describing resampling of the imagery to produce aggregate zooms (lower zoom levels), we can control which method is used. This is just about optimizing performance with the hardware involved.
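
For example, one place the resampling method is controllable is when building overviews on the source GeoTIFFs with GDAL. A sketch (file name hypothetical); "AVERAGE" vs. "CUBIC" is exactly the trade-off raised above:

```python
from osgeo import gdal

# Open for update so internal overviews can be written
ds = gdal.Open("doqq.tif", gdal.GA_Update)
ds.BuildOverviews("AVERAGE", [2, 4, 8, 16, 32])  # or "CUBIC", "NEAREST", ...
ds = None  # close and flush to disk
```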

qjhart commented 9 years ago

@wildintellect / @gwatprg From the notes above it sounds like a decision has been made on which tile server will be used for the NAIP imagery. Is that the case?

wildintellect commented 9 years ago

Yes and no. I know what we're going to try first: MapServer + MapCache.

  1. The config files are plain text, so we can store them in git.
  2. It provides the ability to do WMS, WMTS, and TMS.
  3. Load balancing is easily achieved with Apache, Nginx, or HAProxy, so it can scale to as many nodes as we want. I already have an example of this setup on the data2 server; it handles the WMTS and WMS for http://data.biogeo.ucdavis.edu/dhs/map.html The only performance change I want to make is to move from CGI to FastCGI and add a load balancer with another node.

TileStache was a close second, but it can't do WMS or WMTS, only TMS; that's a shame, because WSGI servers scale really well. GeoServer can't cluster without an enterprise license or serious hacking on Java, XML, and databases. ArcGIS Server could access the SSD NFS and the main HDFS storage if we can run it on Ubuntu or another Linux (that does not work on Windows), but it would have the same scaling issue as GeoServer: the configs are buried in GUIs, with no easy way to scale.

The bigger question is when (not if) to invest in some large SSD drives for the caches. I have 500 GB available now for testing. Long term, we have empty slots for more drives and should probably discuss getting 4x1 TB SSDs for the initial push of NAIP and a few other datasets.