melowntech / vts-mapproxy

VTS Mapproxy
BSD 2-Clause "Simplified" License

Is it possible to pre-process to JPG format rather than TIFF? #16

Open a180285 opened 3 years ago

a180285 commented 3 years ago

Hi @vaclavblazek, I was following the tutorial on adding a TIFF image to VTS: https://vts-geospatial.org/tutorials/vtsbackend.html

But I found that the client actually gets JPEG images that are generated on the fly from the TIFF files.

I'm getting very poor performance from the on-the-fly conversion. Is it possible to pre-process the data to JPEG to avoid the on-the-fly conversion? Our goal is to improve client loading speed.

vaclavblazek commented 3 years ago

Well, tile imagery is always generated on the fly, regardless of the input format.

You may analyze your input TIFF dataset (using gdalinfo) to see whether it is striped or tiled. Tiled datasets are better for random access, so you may convert the input dataset (via gdal_translate) to a tiled TIFF (the default 256x256 tile size is fine).
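To make the gdalinfo check concrete, here is a small Python sketch (not part of mapproxy or VTS) that reads gdalinfo's text output and applies a simple heuristic: a striped TIFF reports blocks as wide as the whole raster (for example `Block=43200x1`), whereas a tiled one reports small blocks such as `Block=256x256`. It also builds a gdal_translate call using the GeoTIFF creation options `TILED=YES`, `BLOCKXSIZE` and `BLOCKYSIZE`. The function and file names are illustrative.

```python
import re
from typing import List, Tuple


def _first_int_pair(pattern: str, text: str) -> Tuple[int, int]:
    """Return the two integers captured by `pattern` in gdalinfo output."""
    m = re.search(pattern, text)
    if m is None:
        raise ValueError(f"pattern {pattern!r} not found in gdalinfo output")
    return int(m.group(1)), int(m.group(2))


def is_tiled(gdalinfo_text: str) -> bool:
    """Heuristic: a striped TIFF reports blocks as wide as the whole raster."""
    width, _height = _first_int_pair(r"Size is (\d+), (\d+)", gdalinfo_text)
    block_w, _block_h = _first_int_pair(r"Block=(\d+)x(\d+)", gdalinfo_text)
    # Strips span the full width (e.g. Block=43200x1); tiles are narrower (e.g. 256x256).
    # Rasters no wider than one tile cannot be told apart this way.
    return block_w < width


def tiling_command(src: str, dst: str, tile: int = 256) -> List[str]:
    """Build a gdal_translate call that rewrites `src` as a tiled GeoTIFF."""
    return [
        "gdal_translate",
        "-co", "TILED=YES",
        "-co", f"BLOCKXSIZE={tile}",
        "-co", f"BLOCKYSIZE={tile}",
        src, dst,
    ]
```

For example, `subprocess.run(tiling_command("input.tif", "input_tiled.tif"), check=True)` would rewrite the dataset, assuming GDAL's command-line tools are on the PATH.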

The only exception is when there is already existing imagery that exactly fits the reference frame's tiling scheme. In this case, one may use the tms-raster-remote driver and set it up with a URL template for the imagery data. This is also why our melown2015 reference frame uses Web Mercator in its main tiling tree: you can directly use external 2D mapping data (like Google Maps and other services). The tms-bing driver does the same thing but adds an authentication layer to obtain the proper imagery data. In both cases, the actual imagery data are fetched directly from an external website by the VTS browser.
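The tiling-scheme match matters because a URL template only substitutes tile indices; nothing is resampled. The sketch below assumes a generic `{z}/{x}/{y}` placeholder syntax filled via Python's `str.format`, which is an illustration only: the placeholders tms-raster-remote actually accepts may differ, so check its documentation. It also shows a common mismatch when reusing external Web Mercator tiles: XYZ services count rows from the top, TMS services from the bottom.

```python
def xyz_to_tms_row(y: int, zoom: int) -> int:
    """Flip a row index between XYZ (origin top-left) and TMS (origin bottom-left).

    The flip is its own inverse, so the same function converts in both directions.
    """
    return (2 ** zoom - 1) - y


def expand_template(template: str, zoom: int, x: int, y: int, tms: bool = False) -> str:
    """Fill a {z}/{x}/{y}-style template for one tile (placeholder names are illustrative)."""
    row = xyz_to_tms_row(y, zoom) if tms else y
    return template.format(z=zoom, x=x, y=row)
```

For instance, `expand_template("https://tiles.example.com/{z}/{x}/{y}.jpg", 3, 2, 1)` yields `https://tiles.example.com/3/2/1.jpg`; with `tms=True` the row becomes 6. The host name is a placeholder.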

a180285 commented 3 years ago

Our tiff is already tiled.

Sorry, I didn't fully understand the details.

Our problem is that if I move quickly to a point on Earth, it may take more than 5 seconds for the client to load the entire screen area. It seems the on-the-fly conversion (TIFF to JPG) is very slow, even though our CPU (AMD Ryzen 9 5950X 16-Core Processor) should be adequate. The server has no GPU.

So I don't know how to optimize the client loading speed. As I understand it, avoiding the on-the-fly TIFF-to-JPG conversion is one way. Is it possible to

Do you have any ideas for speeding it up?

vaclavblazek commented 3 years ago

Side note: mapproxy does not use the GPU, so there is no need to have one on the server.

Well, the imagery is not only re-encoded, it is also warped from the source dataset's SRS to the tile's SRS defined in the reference frame (pseudo/web Mercator for the majority of the world in the melown2015 reference frame). This is, in fact, not very fast.
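To give a feel for why warping costs more than re-encoding, the sketch below uses the standard spherical Web Mercator (EPSG:3857) formulas and walks every pixel centre of a 256x256 output tile back to longitude/latitude, which is where a warper has to sample the source raster. It assumes the source dataset is in geographic WGS84 coordinates and illustrates the geometry only; it is not mapproxy's or GDAL's warping code.

```python
import math

R = 6378137.0  # sphere radius used by EPSG:3857 (WGS84 semi-major axis), metres


def to_web_mercator(lon: float, lat: float):
    """Forward spherical Web Mercator: degrees -> metres."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def from_web_mercator(x: float, y: float):
    """Inverse spherical Web Mercator: metres -> degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2 * math.atan(math.exp(y / R)) - math.pi / 2)
    return lon, lat


def tile_source_coords(xmin: float, ymax: float, pixel_size: float, size: int = 256):
    """Map every pixel centre of one output tile back to lon/lat.

    A warper needs these coordinates (exact or approximated) for each output
    pixel before it can resample the source raster there.
    """
    coords = []
    for row in range(size):
        y = ymax - (row + 0.5) * pixel_size
        for col in range(size):
            x = xmin + (col + 0.5) * pixel_size
            coords.append(from_web_mercator(x, y))
    return coords
```

That is 65,536 inverse transforms for a single tile, before any resampling or JPEG encoding. Real warpers such as GDAL's can approximate the transform by interpolating between fewer exactly transformed points, but the per-tile work remains substantial.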

The reverse caching proxy (nginx in the default vts-backend setup) is there to cache tiles for future use.

By default, mapproxy launches one "GDAL" process per CPU (or per "thread" in the case of SMT), so there are probably 32 such processes on your 16-core machine. In that case, memory might be the issue. Each "GDAL" process keeps every dataset it has opened open indefinitely, until the process is killed, which may happen if the total real memory occupied by the "GDAL" processes exceeds the configured limit (the gdal.rssLimit option in the config, which defaults to a conservative 4096 MB). You may increase rssLimit if you have more memory. Check the log for occurrences of the "Killing large GDAL process" message.
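The arithmetic behind the rssLimit advice, as a small Python sketch: with one worker per logical CPU sharing a single limit, the default leaves each worker only a small average share. The helper names are hypothetical and nothing here reads mapproxy's configuration; `os.cpu_count()` merely stands in for however many GDAL processes mapproxy actually starts.

```python
import os


def default_worker_count() -> int:
    """Assumption: one GDAL worker per logical CPU, as described above."""
    return os.cpu_count() or 1


def per_process_budget_mb(rss_limit_mb: float, processes: int) -> float:
    """Average resident memory each worker may hold before the total reaches the limit."""
    return rss_limit_mb / processes


def limit_for_budget_mb(per_process_mb: float, processes: int) -> float:
    """Total limit needed so that every worker can hold `per_process_mb` at once."""
    return per_process_mb * processes
```

On the 5950X (32 logical CPUs), `per_process_budget_mb(4096, 32)` gives 128 MB per worker. If you assume each worker would typically hold about 512 MB of open datasets, `limit_for_budget_mb(512, 32)` suggests a limit of around 16384 MB.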

jrjdavidson commented 3 years ago

Hi guys, I'll jump in because I find this topic really interesting. I think that imagery loading speed is one barrier to this type of technology being used more.

I like the idea of pre-processing the data to fit a "URL template for imagery data" and then using the tms-raster-remote driver, as that means no need for on-the-fly processing. Plus, extra storage space in the cloud is a lot cheaper than gruntier servers, isn't it? Would it be possible to have a flag on mapproxy-setup-resource (e.g. --urlTemplate?) that automatically tiles imagery in a way that could be used with the raster-remote driver?

a180285 commented 3 years ago

Hi @vaclavblazek, after changing gdal.rssLimit, loading seems a little faster. Like @jrjdavidson said, I'd also like to trade storage for loading speed. Following the tutorial, we have already spent a lot of time preparing the data, so I think it's worth spending a little more time on pre-processing rather than doing the work on the fly (which slows down loading).

vaclavblazek commented 3 years ago

Well, mapproxy was made precisely to avoid static data generation. One thing you might want to do is preseed the web cache by requesting tiles.

We have never had, and do not have, any intention of adding static tile generation, because:

A simple whole-world dataset in melown2015 generated to a depth of LOD 21 (20 LODs of data) has a tile tree containing ca. 4.4 trillion tiles, the majority of which no one will ever look at. If you are happy with 10 LODs, it is ca. 4.2 million.
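The quoted figures are consistent with a plain quadtree, where each LOD has four times as many tiles as the one above it: 4^21 is about 4.4 trillion and 4^11 about 4.2 million, so they appear to count the tiles on the deepest level. The Python sketch below assumes a single root tile at LOD 0, which simplifies melown2015's actual structure, and also shows that summing all levels adds roughly one third more.

```python
def tiles_at_lod(lod: int) -> int:
    """Tiles on a single LOD of a quadtree with one root tile at LOD 0."""
    return 4 ** lod


def tiles_up_to_lod(lod: int, first_lod: int = 0) -> int:
    """Tiles on all LODs from first_lod to lod, inclusive."""
    return sum(4 ** k for k in range(first_lod, lod + 1))


for deepest in (21, 11):
    print(f"LOD {deepest}: {tiles_at_lod(deepest):,} tiles on the deepest level, "
          f"{tiles_up_to_lod(deepest):,} in the whole tree")
```

This prints 4,398,046,511,104 and 5,864,062,014,805 tiles for LOD 21, and 4,194,304 and 5,592,405 for LOD 11.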

These are the numbers we took into account when we designed mapproxy.

Also, the code that generates tiles is complex machinery and is not easily usable outside the actual mapproxy server.

a180285 commented 3 years ago

As I understand it, take a command like the following:

$ generatevrtwo --input Mars_Viking_MDIM21_ClrMosaic_global_232m.tif --output Mars_Viking_MDIM21_ClrMosaic_global_232m.average --resampling average --wrapx 0 --co PREDICTOR=2 --co ZLEVEL=9 --tileSize 4096x4096

If I have an LOD 20 global TIFF, the above command will generate LODs 1-19.

So what I mean is: could the above command also do the JPEG pre-encoding while generating LODs 1-19? TIFF can also use JPEG compression.

Then every client request would just read a 256x256 JPEG-encoded TIFF tile from disk.

I'm wondering whether that is possible.

vaclavblazek commented 3 years ago

This command does not generate LODs. It only adds overviews to the raster dataset, nothing else.
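The difference can be sketched as follows: overviews are reduced-resolution copies of the raster in its own SRS and pixel grid, typically at power-of-two decimation factors. They make reading a large area at low resolution cheap, but each reference-frame tile still has to be warped and encoded. The Python sketch below computes a generic factor list that stops once a level fits within a given size. It illustrates the general idea only; generatevrtwo's own choice of levels may differ.

```python
from typing import List


def _ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def overview_factors(width: int, height: int, min_size: int = 256) -> List[int]:
    """Power-of-two decimation factors, stopping once a level fits within min_size pixels."""
    factors = []
    factor = 2
    # Add the next factor only while the current coarsest level is still larger than min_size.
    while max(_ceil_div(width, factor // 2), _ceil_div(height, factor // 2)) > min_size:
        factors.append(factor)
        factor *= 2
    return factors
```

For a 43200x21600 raster this returns `[2, 4, 8, 16, 32, 64, 128, 256]`, the kind of level list you would pass to a tool such as `gdaladdo -r average input.tif 2 4 8 ...`.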

jrjdavidson commented 3 years ago

> A simple whole-world dataset in melown2015 generated to a depth of LOD 21 (20 LODs of data) has a tile tree containing ca. 4.4 trillion tiles, the majority of which no one will ever look at. If you are happy with 10 LODs, it is ca. 4.2 million.

Yup, it makes sense to me now why you would not pre-process a world dataset! But it might be worthwhile if you had a spatially restricted dataset?

In a sense, if you set the cache to never expire (surely that's possible if it's large enough?), pre-seeding the cache achieves exactly the same result as pre-processing. You would need some strategy to make sure you cover all required tiles. Would it be useful to have a command that pre-seeds tiles down to a certain LOD in a specified area? Probably too much detail for the gains to be had.
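A spatially restricted dataset does change the numbers. The Python sketch below counts the tiles covering a longitude/latitude bounding box per zoom level, using the standard XYZ (slippy-map) Web Mercator tile formulas. It assumes a plain XYZ pyramid: how melown2015 LODs map onto these zoom levels is an assumption, and tiles are counted whether or not they contain data. Boxes crossing the antimeridian are not handled.

```python
import math

MAX_LAT = 85.0511287798  # latitude limit of the square Web Mercator world


def lonlat_to_tile(lon, lat, zoom):
    """Standard XYZ (slippy-map) tile indices containing a lon/lat point."""
    n = 2 ** zoom
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or lat = -MAX_LAT stays inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_in_bbox(west, south, east, north, zoom):
    """Number of tiles on one zoom level covering a bbox."""
    x0, y0 = lonlat_to_tile(west, north, zoom)  # top-left corner
    x1, y1 = lonlat_to_tile(east, south, zoom)  # bottom-right corner
    return (x1 - x0 + 1) * (y1 - y0 + 1)


def tiles_for_area(bbox, min_zoom, max_zoom):
    """Total tiles to pre-seed for a (west, south, east, north) bbox over an inclusive zoom range."""
    return sum(tiles_in_bbox(*bbox, z) for z in range(min_zoom, max_zoom + 1))
```

Because the count per level grows roughly fourfold with each extra zoom level, the deepest levels dominate the total, even for a small area.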

vaclavblazek commented 3 years ago

Yes, a large web cache (with long caching times, i.e. >1 year) is the way to go. A preseeding tool might be simple: just a URL generator plus wget. Or you can simply browse the dataset to preseed its interesting parts. Every dataset has places no one will ever look at, and pre-generating them would be a waste of time and space.

NB: You have to use the proper tile URL, including the resource "revision" (r=n in the URL template), which changes every time a data-affecting change is made to the particular resource.
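The "URL generator plus wget" idea could look roughly like the Python sketch below. The template is hypothetical: copy the real tile URL, including its `r=` revision parameter, from the requests your VTS browser makes to mapproxy, and regenerate the list whenever the revision changes. The file it writes has one URL per line, which is the format `wget -i` reads.

```python
from typing import Iterable, Iterator, Tuple

# HYPOTHETICAL template: replace with the real tile URL (including its r= parameter)
# taken from the requests your VTS browser makes to mapproxy.
TEMPLATE = "https://maps.example.com/mapproxy/imagery/{lod}-{x}-{y}.jpg?r={revision}"


def preseed_urls(template: str, lod: int, x_range: Tuple[int, int],
                 y_range: Tuple[int, int], revision: int) -> Iterator[str]:
    """Yield one URL per tile in the inclusive x/y index ranges on a single LOD."""
    for y in range(y_range[0], y_range[1] + 1):
        for x in range(x_range[0], x_range[1] + 1):
            yield template.format(lod=lod, x=x, y=y, revision=revision)


def write_url_list(path: str, urls: Iterable[str]) -> int:
    """Write URLs one per line and return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")
            count += 1
    return count
```

Running `wget --delete-after -i urls.txt` afterwards fetches every tile through the caching proxy and discards the local copies, since only the cache needs to keep them.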

Pre-generating data using code from mapproxy would take (almost) the same time as serving it while browsing, because all warping and cutting is performed per tile. To get a real speedup, it would have to be a standalone tool that converts data in larger chunks. However, that is not something that can be programmed in a short time.

a180285 commented 3 years ago

OK, thank you all. We'll come back to this issue once we have more info.