overviewer / Minecraft-Overviewer

Render high-resolution maps of a Minecraft world with a Leaflet powered interface
https://overviewer.org/
GNU General Public License v3.0
3.35k stars, 480 forks

compress the map by reusing tiles #648

Open fab1an opened 12 years ago

fab1an commented 12 years ago

Hi,

I wrote a script that computes a SHA checksum for every tile and creates a web-server-readable symlink, named after that SHA, that points to (one of) the actual tiles. It then outputs a JSON file that Overviewer uses to look up the SHA sum for each tile and load that file instead of the normal tile.
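The author's script was never posted. Here is a minimal Python sketch of the approach described above. It assumes SHA-1 (the post only says "SHA-sums"), tiles with `.png`/`.jpg` extensions under one directory, and a separate directory of SHA-named symlinks. `build_tile_manifest`, `write_manifest`, and the directory layout are hypothetical names, not part of Overviewer.

```python
import hashlib
import json
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_tile_manifest(tile_root: Path, shared_dir: Path,
                        extensions=(".png", ".jpg")) -> dict:
    """Map every tile's path (relative to tile_root) to its content SHA-1.

    For each distinct digest, one symlink shared_dir/<digest><ext> is created,
    pointing at the first tile seen with that content (hypothetical layout).
    """
    shared_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    # sorted() collects the whole listing before any symlink is created;
    # existing symlinks (e.g. from an earlier run) are skipped.
    for path in sorted(tile_root.rglob("*")):
        if path.is_symlink() or not path.is_file() or path.suffix not in extensions:
            continue
        digest = sha1_of_file(path)
        manifest[path.relative_to(tile_root).as_posix()] = digest
        link = shared_dir / (digest + path.suffix)
        if not link.is_symlink():
            link.symlink_to(path.resolve())
    return manifest


def write_manifest(manifest: dict, out_path: Path) -> None:
    """Write the path -> digest map as compact JSON for a modified frontend."""
    out_path.write_text(json.dumps(manifest, separators=(",", ":")))
```

The symlinks by themselves free no disk space. Any saving only appears if the redundant copies are not published (for example, by deploying only the shared directory and the manifest). The post doesn't say how that was handled, and the frontend also has to be changed to consult the manifest.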

It reduced the size of my map by about 10%, from about 900MB to 810MB.

I wonder whether it's worth doing something like this. It takes a few minutes to run.

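My questions:

  • Does it interfere with the ?cachetag mechanism?
  • For some reason, when I output to PNG, my mechanism finds far fewer duplicate tiles (about 1% or so).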

agrif commented 12 years ago
  1. This shouldn't interfere with the cachetag. If you're loading your JSON file dynamically, you might want to stick the cachetag on the end of that too? I'm not exactly sure.
  2. JPEG images are approximations, PNG is exact. It makes sense that there would be slightly more duplicates with JPEG.
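A server-side script (for example, one that writes the page config) might build the manifest URL itself. In that case, agrif's suggestion of putting the cachetag on the end of the JSON URL amounts to adding a query parameter. Here is a minimal sketch. `with_cachetag` and the parameter name `c` are placeholders, not necessarily the scheme Overviewer uses for its tiles.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cachetag(url: str, cachetag: str, param: str = "c") -> str:
    """Append a cache-busting query parameter, keeping any existing query.

    The parameter name "c" is a hypothetical placeholder.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, cachetag))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

When the cachetag changes after a re-render, the URL changes too, so browsers fetch a fresh manifest instead of reusing a cached one.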

How big is the JSON file? For large maps it would become very, very large, and even indexing it might be expensive. For small-to-medium maps, though, it's apparently an interesting space-saver.

fab1an commented 12 years ago

The json file is 3MB currently (for a 900MB map consisting of night, light, nether, caves).

I can think of various ways to compress it, though: removing data like the .jpg extension, gzip compression, or storing a trie instead of a "file" -> "sha" map.
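The three compression ideas above (dropping the extension, a trie, and gzip) can be sketched together in Python. The sketch makes two assumptions: every tile in one manifest shares the same image format, so the extension carries no information, and tile paths are "/"-separated. `compact_manifest`, `lookup`, and `encode` are hypothetical helpers. A tile tree may contain both `0.png` and a directory `0/`, so each tile's digest is stored under an empty-string key inside its trie node. That way a file and a directory with the same name do not collide.

```python
import gzip
import json


def compact_manifest(manifest: dict) -> dict:
    """Build a trie keyed by path segment, dropping the file extension.

    {"0.png": "aa", "0/1.png": "bb"} -> {"0": {"": "aa", "1": {"": "bb"}}}
    """
    trie = {}
    for path, digest in manifest.items():
        node = trie
        for part in path.rsplit(".", 1)[0].split("/"):
            node = node.setdefault(part, {})
        node[""] = digest  # digest of the tile whose path ends here
    return trie


def lookup(trie: dict, path: str) -> str:
    """Return the digest for a tile path such as "0/1/2.png"."""
    node = trie
    for part in path.rsplit(".", 1)[0].split("/"):
        node = node[part]
    return node[""]


def encode(obj) -> bytes:
    """Serialize as whitespace-free JSON, then gzip it."""
    raw = json.dumps(obj, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return gzip.compress(raw)
```

If the web server sends the gzipped file with `Content-Encoding: gzip`, browsers decompress it transparently. Gzip alone already removes much of the repeated-prefix redundancy, so whether the trie helps on a real manifest would need measuring.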

fab1an commented 12 years ago

Where is the cacheTag documented? I did some Google searches but couldn't find anything. Is that an Overviewer thing or a gmaps thing?

agrif commented 12 years ago

It's just Overviewer, and it's not documented because it's not really a user-facing feature. It's used to overcome some really aggressive tile caching some browsers do, so you can see map updates faster.

With a 3MB file and 10% space saving, an average web site visitor would need to download more than 30MB of map data before this method becomes more bandwidth-efficient. This seems reasonable, but it scales up with map size. I'll be interested to see how small the file can get.
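The break-even estimate above is a single division. The visitor pays for the manifest up front and then saves a fixed fraction of every megabyte of tiles. The break-even download D therefore satisfies savings_fraction × D = manifest size. The sketch below assumes the saving is spread evenly over whatever tiles a visitor happens to load. `break_even_mb` is a hypothetical helper.

```python
def break_even_mb(manifest_mb: float, savings_fraction: float) -> float:
    """Megabytes of tiles a visitor must fetch before the manifest pays off.

    Solves savings_fraction * D = manifest_mb for D, assuming the saving
    is uniform across the tiles a visitor actually loads.
    """
    if not 0 < savings_fraction <= 1:
        raise ValueError("savings_fraction must be in (0, 1]")
    return manifest_mb / savings_fraction
```

With the figures from this thread (a 3MB manifest and a 10% saving), this gives 30MB. A smaller (e.g. gzipped) manifest lowers the threshold proportionally.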

I'm still pretty surprised this method works so well. Are there any types of tiles that seem to be duplicated more often?

fab1an commented 12 years ago

I haven't implemented statistics about the tiles yet. What information can I read from the filesystem besides which render a tile comes from?

My guess is mainly ocean is compressed.
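Besides the render, a tile's path also encodes its position in the quadtree. Deeper paths correspond to more zoomed-in tiles. Here is a minimal sketch that tallies duplicated tiles per render and path depth from a path -> digest manifest. It assumes paths look like `<render>/<q>/.../<q>.<ext>`; that layout and `duplicate_stats` are assumptions for illustration.

```python
from collections import Counter, defaultdict


def duplicate_stats(manifest: dict) -> Counter:
    """Count duplicated tiles per (render, depth).

    Assumed layout: the first path component names the render, and depth is
    the number of components after it (deeper = more zoomed in).
    """
    by_digest = defaultdict(list)
    for path, digest in manifest.items():
        by_digest[digest].append(path)
    stats = Counter()
    for paths in by_digest.values():
        if len(paths) < 2:
            continue  # unique tile: nothing to reuse
        for path in paths:
            render, *rest = path.split("/")
            stats[(render, len(rest))] += 1
    return stats
```

If duplicates are concentrated at one depth or at the edge of one render, that points more toward a rendering artifact than toward genuinely repetitive terrain such as ocean.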

I use smooth lighting, btw.

Fenixin commented 12 years ago

I was really interested in this, so I did some tests on my map. The day smooth-lighting render of my map is about 603MiB (rendered as PNG with optimizeimg=2). I ran fdupes to see how many tiles were duplicated, and it turned out there were a lot (1056 tiles in five sets, occupying only 1.2MiB). Looking at the tiles in an image viewer shows that the sets, and the number of tiles in each, are as follows:

1042    empty
2       ocean at the map edge (the map ends right above it)
2       chunk corner
8       map edge in up-left to down-right direction (only soft stone present)
7       map edge in down-left to up-right direction (only soft stone present)

I didn't look at them all, but the empty tile I checked really was empty: just an image with the background color in every pixel.
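Fenixin's duplicate count can be reproduced without fdupes. The sketch below first groups files by size and then hashes only files whose size matches another file's. Checking sizes first is roughly how fdupes narrows its candidates. fdupes additionally confirms matches byte by byte, while this sketch trusts SHA-256. `find_duplicate_sets` is a hypothetical helper. The observations in this thread suggest the largest set returned is probably the empty tile, but that is a heuristic worth confirming in an image viewer.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_sets(root: Path) -> list:
    """Return lists of identical files under root, largest set first.

    Files are bucketed by size (cheap) and only same-size files are hashed.
    """
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)
    sets = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[hashlib.sha256(path.read_bytes()).hexdigest()].append(path)
        sets.extend(group for group in by_hash.values() if len(group) > 1)
    sets.sort(key=len, reverse=True)
    return sets
```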

I think this is a bug and Overviewer is rendering a ton of unneeded tiles. Looking directly at the empty tiles, they are always on top of edge chunks, so it looks like something to do with rendering the tops of the chunks in the uppermost part of the map (its top edge).

I don't know how well JPEG compresses empty images (PNG does it exceptionally well). If it doesn't compress them well, that may be why there is such a big space reduction for JPEG maps.

Fenixin commented 12 years ago

Small update:

Using the latest code in the master branch and rendering exmaple with default options gives 89 duplicated files, all of them empty tiles.

AltyFox commented 12 years ago

I can confirm this on my map. There are very, very few 'legitimate' duplicated tiles; every major duplication is an empty PNG file with a 100% transparent background. Overviewer should instead not render empty tiles at all, and then this proposal won't serve any purpose.

acertain commented 12 years ago

A couple more options:

The problem is that the first doesn't save bandwidth.

AltyFox commented 12 years ago

As stated, though, duplicated tiles are few and far between, so there is no real need to reuse them. Once the bug is fixed, this should all be done with.

tornewuff commented 12 years ago

I'm still getting many blank tiles with the current master (2912 blank tiles out of 22043 on my map). Is this supposed to have been fixed?

AltyFox commented 12 years ago

@tornewuff, if it were, I believe this issue would be closed. As of right now, I think it's still unfixed.

pomtom44 commented 4 years ago

> Hi,
>
> I wrote a script that computes a SHA checksum for every tile and creates a web-server-readable symlink, named after that SHA, that points to (one of) the actual tiles. It then outputs a JSON file that Overviewer uses to look up the SHA sum for each tile and load that file instead of the normal tile.
>
> It reduced the size of my map by about 10%, from about 900MB to 810MB.
>
> I wonder whether it's worth doing something like this. It takes a few minutes to run.
>
> My questions:
>
>   • Does it interfere with the ?cachetag mechanism?
>   • For some reason, when I output to PNG, my mechanism finds far fewer duplicate tiles (about 1% or so).

Sorry to drag up an old thread, but do you have a copy of that script? I'm generating some really big maps for a timelapse project and want to reduce their size as much as possible.