Closed toomuchpete closed 13 years ago
a 650,000 chunk world is unusually large. large enough, i would think, that most of us don't have a very clear idea of where the bottlenecks lie. but you're not alone: i'm pretty sure other people are running 0.5+ million chunk worlds.
650,000 chunks contain about 21 billion individual blocks that need to be rendered (or at least inspected to see if they are occluded). that's a lot :)
the grass coloring code shouldn't make any difference. it's true that it needs to be tinted, but that should be done only once when overviewer starts up.
anyway, i think multi-node rendering will be the ultimate solution for worlds this large. i don't see any way for a single node to render everything in a reasonable amount of time. unfortunately, there really isn't any production-level code to do this yet.
I've got a branch that does multi-node rendering with gearman and I believe that emjay1988 has some older multi-node code as well. speaking only about my code -- it needs more testing and benchmarking (and then more coding). if you want to play with this, i'd be glad to help you, but right now who knows if it'd help.
anyway, is your current problem keeping up with the new, rapid growth? or just re-rendering the map to get caught up to the point where incremental renders are [reasonably] quick?
if you need useful results for your players sooner, you could try using the chunklist option and feeding it only chunks that have been modified in the past few days. after that's complete, you could start re-rendering the rest of the older parts of the map.
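building that chunklist is just an mtime scan. here's a rough sketch (untested against a real world; i'm assuming chunklist takes one chunk file path per line, and the `.dat` suffix is a placeholder for whatever your world's chunk files actually use):

```python
import os
import time

def recent_chunks(world_dir, days=3, suffix=".dat"):
    """Collect chunk files modified within the last `days` days."""
    cutoff = time.time() - days * 86400
    paths = []
    for root, _dirs, files in os.walk(world_dir):
        for name in files:
            if not name.endswith(suffix):
                continue
            path = os.path.join(root, name)
            # keep only files touched since the cutoff
            if os.path.getmtime(path) >= cutoff:
                paths.append(path)
    return paths

# e.g. write the result, one path per line, to a file and
# hand that file to the chunklist option.
```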
I've been chewing on an idea about changing the application's model, which could make a big difference for maps over 500k chunks. The core premise is to make rendering a continuous process that maintains a queue of chunks to render.
-- Warning: text wall below --
I've found that incremental rendering of anything under ~3000 chunks completes in near-realtime on hardware sized for a 500k+ map (ie, sufficient CPU, disk IO, and memory). My idea of a queue is to continually pump a list of updated chunks to a single long-running instance of the application. We force a save-all every fifteen minutes, and will use the rsync output as the list of updated chunks.
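The feed side could be as simple as piping rsync's changed-file output through a filter that appends chunk paths to a flat queue file. A sketch (the chunk-file naming pattern is an assumption on my part; adjust it to your world's actual layout):

```python
import re

# Assumed chunk-file naming ("c.<x>.<z>.dat") -- adjust the pattern
# to match your world's actual layout.
CHUNK_RE = re.compile(r"c\.[^/]+\.dat$")

def append_updated_chunks(rsync_lines, queue_path):
    """Filter rsync's changed-file output down to chunk files and
    append them to the flat-file queue. Returns how many were queued."""
    added = 0
    with open(queue_path, "a") as queue:
        for line in rsync_lines:
            path = line.strip()
            if CHUNK_RE.search(path):
                queue.write(path + "\n")
                added += 1
    return added
```

Something like `rsync -av remote:world/ world/ | python feed_queue.py` every fifteen minutes, after the forced save-all.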
The application can sort the queue by any number of metrics; my first thought is to keep a 'priority' queue of frequently updated chunks. If a chunk appears in the standard queue more than, say, 4 times (ie, it is being updated every fifteen minutes for an hour), mark it as high priority. The app continuously takes sets of 1000 chunks from the priority queue and renders them (rendering surrounding chunks as needed for occlusion). When a priority chunk is rendered, a flag is set so it can't be prioritized again for a full threshold cycle (ie, 4 more appearances). When the priority queue is empty, the app starts grabbing chunks from the standard queue. When the standard queue is empty (ie, the render process has caught up with all live changes), it starts picking chunks from the world directory that have not appeared in either queue.
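A rough sketch of the promotion logic I have in mind (class and threshold names are mine, purely illustrative):

```python
from collections import deque

PROMOTE_AFTER = 4   # appearances in the standard queue before promotion
COOLDOWN = 4        # further appearances before a chunk can be re-promoted

class ChunkScheduler:
    def __init__(self):
        self.standard = deque()
        self.priority = deque()
        self.seen_count = {}
        self.cooldown = {}

    def push_update(self, chunk):
        """Called once per save cycle for each updated chunk."""
        if chunk in self.cooldown:
            # recently rendered as priority: count down, stay standard
            self.cooldown[chunk] -= 1
            if self.cooldown[chunk] <= 0:
                del self.cooldown[chunk]
            self.standard.append(chunk)
            return
        self.seen_count[chunk] = self.seen_count.get(chunk, 0) + 1
        if self.seen_count[chunk] > PROMOTE_AFTER:
            self.priority.append(chunk)
            self.seen_count[chunk] = 0
        else:
            self.standard.append(chunk)

    def next_batch(self, n=1000):
        """Drain the priority queue first, then the standard queue."""
        batch = []
        while len(batch) < n and self.priority:
            chunk = self.priority.popleft()
            self.cooldown[chunk] = COOLDOWN  # block immediate re-promotion
            batch.append(chunk)
        while len(batch) < n and self.standard:
            batch.append(self.standard.popleft())
        return batch
```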
Couple of notes: -- the process does not need to actually run continuously; it can save state to disk and pick up where it left off, using flat files for queue input and just appending rsync output. This allows using EC2 instances for a couple of hours every day: churn through the priority chunks (which are the most active chunks in the world, and therefore the ones people most want rendered often), then get through as much of the standard queue and the world as possible.
-- with some basic status metrics, you could see render rates and how fast the process is getting through the priority, updated, and world queues. From this, you can tweak the thresholds.
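The save-state note above is nearly free with flat files: the consumer just remembers its byte offset into the append-only queue. A sketch (file names are made up):

```python
import os

def consume_queue(queue_path, state_path, limit=1000):
    """Read up to `limit` unprocessed lines from the append-only queue,
    resuming from the byte offset stored in the state file."""
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as f:
            offset = int(f.read().strip() or 0)
    chunks = []
    with open(queue_path) as f:
        f.seek(offset)
        while len(chunks) < limit:
            line = f.readline()
            if not line:
                break
            chunks.append(line.strip())
        offset = f.tell()
    with open(state_path, "w") as f:
        f.write(str(offset))  # checkpoint: safe to kill and restart
    return chunks
```

Kill the process (or the EC2 instance) whenever you like; the next run picks up at the checkpoint while rsync keeps appending to the queue file.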
One other situation I'm keeping in mind: if you have many active players creating large structures in separate parts of the map, the queue may grow faster than you can render it, meaning the rest of the map never gets rendered. I think a good mitigation would be to purge and disable the standard and priority queues for a set period each day (eg, our server low-time from 8am-10am, or the last hour of the instance), which lets the process take a big bite of the world and render it. Keep the list of chunks sorted by last render time, and over time the entire world will be rendered.
The final result is the ability to start rendering a map of an existing large world without the 24-hour+ overhead of an initial render (times three if, like us, you want lighting branches :) ). The most active building spots get rendered as often as you save (or as often as you fire up the instance), and the rest of the map is slowly filled in as the process gets to it. It might take days, weeks, or months, but the 'interesting' bits of your map are immediately accessible.
-- Awesome thought: replace un-rendered tiles with a placeholder yellow/black 40% opacity 'under construction' image, for very little overhead (could render in a couple of minutes, perhaps?).
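Generating that placeholder once is cheap. A sketch in plain Python (tile size and stripe width are guesses on my part; tweak to match your tiles):

```python
# Tile size and stripe width here are guesses; adjust to your tile size.
TILE = 384
STRIPE = 24
ALPHA = 102  # 40% of 255

def construction_tile(size=TILE, stripe=STRIPE, alpha=ALPHA):
    """Build an RGBA pixel grid of diagonal yellow/black hazard stripes."""
    yellow = (255, 204, 0, alpha)
    black = (0, 0, 0, alpha)
    return [
        [yellow if ((x + y) // stripe) % 2 == 0 else black
         for x in range(size)]
        for y in range(size)
    ]
```

Flatten the rows and hand the result to PIL's `Image.putdata` to save one PNG; every un-rendered tile can then just be a copy of (or symlink to) that file.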
I'd love feedback on this idea, before I get started. It's not suited to most deployments, so I don't really see it being merged into brownan's package. But I think if the core image processing code can be kept in line with any upstream changes, we can make a workable package suited for large maps.
I agree that multi-node is probably the way to go, although for my specific case it's not particularly helpful. Since I only run the map once every two weeks, I'm okay with it taking a few days to render; the problem is that the box I typically run it on is fairly expensive (an EC2 4XL instance), so spinning up more of them makes the render faster but doesn't make it any cheaper.
Rendering smaller textures might not save us much time in the chunk render phase, but it could significantly reduce the render time for tiles. Moving from 16x16 to 4x4 block textures would, I think, produce a noticeable improvement in that phase. This would be straightforward enough to code -- just check the dimensions of terrain.png. Hell, that would even allow for 1x1 block textures.
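Deriving the texture size from terrain.png really is trivial, since (as far as I know) it's always a 16x16 grid of square block textures:

```python
def texture_size(width, height):
    """terrain.png is a 16x16 grid of square block textures, so the
    per-block texture size falls straight out of the image dimensions."""
    if width != height or width % 16:
        raise ValueError("terrain.png should be square, side divisible by 16")
    return width // 16

# With PIL: texture_size(*Image.open("terrain.png").size)
# Default 256x256 terrain.png -> 16px textures; a 64x64 one -> 4px.
```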
I'm interested in the idea of the constant tile render, although I have yet to find an rsync command that doesn't think every map chunk has been updated every time I run it, even within a few hours of the previous run. It's physically impossible for every one of those chunks to have been touched in that amount of time, so it might just be a flag I'm missing or something.
are you using the --times option to rsync (preserve times)? i almost always use the --archive option (which includes --times, check the man page).
Ah, I see. It wasn't actually copying the files over again, but it was touching the destination file. Using -a appears to solve that problem. Thanks!
Do we have any profiling data on why exactly 0.5million+-chunk worlds are so slow?
I guess I just have no idea how to use the chunklist option. No matter what file I pass to it, even just a file with one line in it, it appears to try to scan every single chunk.
chunklist is very very slow. :/
I just want to recalculate a few hundred chunks, but it takes a half hour.
closing this issue now that we've pushed the dtt-c-render code. you should expect a render of 650k chunks to drop to around 4 hours (probably less if you've got a beefy machine)
After an influx of new players, our most recent attempt to generate a map for the now 650,000 chunk world is not really going very well.
Prior to this, my procedure had been to spawn a 4XL Amazon EC2 box (~24GHz across 8 cores) and let it churn away for a couple of hours. Excluding data transfer back and forth, that was pretty speedy.
Both this time and last time, the map was rendered without cache (last time because I was swapping image formats, this time because of the terrain.png changes). But despite only a 50% increase in world size, the entire process went from 8-ish hours last time to the initial chunk render alone taking over 24 hours this time (I stopped it at around 22 hours, with around 550,000 chunks rendered).
I'm not sure what happened (maybe the grass coloring code?) but this has gotten pretty slow for us, and I'm looking for ideas to speed it up (shrinking the world is not an option).
I'm interested in hearing sysadmin type ideas (parallelization?) as well as ideas for changes to the code (8x8x8 blocks instead of 16x16x16?).
One thought I had was setting a flag that would only update the tile cache. Between that flag and being able to tell it exactly which chunks needed to be updated, I could probably get most of the hard work done during nightly downtime, leaving only the tile generation phase to run bi-weekly.