realXtend / tundra

realXtend Tundra SDK, a 3D virtual world application platform.
www.realxtend.org
Apache License 2.0

Meshmoon - Async HttpAssetProvider cache disk write #707

Closed jonnenauha closed 11 years ago

jonnenauha commented 11 years ago

I cooked this up on my summer holiday for fun; I have been thinking for a long time that our cache I/O should be made asynchronous. I've actually started this "project" many times, but always abandoned the idea after one night of coding because it would change too much of the AssetAPI internals.

Granted, hard drives are very fast nowadays and SSDs are getting very common, but I still believe we should aim to move disk I/O out of the main thread. Certain assets can be very big: .zip bundles (note: this pull request fixes the write; ZipAssetBundle already threads the disk read internally), big textures, big mesh files, big audio files (I've seen 90 MB .wav files in scenes), etc. Doing this I/O in the main thread during the login phase is mostly fine, but we will soon be moving to asset interest systems and scripts that request/load assets during runtime, and then it will affect the user experience as FPS drops.

I'd love to hear from everyone how this code looks to start with, and whether you have opinions on how we could thread more of the asset implementations' load operations and the generic IAsset::LoadFromFile.

Cheers!

jonnenauha commented 11 years ago

Forgot to mention that I did run a bit of profiling and noticed that when the cache folder was "cold" I was seeing up to 40-100 msec spent on the disk write alone. If you ran Tundra multiple times in a row it seemed to get "hot" and most operations took 0-1 msec, but big files still took 5-10 msec. This was not on an SSD system but on my old PC with basic hard drives. For a moderately big scene the writes totaled 3.5 seconds, which is a lot when the whole login/asset loading time is ~10-15 seconds.

For clarification: this code does NOT spin up a thread per disk write. It utilizes Qt's global thread pool, which efficiently uses all available cores with reusable threads to which the QRunnables are assigned. It should not have much overhead once it gets going: the threads are started once, and we keep feeding them these QRunnables.
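The pattern described above can be sketched in portable C++. The actual pull request uses Qt's QThreadPool and QRunnable; this analogue uses std::async as a stand-in for the pool, and the file path and function names are illustrative only:

```cpp
// Sketch of the async cache-write pattern: hand each disk write to a shared
// worker pool instead of blocking the main thread. The real code uses Qt's
// QThreadPool/QRunnable; std::async here is a portable stand-in.
#include <cstdio>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Writes the asset bytes to a cache file; this is the part that runs on a
// worker thread instead of the main thread.
bool writeCacheFile(const std::string &path, const std::vector<char> &data) {
    std::FILE *f = std::fopen(path.c_str(), "wb");
    if (!f)
        return false;
    const size_t written =
        data.empty() ? 0 : std::fwrite(data.data(), 1, data.size(), f);
    std::fclose(f);
    return written == data.size();
}

// Fire-and-forget style async write. The caller keeps the returned future
// only if it needs to know when (or whether) the write completed.
std::future<bool> writeCacheFileAsync(std::string path, std::vector<char> data) {
    return std::async(std::launch::async,
                      [path = std::move(path), data = std::move(data)]() {
                          return writeCacheFile(path, data);
                      });
}
```

The main thread returns to its loop immediately after the enqueue; only code that needs the result (e.g. error reporting) has to touch the future.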

cadaver commented 11 years ago

Nice work. Actually threading IAsset::LoadFromFile would require extensive modifications, and we also need to retain the possibility of synchronous loading/saving, for example for editing purposes.

But whenever we have an AssetTransfer, we can delay it arbitrarily and execute it in a background thread. For example, LocalAssetProvider currently allows a maximum of 16 msec of main-thread time per frame for the transfers, but this could be moved to execute continuously in the background. Of course, any modifications like this need to be profiled so that they don't actually induce a slowdown.

Cache reads, when we're not using the threaded Ogre load mechanism, should be possible to change to a threaded & delayed asset transfer into a vector, instead of going through LoadFromFile.

cadaver commented 11 years ago

Tested briefly, it did improve loading time and responsiveness when loading Circus scene with a cold cache.

jonnenauha commented 11 years ago

Good stuff. We could just do the size check (data.size() < 500 * 1024) and not spin up a worker from the thread pool for small writes. Most small assets, like scripts and materials, will probably just be faster without it. What do you think?
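The size check suggested above amounts to a small dispatch decision. A minimal sketch, assuming the 500 kB threshold from the discussion; the function names and callback shape are illustrative, not the real AssetAPI interface:

```cpp
// Sketch of the proposed size check: small payloads are written synchronously
// (the write is cheaper than the thread handoff), big ones go to the pool.
#include <cstddef>
#include <functional>

constexpr std::size_t kAsyncWriteThreshold = 500 * 1024; // 500 kB, from the discussion

// syncWrite/asyncWrite stand in for the real write paths.
void dispatchCacheWrite(std::size_t dataSize,
                        const std::function<void()> &syncWrite,
                        const std::function<void()> &asyncWrite) {
    if (dataSize < kAsyncWriteThreshold)
        syncWrite();   // small asset: write directly in the main thread
    else
        asyncWrite();  // big asset: push to the thread pool
}
```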


cadaver commented 11 years ago

It's worth testing.

Btw. when testing with an HTTP server on the same machine, the requests complete so fast that the main loop gets completely bogged down just responding to completed requests. It's slightly better when the write workers are utilized. It would be preferable to limit the time spent responding to completed requests to some number of milliseconds per frame (similar to how LocalAssetProvider limits the time it serves requests per frame), but as those are Qt signals it may not be that easy.
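The per-frame budget idea above can be sketched independently of the Qt signal question: queue the completed requests and drain the queue each frame until the millisecond budget runs out. This is a hypothetical helper, not existing Tundra code; the clock is injected so the policy is testable deterministically:

```cpp
// Sketch of a per-frame time budget for handling completed requests,
// mirroring how LocalAssetProvider caps its per-frame work. All names here
// are illustrative.
#include <deque>
#include <functional>

using CompletedRequest = std::function<void()>;

// Runs queued request-completed handlers until the budget is spent.
// Returns how many requests were handled this frame; the rest stay queued
// for the next frame.
int processCompletedRequests(std::deque<CompletedRequest> &queue,
                             double budgetMsec,
                             const std::function<double()> &nowMsec) {
    const double deadline = nowMsec() + budgetMsec;
    int handled = 0;
    while (!queue.empty() && nowMsec() < deadline) {
        queue.front()();
        queue.pop_front();
        ++handled;
    }
    return handled;
}
```

In the real engine the remaining entries would simply be processed on subsequent frames, so no request is dropped, only deferred.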

jonnenauha commented 11 years ago

We could make a queue here (https://github.com/realXtend/naali/pull/707/files#L0R608) that would also be used when the body data is very small (i.e. when we write to disk directly in the main thread).

I'll do these two tweaks and merge to main.

jonnenauha commented 11 years ago

Seems that I just fucked this up real bad, even though I thought I had merged rex/tundra2 :) Closing this and sending a new one after the discussed fixes.

erno commented 11 years ago

Small writes stall too, especially if there are other writes happening simultaneously. Even just opening a file for writing can block if the relevant filesystem metadata is not cached.

jonnenauha commented 11 years ago

Hmm, I made the queue system now and made it only kick in for data sizes > 500 kB. I have no strong preference here; I guess making everything async would work too. I suppose a constant small overhead is better than stalling once for 100 msec? :)
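The queue system discussed here, a single background thread draining pending cache writes in order while the main thread only enqueues, can be sketched as follows. This is a portable std::thread analogue of the Qt-based implementation, under the stated assumptions; the class and member names are illustrative:

```cpp
// Minimal sketch of a serialized cache-write queue: the main thread enqueues
// jobs and returns immediately; one background worker drains them in order.
// The destructor flushes any remaining jobs before joining the worker.
#include <condition_variable>
#include <cstdio>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

class CacheWriteQueue {
public:
    CacheWriteQueue() : worker_([this] { run(); }) {}

    ~CacheWriteQueue() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Called from the main thread: queue the write and return immediately.
    void enqueue(std::string path, std::vector<char> data) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back({std::move(path), std::move(data)});
        }
        cv_.notify_one();
    }

private:
    struct Job { std::string path; std::vector<char> data; };

    void run() {
        for (;;) {
            Job job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
                if (pending_.empty())
                    return; // stopping and nothing left to flush
                job = std::move(pending_.front());
                pending_.pop_front();
            }
            if (std::FILE *f = std::fopen(job.path.c_str(), "wb")) {
                if (!job.data.empty())
                    std::fwrite(job.data.data(), 1, job.data.size(), f);
                std::fclose(f);
            }
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<Job> pending_;
    bool stopping_ = false;
    std::thread worker_; // declared last so the other members exist before run() starts
};
```

A size threshold like the one above would simply decide between calling enqueue() and writing inline; either way the writes stay serialized, which avoids the simultaneous-write stalls erno mentions.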