iut-ibk / DynaMind-ToolBox

www.dance4water.org
GNU General Public License v2.0

Performance Cache -> DB #80

Closed - christianurich closed this 11 years ago

christianurich commented 11 years ago

I have done some testing of the DB over the last few days.

Raster data works awesome: we write to the DB at 150 MB/s, so transferring big chunks of data works really well.

But when the DB kicks in, the vector data performance together with the states is really, really slow.

With the simulation I used, it starts to slow down here:

extractnetwork.cpp line 215

    Logger(Debug) << "Done with the Agent Baed Model";
    foreach(std::string name, city->getUUIDsOfComponentsInView(Conduits)) {
        DM::Component * c = city->getComponent(name);
        DM::Attribute attr("New", 1);
        c->changeAttribute(attr);
    }

The problem is that the following call takes hours for 5714 components!

Success ExtractNetwork {31e01691-20ea-4bb5-bbca-68d0fc507d37} Counter 0 time 59668.1

city->getUUIDsOfComponentsInView(Conduits)

So I looked a little bit closer into the getComponent code. The problem is the getComponent(it->first) call in the method below (I added some debug code, so it looks slightly different):

std::map<std::string, Component*> DerivedSystem::getAllComponentsInView(const DM::View & view)
{
    //return views[view.getName()];
    std::map<std::string, Component*> comps = views[view.getName()];
    Logger(Debug) << "Start with get components from derived system numberOfComponents" << comps.size();
    int counter = 0;
    for(std::map<std::string, Component*>::iterator it = comps.begin(); it != comps.end(); ++it) {
        it->second = getComponent(it->first);
        Logger(Debug) << "Done with " << counter++;
    }
    Logger(Debug) << "Done and return";
    return comps;
}

Here is the logger output:

DEBUG Sun Apr 14 15:07:12 2013| Start with get components from derived system numberOfComponents 5714
DEBUG Sun Apr 14 15:07:21 2013| Done with 0
DEBUG Sun Apr 14 15:07:30 2013| Done with 1
DEBUG Sun Apr 14 15:07:39 2013| Done with 2
DEBUG Sun Apr 14 15:07:47 2013| Done with 3
DEBUG Sun Apr 14 15:07:56 2013| Done with 4
DEBUG Sun Apr 14 15:08:05 2013| Done with 5
DEBUG Sun Apr 14 15:08:14 2013| Done with 6
DEBUG Sun Apr 14 15:08:23 2013| Done with 7
DEBUG Sun Apr 14 15:08:32 2013| Done with 8
DEBUG Sun Apr 14 15:08:40 2013| Done with 9
DEBUG Sun Apr 14 15:08:49 2013| Done with 10
DEBUG Sun Apr 14 15:08:58 2013| Done with 11

It takes 9 seconds per component! For the 5714 components above that adds up to roughly 5714 x 9 s, about 14 hours.

The cache is huge: --nodecache 5000000 --attributecache 5000000 (uses up more than 4 GB of my RAM).

Please look into this; I think it is causing all my troubles.

zacharias2k commented 11 years ago

OK, I guess those are components which need to be taken from a predecessor state. In that case every getComponent call copies all attributes - which shouldn't happen anymore since https://github.com/iut-ibk/DynaMind/commit/90d6fd1f77503daa7297064d245cb4c96e2c8783. I will take a closer look.
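
Roughly, the expensive path looks like the sketch below. This is conceptual only - the class and member names are made up for illustration and do not reflect the actual DerivedSystem implementation:

#include <map>
#include <string>

// Conceptual sketch, NOT the real DynaMind code: a component that only exists
// in a predecessor state is deep-copied on first access, and copying all of its
// attributes is what generates the cache/DB traffic.
class DerivedSystemSketch
{
    DerivedSystemSketch* predecessor;                    // previous state
    std::map<std::string, DM::Component*> ownComponents; // components of this state
public:
    DM::Component* getComponentSketch(const std::string& uuid)
    {
        std::map<std::string, DM::Component*>::iterator it = ownComponents.find(uuid);
        if (it != ownComponents.end())       // already part of this state: cheap
            return it->second;
        // not in this state yet: fetch from the predecessor and copy it
        DM::Component* copy = new DM::Component(*predecessor->getComponentSketch(uuid));
        ownComponents[uuid] = copy;          // every attribute is copied -> slow
        return copy;
    }
};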

zacharias2k commented 11 years ago

This improves the performance for read-only views: https://github.com/iut-ibk/DynaMind/commit/c1376ea276045a0395c935d2f728a5ca903bea09

christianurich commented 11 years ago

I removed some stuff from the module, but it is still pretty slow. At least we now get 5 elements per second. It takes around 40 minutes to access 11,000 elements and read one value and write one.

zacharias2k commented 11 years ago

With the model test_db_bigger.dyn? If not, please provide a test case.

christianurich commented 11 years ago

That was for the big one. I tried our default simulation (the name has changed, see the unstable branch for the file): Data/Simulations/sandbox_drainage_system_with_infiltration.dyn --nodecache 500000 --attributecache 500000 --loglevel 0

Same result (compiled as release) as soon as the DB kicks in:

INFO Wed Apr 24 18:32:55 2013| Start AttributeCalculator Impervious {8a536167-f8d0-40c8-b647-07393c341646} Counter 0
DEBUG Wed Apr 24 18:32:55 2013| 1636 / 1
DEBUG Wed Apr 24 18:32:55 2013| 1636 / 2
DEBUG Wed Apr 24 18:32:55 2013| 1636 / 3
DEBUG Wed Apr 24 18:32:55 2013| 1636 / 4
DEBUG Wed Apr 24 18:32:56 2013| 1636 / 5
DEBUG Wed Apr 24 18:32:56 2013| 1636 / 6
DEBUG Wed Apr 24 18:32:56 2013| 1636 / 7
DEBUG Wed Apr 24 18:32:56 2013| 1636 / 8
DEBUG Wed Apr 24 18:32:56 2013| 1636 / 9

zacharias2k commented 11 years ago

I've improved the code of the AttributeCalculator - readability and function timing. Please optimize modules if they are running slow. Reducing the function count can improve the overall performance greatly (there was a part in AttributeCalculator which could be optimized from 1+N*log(N) calls down to 1 call!).

Well, the basic issue is not the "slow" database or the caching, it's the concept of links, which leads to elements outside the defined view. You will have to access the elements without links, or via a read-only view, if you want higher performance. I've discussed this with Michael too; it's a design issue (of links) which will be addressed once simenv part 2 starts.

For now, I am closing this issue. It's not about cache performance nor about the DB.

christianurich commented 11 years ago

That is not true - have you really tested the issue? I get the same behaviour in other modules as soon as the DB kicks in, where I only use defined views and no links.

The read-only view doesn't help. As far as I understand, a successor state is created as soon as I use getComponent.

And be careful: at the moment the access type of the view only relates to the access type of the geometry (as the docs say). That means that for an edge only the start and end node cannot be changed; it's still possible to change attributes.

zacharias2k commented 11 years ago

If no links are used, yesterday's core got an improvement for read-only views, as commented before: https://github.com/iut-ibk/DynaMind/commit/c1376ea276045a0395c935d2f728a5ca903bea09

If it's not read-only, it will copy the node. If it doesn't fit in the cache, it's copied from/to the hard drive. getComponent can't decide whether it's meant to be read-only or not; there is no information on that.

If read-only only refers to the geometry, I will undo the changes I've made - because a component can't change its owning system, and that shouldn't ever be possible. Therefore read-only access is not possible under these circumstances.

zacharias2k commented 11 years ago

Memo: Yes, it's true. Yes, I'm testing - I have been testing for more than two weeks for non-existing deadlocks, improving and tweaking non-core module code and trying to develop things which should be addressed in simenv part 2, instead of working on simenv part 1, which should have been my task for the last three weeks.

christianurich commented 11 years ago

Updates on the actual performance increase, or the actual numbers, would help the discussion.

The problem is that when I write to the DB and the cache is full, the performance is 5 components per second (as posted twice now). Maybe my system setup is messed up, but I don't have any other number to compare it with. Please update me on the performance: is it just a handful of elements per second when the cache is full and the 'swapping' starts?

The access types of the views can be checked with the DataValidation class (dmdatavalidation.h).

zacharias2k commented 11 years ago

The performance increases from code improvements cannot be generalized; most of them are trivial, minor or very case specific. The read-only option (when it includes read-only attributes) could improve performance, depending on the case, by up to a factor of 100, as the analysis shows.

In the specific case of your provided simulation, the recursive gathering of (linked) components leads to a massive copy wall, as not all the needed components fit into the cache. In the copy process most of the time is spent copying attributes (read/write on the DB) - that's why it takes 0.2 seconds per component. If you want a number to compare with, just increase the cache to a level where swapping does not take place. 500k seems low anyway, AFAIK.

Changing getAccessType to

view.getWriteAttributes().size() == 0 && view.getAccessType() == READ

should do it?
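
For illustration, that check could be wrapped in a small helper (the function name is hypothetical and not part of the DynaMind API; it just combines the two calls above):

// Hypothetical helper: treat a view as read-only if it declares no writable
// attributes and its geometry access type is READ.
static bool viewIsReadOnly(const DM::View& view)
{
    return view.getWriteAttributes().size() == 0
        && view.getAccessType() == READ;
}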

But it won't help in your model case, as components from links are not part of getComponentsOfView, and a single component, returned via getComponent, does not know whether it's read-only or not - so that won't help either. A possible improvement would be a getComponentReadOnly - but that would change the API, increase complexity, break the idea of a self-handling core (as you pointed out in the beginning) and may conflict with later ideas in simenv part 2 (enhanced views).

christianurich commented 11 years ago

I know there is a lot of improvement to do in the code, and links may also be a problem. For me the really important number is the 0.2 seconds per component.

So for big simulations, increasing the cache is not an option anymore, so I'll have to use the DB. States still fit in the RAM. (For the test file, 500k should be more than enough to fit one state - and otherwise we are not testing the DB.)

So writing 10,000 elements just takes a lot of time when the cache is full - for my simulations, 40 minutes.

Don't get me wrong, but all the other improvements don't really matter in this case. We just shift the problem.

I would like to know:

A) Is this really the case (slow writing), or is it just my computer? Is it much faster on your computer, and if so, why?
B) If it is slow, what can we do about it?

zacharias2k commented 11 years ago

I may have a fix for it - not really beautiful, but it will do, with some internal changes. I'll publish it when it's stable.

Big simulations will run on the DB - that's the whole point of my core. But one can avoid it through well-designed modules, at the cost of "less granular" modules.

All the other improvements will help us in the future, in particular for simulations other than yours. It's not only about improving the performance of your specific simulation. There is no shift.

a) Our machines are fairly different: while your hard disc may be fast due to SSD tech, my single-core performance is always ahead. But I can't really tell - I'm constantly switching between debug, release, release with minimal debug info and profiling (debug with no external debug thread). And for further testing I'm working on a Linux machine via remote access - as most of the profilings take a long time - with 8 threads.

b) Reduce function calls that lead to unnecessary cache access, and enlarge the cache so all module data fits into it (by the way, that's a crucial point: if things do not fit into RAM, one has to process chunks/blocks - later, for simenv parts 2, 3, 4, ...).

christianurich commented 11 years ago

I know all that.

I just want to know

How many components can I write per second when the cache is full and I work with states? Is it 1,000 per second or just a few?

zacharias2k commented 11 years ago

As written before, the component itself doesn't cost that much; again, the number of attributes is what matters. And attributes are not all of the same type - a time series will take much longer than a double. The difference is huge, and it also depends on how big the current DB file is, how big the cache is in the first place, etc. So it's impossible to provide a number per component.

christianurich commented 11 years ago

Again: I go over 10,000 components and do nothing other than write one attribute per component - how long does it take? Seconds, minutes or hours?
If it depends on the number of attributes, make an example for me.

zacharias2k commented 11 years ago

Double attributes on one component:

1k: 3 ms
10k: 1488 ms
100k: 17260 ms

christianurich commented 11 years ago

Is this with a successor state?

zacharias2k commented 11 years ago

No, bare writing into the DB. Sketch:

#include <sstream>

// cache cfg: infinite for all caches, except attributecache = 3
DM::System sys;
DM::Component c;
sys.AddComponent(&c);
for (int i = 0; i < 10000; i++) {            // 1k / 10k / 100k attributes in the timings above
    std::ostringstream name;
    name << i;                               // attribute name built from the counter
    c.addAttribute(name.str(), i);           // adds one double attribute to the component
}

In the newest core, the cache unit test has a commented-out part at the end - that's basically the test.

christianurich commented 11 years ago

Those are good results. So my loop with states in the simulation is 2,000 times slower (not sure if I have done the maths right in my head). I'll put together a standard test set for performance testing tomorrow, so we can focus on this and maybe identify what the problem is.

zacharias2k commented 11 years ago

The problem is the read-only pointer. I uploaded a new core where the successors are optimized - as said, it's a bit tricky. I'm testing it right now. Why don't you use a bigger cache for now? We can deal with this later on.

christianurich commented 11 years ago

I'm out of RAM. But anyway, I'll put together a test set that just focuses on the DB and states, so we can compare results, test the new solutions and talk about the same issues.

zacharias2k commented 11 years ago

Yeah, but keep in mind that I'm off for the whole of next week.

christianurich commented 11 years ago

I created a simple performance test, at the moment without states. If the results are right (please check the code), the problem is the read access. Detailed results are here: https://github.com/iut-ibk/DynaMind-ToolBox/wiki/Performance-Test-DB

Write access is constant and not too bad: on average 0.25 ms per element (similar to your results, with a slightly different test).

The read access gets worse with the size of the DB and is in all cases much slower than the write access:

for 10,000: 6 ms per element
for 100,000: 119 ms per element
for 1,000,000: 1138 ms per element

If you could confirm that I didn't make a mistake in the code, I would make a similar test for the states.
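
For reference, the read loop being timed is essentially the pattern below. This is a minimal sketch, not the actual test code behind the wiki page: it assumes a populated DM::System* city and a view as in the snippets earlier in this thread, that getUUIDsOfComponentsInView returns a std::vector<std::string> (as the foreach usage above suggests), and it uses plain std::clock for timing.

#include <ctime>
#include <iostream>
#include <string>
#include <vector>

// Fetch every component of the view once and report the average read time.
void timeReadAccess(DM::System* city, const DM::View& conduits)
{
    std::vector<std::string> uuids = city->getUUIDsOfComponentsInView(conduits);
    std::clock_t start = std::clock();
    for (size_t i = 0; i < uuids.size(); i++)
        city->getComponent(uuids[i]);                 // read access under test
    double totalMs = 1000.0 * (std::clock() - start) / CLOCKS_PER_SEC;
    std::cout << "avg ms per element: " << totalMs / uuids.size() << std::endl;
}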

christianurich commented 11 years ago

Other minor things

zacharias2k commented 11 years ago

I reviewed the code; it's not a bad test, but the settings cause very bad DB behaviour. Please use well-defined numbers, and please read the cache settings docs - especially regarding

cfgNew.queryStackSize = 1234;
cfgNew.cacheBlockwritingSize = 1234;

The first option will greatly increase RAM usage; the second burns down the DB performance and only works because of a safety net.

I will improve the code.

christianurich commented 11 years ago

I just copied the settings from your unit test ;-) and I don't fully understand what the settings are (and don't really need to know).

Yeah, please choose better settings; maybe that helps us with the performance. Could you stick to the old API with the getUUIDs for this test? We could also make one with the new API, to compare the speed-up and so that I have an example of how to use it.

zacharias2k commented 11 years ago

Please read the docs: https://github.com/iut-ibk/DynaMind-ToolBox/wiki/Cache-settings-and-flags But anyway, it won't help us, because reading from and writing to the DB can't be influenced by us (for now). The question is how to minimize DB access (and cache usage) - that's why I'm talking about optimizing modules and functions and reducing function calls.

Once again: DB access can't be faster than it is now; the driver and the hardware give us the performance. What you are doing is a benchmark of your personal hardware. That's why there is a profiling test on the cache in the unit tests, but no DB-access test (because it's hardware dependent and can't be influenced by code - just a few ppm, for now). That's why I closed the issue: the Cache -> DB I/O performance (per element) can't be changed (I think I also mentioned that above).

I'm going to write a larger article about the whole issue, to make clear how the core, the cache and the DB work. Meanwhile, skip the test and read the current docs on the cache.

PS: I may have some improvements on successor states - stay tuned. PPS: if there are questions, no big deal, but we should discuss them by email; this thread is about Cache -> DB performance :-)

zacharias2k commented 11 years ago

By the way, components are huge: as I can't cache everything, the structure itself takes a lot of space, in particular the node-edge maps. That's why a simulation takes a lot of RAM, even with small cache sizes.

christianurich commented 11 years ago

I was just surprised by how much it really is. But that's probably an issue to address in version 0.7 or 0.8, if it becomes a problem.

zacharias2k commented 11 years ago

Check it out with sizeof(x) - it returns the size in bytes:

sizeof(DM::Component) 68
sizeof(DM::Node) 76 (+ 24 in cache)
sizeof(DM::Edge) 80
sizeof(DM::Face) 104 + 4*nodecount
sizeof(DM::RasterData) 152 (+ a lot in cache)
sizeof(DM::System) 264
sizeof(DM::Attribute) 72 + (min 4 in cache, up to thousands)
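
A minimal sketch to reproduce these numbers on your own build (the exact figures depend on compiler, platform and DynaMind version, and the relevant DM headers have to be included):

#include <iostream>

// Print the in-memory size of the core data structures listed above.
int main()
{
    std::cout << "Component  " << sizeof(DM::Component)  << "\n"
              << "Node       " << sizeof(DM::Node)       << "\n"
              << "Edge       " << sizeof(DM::Edge)       << "\n"
              << "Face       " << sizeof(DM::Face)       << "\n"
              << "RasterData " << sizeof(DM::RasterData) << "\n"
              << "System     " << sizeof(DM::System)     << "\n"
              << "Attribute  " << sizeof(DM::Attribute)  << std::endl;
    return 0;
}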