satori-com / mzbench

MZ Benchmarking

I can't use big resource file for many workers. Memory overflow. #130

Open loguntsov opened 6 years ago

loguntsov commented 6 years ago

I found that I can't use a big resource file with many workers, because the file is loaded separately for each worker. That is not good for me. For example, I keep all user info (login, password, etc.) in one TSV file. There can be more than 100k users, so the file can be large, about 10 MB. It is not good when each of the 100k workers has to load this file into its own process memory. I would have to write my own code inside MZBench to spread this data across workers (something like a local gen_server per node), but I don't know how to get a global gen_server, because there is no cluster for workers.

I just want resource files to work the way they are implemented in TSUNG.

What do you suggest for this case?

parsifal-47 commented 6 years ago

Is it correct that you need to spread some information across your workers? The simplest case, of course, is if you could generate this information in place somehow. There is no built-in clustering, so unfortunately you can't use a standalone gen_server for that; you're right about that.

loguntsov commented 6 years ago

> Is it correct that you need to spread some information across your workers?

Yes, it is correct. I have a list of users with their passwords. I don't want to modify the original system for tests. That means each worker should get this information from one place. TSUNG has this logic, and I think MZBench could have it too.

How do you see this logic in your code? Could you explain?

parsifal-47 commented 6 years ago

In our workflows we usually generate data in place, so each worker generates the small piece of data it needs.

In your case the data is external, so you may use hooks to load it from an external source, for example from a file or URL: https://github.com/satori-com/mzbench/blob/master/doc/workers.md#hooks

This code is executed once per node. Instead of putting the data into env (as in the example above), you can put it into ETS to avoid duplicating it across worker threads.
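A minimal sketch of the ETS idea follows. It is not MZBench code: the module name `users_resource`, its functions, and the two-column TSV layout are assumptions made for illustration, and how the loader is wired into a hook is left to the hooks documentation linked above. The table is owned by a spawned process because an ETS table is deleted when its owner exits.

```erlang
-module(users_resource).
-export([load/1, lookup/1, count/0]).

%% Hypothetical table name; not part of MZBench.
-define(TABLE, users_resource_tab).

%% Parse a TSV binary and store each row under its 1-based line number.
%% Intended to be called once per node.
load(TsvBinary) when is_binary(TsvBinary) ->
    Lines = binary:split(TsvBinary, <<"\n">>, [global, trim_all]),
    Rows = [list_to_tuple(binary:split(L, <<"\t">>, [global])) || L <- Lines],
    Indexed = lists:zip(lists:seq(1, length(Rows)), Rows),
    Parent = self(),
    %% The table lives as long as this owner process does.
    Owner = spawn(fun() ->
        ets:new(?TABLE, [named_table, public, set, {read_concurrency, true}]),
        ets:insert(?TABLE, Indexed),
        Parent ! {self(), ready},
        receive stop -> ok end
    end),
    receive
        {Owner, ready} -> {ok, length(Rows)}
    after 5000 ->
        {error, timeout}
    end.

%% Read one row; any process on the node can call this without copying the file.
lookup(Index) ->
    case ets:lookup(?TABLE, Index) of
        [{Index, Row}] -> {ok, Row};
        [] -> error
    end.

%% Number of rows currently stored.
count() ->
    ets:info(?TABLE, size).
```

With this layout each worker copies only the single row it reads, so the 10 MB file is held once per node rather than once per worker. Calling `load/1` a second time on the same node would fail because the named table already exists; the sketch reports that as `{error, timeout}`.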

loguntsov commented 6 years ago

> This code is executed once per node. Instead of putting the data into env (as in the example above), you can put it into ETS to avoid duplicating it across worker threads.

This is the problem: there is no way to get unique values across different nodes. I know about hooks, and I built a StatsD receiver with this scheme. I can also create a gen_server to hand out unique values within a node, but I can't synchronize it between different nodes. That is why I asked you about this.

I think MZBench should have a common way (protocol) for interaction between the director and the other nodes. I mean that some module should hide this interaction and provide an Erlang API for it.

loguntsov commented 6 years ago

Anyway, I think the resource file should be a global thing. It is not good to have a resource instance for each worker, because the data is identical. There is no need to keep the same data in each worker.

parsifal-47 commented 6 years ago

I understand that it is not the solution you are looking for, but you could generate unique numbers for each node based on its ID.

Synchronization is also possible, but probably not in the way you want: we have signals.

We have a communication protocol, but we are not ready to expose it to users, because it usually ends in non-scalable benchmarks.

Anyway, you are right, resource files could be optimized; there is no straightforward way to use big ones with a large number of workers.

loguntsov commented 6 years ago

> but you could generate unique numbers for each node based on its ID.

So how can I get the current node's number and the node count on the node side? Then I can make a gen_server that takes, for example, every 3rd of each 4 rows of the resource file.

parsifal-47 commented 6 years ago

We don't use Erlang clustering, but I believe we assign unique node names, so node() should work for the node name.

As for the node list, try mzb_interconnect:nodes(). It is not an official API, of course. Please let me know about the results.
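The "every 3rd of 4 rows" idea can be sketched as a pure function that takes the node's position and the node count as explicit arguments. The module `node_slice` and its functions are hypothetical. Whether `mzb_interconnect:nodes()` returns a list that includes the local node is not confirmed in this thread, so the sketch does not call it and expects the caller to pass in the complete node list.

```erlang
-module(node_slice).
-export([node_index/2, rows_for_node/3]).

%% 1-based position of Node in the sorted, de-duplicated node list.
%% Sorting gives every node the same view of the ordering.
node_index(Node, AllNodes) ->
    index_of(Node, lists:usort(AllNodes), 1).

index_of(_Node, [], _I) -> not_found;
index_of(Node, [Node | _], I) -> I;
index_of(Node, [_ | Rest], I) -> index_of(Node, Rest, I + 1).

%% Keep the rows at positions NodeIndex, NodeIndex + NodeCount, ...
%% so that the nodes split the file without overlap.
rows_for_node(Rows, NodeIndex, NodeCount)
  when is_integer(NodeIndex), is_integer(NodeCount),
       NodeIndex >= 1, NodeIndex =< NodeCount ->
    Positions = lists:seq(1, length(Rows)),
    [Row || {P, Row} <- lists:zip(Positions, Rows),
            (P - 1) rem NodeCount =:= NodeIndex - 1].
```

For example, `node_slice:rows_for_node([a, b, c, d, e, f, g, h], 3, 4)` keeps the rows at positions 3 and 7. The split stays consistent only if every node computes its index from the same node list.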

timofey-barmin commented 6 years ago

Hi Sergey, currently you can get the following information from inside any worker function:

```erlang
WorkerID = proplists:get_value(worker_id, Meta),
WorkersNum = proplists:get_value(pool_size, Meta),
PoolID = proplists:get_value(pool_id, Meta),
PoolNum = proplists:get_value(pools_num, Meta),
```

The pair {WorkerID, PoolID} is unique across all workers inside a benchmark. If you need a unique ID for a worker across all benchmarks, we can add the benchmark ID to meta as well (if it is not already there; I need to check).
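As a sketch of how these values could select one row per worker from a shared table: the module `worker_row` is hypothetical, and it assumes `worker_id` is a non-negative integer (whether numbering starts at 0 or 1 is not stated here). The result is distinct only among workers of the same pool, since `worker_id` alone repeats across pools.

```erlang
-module(worker_row).
-export([row_index/2]).

%% Map a worker to a 1-based row index in a table of RowCount rows.
%% Wraps around when a pool has more workers than there are rows.
row_index(Meta, RowCount) when is_integer(RowCount), RowCount > 0 ->
    WorkerID = proplists:get_value(worker_id, Meta),
    (WorkerID rem RowCount) + 1.
```

Workers with consecutive IDs get different rows as long as the pool is no larger than the row count.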

I'm not sure I understand why you need to know anything about nodes at all. Can you elaborate?