Tested parallel writing:
Machine specs:
Switched to Redis: it now sustains ~30 writes/s per process (was 8) and ~90 reads/s per process (was 12). The database code is also a lot shorter.
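For context, the storage pattern is roughly the following (a minimal sketch assuming redis-py and a plain Redis list, not the actual tinyverse code; the key name, size cap, and helper names are hypothetical):

```python
import redis

db = redis.Redis(host="localhost", port=6379)

def save_session(session_bytes):
    # LPUSH is atomic, so many writer processes can append concurrently
    db.lpush("sessions", session_bytes)
    # cap memory usage by keeping only the newest entries (cap is made up)
    db.ltrim("sessions", 0, 10 ** 5)

def load_recent_sessions(n=10):
    # newest-first slice; each item is one serialized session blob
    return db.lrange("sessions", 0, n - 1)
```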
Replaced pickle with joblib: https://github.com/yandexdataschool/tinyverse/commit/c92a8d0bce48a7c33aa74633028138fc9059d290
Timing:
```python
import numpy as np
import joblib
from io import BytesIO
from six.moves import cPickle as pickle

# 3x (batch of 25 single-channel 64x64 images + 25 integer labels)
arr = [np.random.randn(25, 1, 64, 64), np.random.randint(0, 10, size=25)] * 3

print "\npickle dump/load"
%timeit pickle.dumps(arr)
s = pickle.dumps(arr)
%timeit pickle.loads(s)

print "\nbytes dump/load (loses array shape/type)"
%timeit np.concatenate(map(np.ravel, arr)).tobytes()
s = np.concatenate(map(np.ravel, arr)).tobytes()
%timeit np.fromstring(s)

print "\njoblib dump/load"
def jobdump(arr):
    s = BytesIO()  # joblib writes binary data, so BytesIO rather than StringIO
    joblib.dump(arr, s)
    return s.getvalue()

def jobload(s):
    return joblib.load(BytesIO(s))

%timeit jobdump(arr)
s = jobdump(arr)
%timeit jobload(s)
```
Output (on an old Core 2 laptop):

```
pickle dump/load
10 loops, best of 3: 113 ms per loop
10 loops, best of 3: 23.1 ms per loop

bytes dump/load (loses array shape/type)
100 loops, best of 3: 2.11 ms per loop
1000 loops, best of 3: 862 µs per loop

joblib dump/load
100 loops, best of 3: 4.45 ms per loop
100 loops, best of 3: 3.46 ms per loop
```
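As a quick sanity check (my addition, not part of the run above), the joblib round trip can be verified to return the original arrays intact, which is exactly what the raw-bytes route gives up:

```python
# joblib restores the original list of arrays, shapes and dtypes included;
# np.fromstring above yields one flat float64 array, so the raw-bytes route
# would need extra shape/dtype bookkeeping to reconstruct anything.
restored = jobload(jobdump(arr))
assert all(np.array_equal(a, b) for a, b in zip(arr, restored))
```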
Seems quite reliable so far: joblib dumps ~25x faster than pickle in this test while, unlike the raw-bytes route, preserving array shapes and dtypes. Further extensions will be added in separate issues.
We need some sort of storage for
For an ugly version 1, I used a minimalistic MongoDB wrapper with a small add-on for numpy arrays. Once the crude prototype is assembled, we may want to switch to some other DB that is a better fit. Please tell us if you know of a DB that would work well here.
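For reference, the version-1 idea boils down to something like the sketch below (not the actual wrapper; the database/collection names and helpers are hypothetical). Arrays go in as binary blobs with their dtype and shape stored alongside, so they can be reconstructed on read:

```python
import numpy as np
from pymongo import MongoClient
from bson.binary import Binary

coll = MongoClient()["tinyverse"]["arrays"]  # db/collection names are made up

def save_array(name, arr):
    # store raw bytes plus the metadata needed to rebuild the array
    coll.replace_one(
        {"_id": name},
        {"_id": name,
         "data": Binary(arr.tobytes()),
         "dtype": str(arr.dtype),
         "shape": list(arr.shape)},
        upsert=True,
    )

def load_array(name):
    doc = coll.find_one({"_id": name})
    return np.frombuffer(doc["data"], dtype=doc["dtype"]).reshape(doc["shape"])
```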