yandexdataschool / tinyverse

Universe RL trainer platform. Simple. Supple. Scalable.

Database #2

Closed (justheuristic closed this 7 years ago)

justheuristic commented 7 years ago

We need some sort of storage for

For an ugly version 1, I used a minimalistic MongoDB wrapper with an app for numpy arrays. After we assemble the crude prototype, we may want to switch to some other DB that's more suitable. Please tell us if you know which DB would fit here.

justheuristic commented 7 years ago

Tested the parallel writing:

Machine specs:

1 process doing this

3 processes doing this

12 processes doing this

20 processes doing this

justheuristic commented 7 years ago

Switched to Redis. It now runs at ~30 writes/s per process (was 8) and 90 reads/s per process (was 12). The database code is also a lot shorter now.
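
The thread does not show the database code itself, so the following is only a minimal sketch of the general pattern: serialize an object to bytes, then store it under a key. `SessionStore`, `InMemoryClient`, and the `session:` key prefix are hypothetical names introduced for illustration. The stand-in client only mimics the `set(name, value)` / `get(name)` calls that `redis.Redis` provides, so the example runs without a server. pickle is used here just to keep the sketch stdlib-only.

```python
import pickle


class InMemoryClient:
    """Hypothetical stand-in exposing the same set/get calls as redis.Redis."""

    def __init__(self):
        self._data = {}

    def set(self, name, value):
        self._data[name] = value
        return True

    def get(self, name):
        # Like Redis, return None for a missing key.
        return self._data.get(name)


class SessionStore:
    """Hypothetical wrapper: serializes objects to bytes before storing them."""

    def __init__(self, client, prefix="session:"):
        self.client = client
        self.prefix = prefix

    def save(self, key, obj):
        payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
        self.client.set(self.prefix + key, payload)

    def load(self, key):
        payload = self.client.get(self.prefix + key)
        if payload is None:
            raise KeyError(key)
        return pickle.loads(payload)


# Usage: swap InMemoryClient() for a real Redis client when a server is available.
store = SessionStore(InMemoryClient())
store.save("episode-0", {"rewards": [0.0, 1.0, 0.5], "actions": [2, 7, 1]})
restored = store.load("episode-0")
```

Keeping the client behind a two-method interface means the serializer and the backend can be changed independently of each other.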

justheuristic commented 7 years ago

Replaced pickle with joblib, https://github.com/yandexdataschool/tinyverse/commit/c92a8d0bce48a7c33aa74633028138fc9059d290

Timing:

import numpy as np
# Sample payload: two arrays repeated three times (six list entries)
arr = [np.random.randn(25,1,64,64), np.random.randint(0,10,size=(25))]*3

print("\npickle dump/load")
from six.moves import cPickle as pickle
%timeit pickle.dumps(arr)
s = pickle.dumps(arr)
%timeit pickle.loads(s);

print("\nbytes dump/load (loses array shape/type)")
# List comprehension instead of map() so this also runs on Python 3
%timeit np.concatenate([np.ravel(a) for a in arr]).tobytes()
s = np.concatenate([np.ravel(a) for a in arr]).tobytes()
# np.fromstring is deprecated for binary data in newer NumPy; np.frombuffer replaces it
%timeit np.fromstring(s)

print("\njoblib dump/load")
from io import BytesIO
import joblib
def jobdump(arr):
    # joblib writes binary data, so the in-memory buffer must be a BytesIO
    s = BytesIO()
    joblib.dump(arr,s)
    return s.getvalue()
def jobload(s):
    return joblib.load(BytesIO(s))

%timeit jobdump(arr)
s = jobdump(arr)
%timeit jobload(s)

Output (Core 2, old laptop)


pickle dump/load
10 loops, best of 3: 113 ms per loop
10 loops, best of 3: 23.1 ms per loop

bytes dump/load (loses array shape/type)
100 loops, best of 3: 2.11 ms per loop
1000 loops, best of 3: 862 µs per loop

joblib dump/load
100 loops, best of 3: 4.45 ms per loop
100 loops, best of 3: 3.46 ms per loop
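
The raw-bytes variant in the timing above is the fastest but drops shape and dtype. A minimal sketch of one way to keep them is to prefix the raw data with a small header. This is not what the linked commit does (it uses joblib); `pack_array`, `unpack_array`, and the header layout are assumptions made up for illustration.

```python
import json
import struct

import numpy as np


def pack_array(a):
    """Serialize one array as: 4-byte header length, JSON header, raw data."""
    a = np.asarray(a)
    header = json.dumps({"dtype": a.dtype.str, "shape": a.shape}).encode("ascii")
    # tobytes() always emits C-order data, whatever the memory layout of `a`.
    return struct.pack("<I", len(header)) + header + a.tobytes()


def unpack_array(buf):
    """Inverse of pack_array; returns a read-only array backed by `buf`."""
    (n,) = struct.unpack_from("<I", buf, 0)
    header = json.loads(buf[4:4 + n].decode("ascii"))
    data = np.frombuffer(buf, dtype=np.dtype(header["dtype"]), offset=4 + n)
    return data.reshape(header["shape"])


# Usage with the same kind of arrays as in the benchmark.
obs = np.random.randn(25, 1, 64, 64)
actions = np.random.randint(0, 10, size=25)
restored_obs = unpack_array(pack_array(obs))
restored_actions = unpack_array(pack_array(actions))
```

Each array is packed separately here, so a list of arrays would need one packed blob per element or an additional framing layer.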

justheuristic commented 7 years ago

Seems to be quite reliable. More extensions will be added in separate issues.