Tested parallel writing:
Machine specs:
Switched to Redis: it now sustains ~30 writes/s per process (was 8) and ~90 reads/s per process (was 12). The database code is also a lot shorter.
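For context, the storage pattern is roughly the following (a minimal sketch assuming redis-py and a plain Redis list, not the actual tinyverse code; the key name, size cap, and helper names are hypothetical):

```python
import redis

db = redis.Redis(host="localhost", port=6379)

def save_session(session_bytes):
    # LPUSH is atomic, so many writer processes can append concurrently
    db.lpush("sessions", session_bytes)
    # cap memory usage by keeping only the newest entries (cap is made up)
    db.ltrim("sessions", 0, 10 ** 5)

def load_recent_sessions(n=10):
    # newest-first slice; each item is one serialized session blob
    return db.lrange("sessions", 0, n - 1)
```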
Replaced pickle with joblib: https://github.com/yandexdataschool/tinyverse/commit/c92a8d0bce48a7c33aa74633028138fc9059d290
Timing:
```python
import numpy as np
import joblib
from io import BytesIO
from six.moves import cPickle as pickle

# 3x (batch of 25 single-channel 64x64 images + 25 integer labels)
arr = [np.random.randn(25, 1, 64, 64), np.random.randint(0, 10, size=25)] * 3

print "\npickle dump/load"
%timeit pickle.dumps(arr)
s = pickle.dumps(arr)
%timeit pickle.loads(s)

print "\nbytes dump/load (loses array shape/type)"
%timeit np.concatenate(map(np.ravel, arr)).tobytes()
s = np.concatenate(map(np.ravel, arr)).tobytes()
%timeit np.fromstring(s)

print "\njoblib dump/load"
def jobdump(arr):
    s = BytesIO()  # joblib writes binary data, so BytesIO rather than StringIO
    joblib.dump(arr, s)
    return s.getvalue()

def jobload(s):
    return joblib.load(BytesIO(s))

%timeit jobdump(arr)
s = jobdump(arr)
%timeit jobload(s)
```
Output (on an old Core 2 laptop):

```
pickle dump/load
10 loops, best of 3: 113 ms per loop
10 loops, best of 3: 23.1 ms per loop

bytes dump/load (loses array shape/type)
100 loops, best of 3: 2.11 ms per loop
1000 loops, best of 3: 862 µs per loop

joblib dump/load
100 loops, best of 3: 4.45 ms per loop
100 loops, best of 3: 3.46 ms per loop
```
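As a quick sanity check (my addition, not part of the run above), the joblib round trip can be verified to return the original arrays intact, which is exactly what the raw-bytes route gives up:

```python
# joblib restores the original list of arrays, shapes and dtypes included;
# np.fromstring above yields one flat float64 array, so the raw-bytes route
# would need extra shape/dtype bookkeeping to reconstruct anything.
restored = jobload(jobdump(arr))
assert all(np.array_equal(a, b) for a, b in zip(arr, restored))
```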
Seems quite reliable so far: joblib dumps ~25x faster than pickle in this test while, unlike the raw-bytes route, preserving array shapes and dtypes. Further extensions will be added in separate issues.
We need some sort of storage for
For an ugly version 1, I used a minimalistic MongoDB wrapper with a small add-on for numpy arrays. Once the crude prototype is assembled, we may want to switch to some other DB that is a better fit. Please tell us if you know of a DB that would work well here.
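For reference, the version-1 idea boils down to something like the sketch below (not the actual wrapper; the database/collection names and helpers are hypothetical). Arrays go in as binary blobs with their dtype and shape stored alongside, so they can be reconstructed on read:

```python
import numpy as np
from pymongo import MongoClient
from bson.binary import Binary

coll = MongoClient()["tinyverse"]["arrays"]  # db/collection names are made up

def save_array(name, arr):
    # store raw bytes plus the metadata needed to rebuild the array
    coll.replace_one(
        {"_id": name},
        {"_id": name,
         "data": Binary(arr.tobytes()),
         "dtype": str(arr.dtype),
         "shape": list(arr.shape)},
        upsert=True,
    )

def load_array(name):
    doc = coll.find_one({"_id": name})
    return np.frombuffer(doc["data"], dtype=doc["dtype"]).reshape(doc["shape"])
```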