SFrame is stated to handle large amounts of data without using much RAM, but it leaks memory even on simple tasks.
This code sample keeps increasing RAM usage indefinitely, and is also strangely slow.
The slowness can be explained by the on-disk storage the library uses to handle large datasets.
import sframe as sf

data = sf.SFrame({'a': ['string'] * 1000,
                  'b': [1] * 1000,
                  'c': [{'key1': 1}] * 1000})

for i in xrange(10000):
    a = data.to_numpy()  # RAM usage grows on every call and is never released
Another example is:

suma = 0
for i in xrange(10000):
    for row in data:  # iterate the rows of the same SFrame as above
        suma += row['b']
The RAM usage steadily increases.
These are just samples, not real usage.
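One way to observe the growth (a minimal sketch, assuming psutil is available; the same pattern applies to either loop above):

import os
import psutil

proc = psutil.Process(os.getpid())
for i in xrange(10000):
    a = data.to_numpy()
    if i % 100 == 0:
        # resident set size keeps climbing instead of plateauing
        print i, proc.memory_info().rss // (1024 * 1024), 'MB'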
What I am trying to accomplish with the library is to read the data from the SFrame row by row or batch by batch and aggregate it without loading it into RAM; concretely, to construct a sparse matrix that I will then use for training with gradient descent (see the sketch below).
If I iterate batch by batch, SFrame holds on to a lot of RAM after the iteration and doesn't release it. It ends up using no less memory than the actual size of the data, so using it becomes pointless.
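Roughly what I am trying to build (a minimal sketch; the single column 'b', the batch size, and the matrix shape are placeholders for my real feature extraction, and it assumes SFrame slicing with data[start:end]):

import sframe as sf
from scipy import sparse

data = sf.SFrame({'b': [1] * 1000})  # placeholder for the real dataset
n_rows = len(data)
batch_size = 100

rows, cols, vals = [], [], []
for start in xrange(0, n_rows, batch_size):
    batch = data[start:start + batch_size]  # take one slice at a time
    for offset, row in enumerate(batch):
        rows.append(start + offset)
        cols.append(0)            # a single feature column in this toy example
        vals.append(row['b'])

X = sparse.coo_matrix((vals, (rows, cols)), shape=(n_rows, 1)).tocsr()
# X would then be fed to gradient descent; the problem is that after this
# loop the process keeps at least the full data size resident in RAM.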