turi-code / SFrame

SFrame: Scalable tabular and graph data-structures built for out-of-core data analysis and machine learning.
BSD 3-Clause "New" or "Revised" License

The groupby operation uses up whole system disk space #378

Open jonbakerfish opened 8 years ago

jonbakerfish commented 8 years ago

I ran a groupby on a large SFrame (foo.shape=(1183747, 3110)). After a while, my system disk (/) filled up and the following error appeared:

Traceback (most recent call last):
  File "test.py", line 134, in <module>
    'bs':gl.aggregate.CONCAT('b'),
  File "/home/.../anaconda2/lib/python2.7/site-packages/graphlab/data_structures/sframe.py", line 4651, in groupby
    group_ops))
  File "/home/.../anaconda2/lib/python2.7/site-packages/graphlab/cython/context.py", line 49, in __exit__
    raise exc_type(exc_value)
IOError: Fail to write. Disk may be full.: unspecified iostream_category error: unspecified iostream_category error

After the program dies, disk usage returns to normal.

Can I change SFrame's cache location to somewhere else? (I have larger disks in addition to the system disk.)

korbonits commented 7 years ago

+1, I am facing this right now.