Open · JugglingNumbers opened this issue 4 years ago
Hmm... changing this just results in a similar exception at line 182. I've changed it to:
```python
new_npartitions = npartitions
if new_npartitions is None:
    memusage = data.memory_usage(deep=True).sum()
    new_npartitions = int(1 + memusage // DEFAULT_PARTITION_SIZE)

# combine old dataframe with new
current = self.item(item)
new = dd.from_pandas(data, npartitions=new_npartitions)
```
That seems to work at first glance.
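For context, here is a minimal, self-contained sketch of the same sizing heuristic, assuming a ~100 MB partition size (pystore's actual DEFAULT_PARTITION_SIZE constant may differ):

```python
import dask.dataframe as dd
import numpy as np
import pandas as pd

# Illustrative partition size (~100 MB); pystore's DEFAULT_PARTITION_SIZE may differ.
PARTITION_SIZE = 100 * 1024 * 1024

data = pd.DataFrame({"price": np.random.rand(1_000_000)})

# One partition per ~PARTITION_SIZE bytes of in-memory data, minimum of one.
memusage = data.memory_usage(deep=True).sum()
new_npartitions = int(1 + memusage // PARTITION_SIZE)

ddf = dd.from_pandas(data, npartitions=new_npartitions)
print(new_npartitions, ddf.npartitions)
```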
@ancher1912 yup you've encountered the other append error: https://github.com/ranaroussi/pystore/issues/31
The other option is just to change the last line of your blurb to

```python
new = dd.from_pandas(data, npartitions=1)
```

Since the combined Dask dataframe is repartitioned using the npartitions variable anyway, it doesn't matter if we use only one partition when converting the new dataframe to Dask.
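A quick sketch of why the single partition is harmless (illustrative only, not pystore's actual append code): the final partition count comes from the repartition step on the combined frame, not from how the new chunk was converted.

```python
import dask.dataframe as dd
import pandas as pd

# Existing data already stored as a Dask dataframe with several partitions.
existing = dd.from_pandas(pd.DataFrame({"x": range(100)}), npartitions=4)
new_data = pd.DataFrame({"x": range(100, 120)})

# Convert the new chunk with a single partition...
new = dd.from_pandas(new_data, npartitions=1)

# ...then combine and repartition; the result's partition count is set here.
combined = dd.concat([existing, new]).repartition(npartitions=4)
print(combined.npartitions)  # 4, regardless of how the new chunk was partitioned
```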
Yeah, you're right. I've sent a pull request to @ranaroussi with your proposed change. At least I can continue doing what I was doing before I updated Dask and FastParquet.
When using item.append(item, new_data, npartitions=35), the write function is passed npartitions=None. It should be npartitions=npartitions: https://github.com/ranaroussi/pystore/blob/40de1d51236fd6b6b88909c83dc6d7297de4b471/pystore/collection.py#L194-L196
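A minimal stand-in for the bug pattern described above (hypothetical class, not pystore's actual code): forwarding the literal default instead of the caller's argument means write() never sees the requested partition count.

```python
class Collection:
    """Hypothetical stand-in for the reported pattern, not pystore's Collection."""

    def write(self, item, data, npartitions=None):
        print(f"write() received npartitions={npartitions}")

    def append_buggy(self, item, data, npartitions=None):
        # Bug: the caller's npartitions is ignored and None is forwarded.
        self.write(item, data, npartitions=None)

    def append_fixed(self, item, data, npartitions=None):
        # Fix: forward whatever value the caller passed to append().
        self.write(item, data, npartitions=npartitions)


c = Collection()
c.append_buggy("eod", object(), npartitions=35)   # write() receives None
c.append_fixed("eod", object(), npartitions=35)   # write() receives 35
```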