henrymartin1 opened this issue 2 years ago
Uff :D To be honest I am not sure we can fix this one. Most of our functions depend on having the whole dataset in memory for groupby, sorting, and similar operations. Unless we consume the iterator into one big dataframe the problem persists, but then the chunksize parameter is not that useful.
What would be your use case?
I would rather add a more useful error message.
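For illustration, consuming the chunked iterator back into one big dataframe would look roughly like this (untested sketch; the connection string, table name, and geometry column are placeholders):

```python
import geopandas as gpd
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/mydb")

# With chunksize set, gpd.read_postgis returns a generator of chunks
# instead of a single GeoDataFrame.
chunks = gpd.read_postgis(
    "SELECT * FROM positionfixes", engine, geom_col="geom", chunksize=10_000
)

# Concatenating everything again defeats the memory savings, which is
# why chunksize is of limited use for functions that need the full
# dataset anyway.
gdf = pd.concat(chunks, ignore_index=True)
```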
Hm... I see your point. I am currently working with a large dataset that barely fits into my memory. There seems to be some overhead related to reading/writing to/from PostGIS that is enough to increase memory consumption so that read/write operations fail in this case. This overhead seems to be lower with chunksize != None, meaning that I can send/read the data without it failing.
I am not sure how we would change it, to be honest, so it might be best to simply add a better error message or a check saying that the chunksize argument is not supported at the moment. At least until we have a proper big data strategy for trackintel ;-) A sketch of such a check follows below.
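A minimal sketch of such a guard, assuming it sits at the top of a postgis reader (the function name matches trackintel's `read_positionfixes_postgis`, but the signature is simplified and the error message is only a suggestion):

```python
import geopandas as gpd


def read_positionfixes_postgis(sql, con, geom_col="geom", chunksize=None, **kwargs):
    # Fail early with an explicit message instead of letting downstream
    # code choke on the generator that geopandas returns for chunked reads.
    if chunksize is not None:
        raise ValueError(
            "chunksize is currently not supported: trackintel needs the "
            "full dataset in memory for groupby and sorting operations."
        )
    return gpd.read_postgis(sql, con, geom_col=geom_col, **kwargs)
```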
The error can be reproduced by setting the chunksize argument in any of the `test_read` tests in `test_postgis`, e.g., here. The problem seems to be that `gpd.GeoDataFrame.from_postgis` returns a generator instead of a GeoDataFrame. The documentation of `gpd.GeoDataFrame.from_postgis` says that one should use `gpd.read_postgis` instead; maybe this already fixes the problem.
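To show the behavior in isolation, here is an untested sketch (connection string, query, and geometry column are placeholders): with chunksize set, `gpd.read_postgis` hands back a generator, and only the individual chunks are GeoDataFrames.

```python
import geopandas as gpd
from sqlalchemy import create_engine

engine = create_engine("postgresql://user:password@localhost:5432/test")

result = gpd.read_postgis(
    "SELECT * FROM positionfixes", engine, geom_col="geom", chunksize=1_000
)
print(type(result))  # a generator, not a GeoDataFrame

for chunk in result:
    # Each chunk is a GeoDataFrame of at most `chunksize` rows.
    print(type(chunk), len(chunk))
```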