This is very likely to be due to insufficient RAM. If a single column of your DataFrame is stored as `float64`, then it needs 185341302 * 8 bytes / 1024**3 ≈ 1.4 GB, so two columns need about 2.8 GB. Hence reading your CSV file into pandas consumes all of your RAM before you have used any datashader code.
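Spelled out as a quick Python check (this is just the same arithmetic as above):

```python
# Rough memory footprint of one float64 column with 185,341,302 rows
rows = 185_341_302
bytes_per_value = 8                          # float64 = 8 bytes per value
gib_per_column = rows * bytes_per_value / 1024**3
print(f"{gib_per_column:.2f} GiB")           # ~1.38 GiB, so ~2.8 GiB for two columns
```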
You could try using a dtype of `np.float32` so that your 2 columns only need 1.4 GB, but I suspect you will still run out of RAM when you try to do something useful with the data. You could switch to using a `dask.DataFrame` instead of a `pandas.DataFrame`, or of course use a machine with more RAM.
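As a rough sketch of both options (the file path comes from your `wc -l` command, but the `lon`/`lat` column names are an assumption; adjust them to your header):

```python
import numpy as np
import pandas as pd
import dask.dataframe as dd

# Option 1: halve the footprint by reading the columns as float32
df = pd.read_csv("data/stored.csv", dtype={"lon": np.float32, "lat": np.float32})

# Option 2: let Dask read the CSV lazily in partitions instead of all at once
ddf = dd.read_csv("data/stored.csv", dtype={"lon": np.float32, "lat": np.float32})
```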
So it means the whole dataset is loaded into RAM... so indeed this will be tricky, given that this was supposed to be a "small" test dataset ;) Dask, why not, but first I have to learn what it is and how to use it ;)
It is not a massive data set, but 2 GB RAM is very small. Most mobile phones have more RAM than this, and I wouldn't attempt any serious calculations on a mobile phone.
Right, I'd try using a bigger machine. You can certainly make it work on a smaller machine, e.g. by converting your dataset to Parquet and using Dask with persist=False to work "out of core", paging in chunks as you work with them. But it will be tricky to get that running under the constraint of not being able to load the full data even for the conversion, and out-of-core work is vastly slower, so because I value my own time I'd switch to a suitable machine to work with.
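If it helps, here is a minimal sketch of that Parquet / out-of-core route. The column names, paths, and canvas size are assumptions, and the key point is simply not calling `.persist()`, so the data stays on disk and is streamed through RAM:

```python
import dask.dataframe as dd
import datashader as ds

# One-off conversion: Dask reads the CSV partition by partition, never all at once
# (writing Parquet needs a Parquet engine such as pyarrow installed)
ddf = dd.read_csv("data/stored.csv", dtype="float32")
ddf.to_parquet("data/stored.parquet")

# Out-of-core aggregation: read the Parquet back lazily and do NOT call .persist(),
# so datashader pages partitions through RAM while building the aggregate
ddf = dd.read_parquet("data/stored.parquet")
canvas = ds.Canvas(plot_width=900, plot_height=600)
agg = canvas.points(ddf, "lon", "lat")   # "lon"/"lat" column names are assumed
img = ds.tf.shade(agg)
```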
Hi
I ran the demo code from the datashader homepage without issue. Now I moved to another dataset and I got:

`# python3 map.py` **Killed**

Here is the number of lat/lon points in my file: `# cat data/stored.csv | wc -l` gives 185341302

System: Ubuntu 22.04 running on Docker (Synology DS220+ with 2 GB of RAM). The error message is not really helpful... is it a RAM problem? Any hint to overcome this?