parkerzf opened 2 years ago
Thanks @beckernick, that makes total sense. I will read more of the blog posts you shared. I have a follow-up question: what I actually want is to load the data into cuGraph and run the PageRank algorithm, not call `compute()` on the data. However, it still shows the OOM error. You may find the detailed error message in this issue: https://github.com/rapidsai/cugraph/issues/2694. The loading code is below; the PageRank call itself is sketched after the snippet.
```python
import dask
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf
import cugraph
import cugraph.dask as dask_cugraph
from cugraph.dask.common.mg_utils import get_visible_devices
from cugraph.dask.comms import comms as Comms
import time

csv_file_name = "twitter-2010.csv"

# Enable JIT-unspill so device buffers can spill to host memory under pressure
with dask.config.set(jit_unspill=True):
    # One worker per GPU on the p3.16xlarge (8x 16 GB GPUs)
    with LocalCUDACluster(n_workers=8, device_memory_limit="16GB") as cluster:
        with Client(cluster) as client:
            client.wait_for_workers(len(get_visible_devices()))
            # Initialize cuGraph multi-GPU (MG) communications
            Comms.initialize(p2p=True)

            # Read the edge list into a distributed dask_cudf DataFrame
            chunksize = dask_cugraph.get_chunksize(csv_file_name)
            ddf = dask_cudf.read_csv(csv_file_name, chunksize=chunksize, delimiter=' ',
                                     names=['src', 'dst'], dtype=['int32', 'int32'])

            # Build the distributed graph from the edge list
            G = cugraph.Graph(directed=True)
            G.from_dask_cudf_edgelist(ddf, source='src', destination='dst')
```
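For completeness, the PageRank step I want to run after the load above would be roughly the following (a minimal sketch using the multi-GPU `cugraph.dask.pagerank` API with default-ish parameters; the result stays as a distributed dask_cudf DataFrame):

```python
# Multi-GPU PageRank on the distributed graph built above (sketch).
# pagerank() returns a dask_cudf DataFrame of (vertex, pagerank) that stays
# partitioned across the workers; nothing is gathered onto a single GPU
# unless .compute() is called on the result.
pr_ddf = dask_cugraph.pagerank(G, alpha=0.85, max_iter=100)
print(pr_ddf.head())  # pulls only a few rows back to the client

# Tear down the MG communications when done
Comms.destroy()
```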
This doesn't seem to be the same issue, because I don't collect all the data onto a single GPU. Do you maybe have a hint about what could be causing this? Thanks!
@parkerzf, did you find any workaround for the above issue?
Nope, I hope that someone else can share their experience dealing with large datasets.
Hey, I am trying to load the Twitter graph on an AWS p3.16xlarge instance, which has 8 GPUs with 16 GB of memory each, 128 GB in total. However, it hits an OOM error. Could you please take a look and see if I missed anything? Thanks so much! I can't find similar issues; this one got similar errors, but that was because LocalCUDACluster was not used.
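For context on the data size, here is a rough back-of-the-envelope estimate of the edge-list footprint (a sketch; the ~1.47B edge count is the commonly reported size of twitter-2010, and it ignores cuGraph's renumbering and other working memory):

```python
# Rough GPU-memory estimate for the twitter-2010 edge list (assumptions noted).
num_edges = 1_468_365_182      # assumption: commonly reported edge count of twitter-2010
bytes_per_edge = 2 * 4         # src + dst stored as int32
edge_list_gb = num_edges * bytes_per_edge / 1e9
print(f"raw edge list: {edge_list_gb:.1f} GB")               # ~11.7 GB total
print(f"per GPU over 8 workers: {edge_list_gb / 8:.1f} GB")  # ~1.5 GB if evenly split
```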
I used the Docker approach to install the RAPIDS frameworks:
The error log: