quiver-team / torch-quiver

PyTorch Library for Low-Latency, High-Throughput Graph Learning on GPUs.
https://torch-quiver.readthedocs.io/en/latest/
Apache License 2.0

Error when cache size is larger than the size of feature #97

Closed Joeyzhouqihui closed 2 years ago

Joeyzhouqihui commented 2 years ago

When I try to put all features in GPU memory, an error occurs in the "__getitem__" function in the file "feature.py".

I guess the root cause is the call at line 193 of the "append" function in the file "quiver_feature.cu": "quiverRegister(tensor.data_ptr(), data_size, cudaHostRegisterMapped);".

The error occurs when trying to register a zero-copy memory region of 0 bytes. This happens when the cache covers the whole feature tensor, so the CPU-resident part is empty. It can be fixed by adding an if statement that skips memory registration when the feature tensor is empty.

eedalong commented 2 years ago

Hi, can you give us a minimal code example? quiver.Feature should have no problem caching all data on GPU.

eedalong commented 2 years ago

@Joeyzhouqihui

Joeyzhouqihui commented 2 years ago

Just run the example code "dist_sampling_ogb_reddit_quiver.py" and set the feature cache_size to be larger than the size of Reddit (537M).

eedalong commented 2 years ago

@Joeyzhouqihui Yes, I reproduced this problem; it seems there are some problems with quiver.Feature's IPC.

eedalong commented 2 years ago

Hi, this is fixed in #98, please pull the latest code and try again~

eedalong commented 2 years ago

@Joeyzhouqihui