vyasr opened this issue 3 years ago (status: Open)
CC @leofang @jakirkham @gmarkall
cc: @kmaehashi @emcastillo @asi1024 for vis
**Is your feature request related to a problem? Please describe.**
Python's `sys` module provides the `sys.getsizeof` function to determine the size of a Python object. The behavior of `getsizeof` when applied to a user-defined class may be customized by overriding the `__sizeof__` method. For the purpose of computing the size of a Python object backed by GPU memory, however, `getsizeof` has a couple of major drawbacks:

- `getsizeof` is traditionally defined as a shallow calculation, so the `sizeof` of a container will not recursively traverse nested elements. The `sizeof` of a list of lists is essentially equivalent to (in pseudocode) `sizeof(PyObject *) * len(list)`, plus a few extra bytes for the overhead of the list's metadata. The internet is rife with recipes for performing a corresponding deep calculation, but they typically have sharp edges, and in practice users of GPU libraries usually want the deep calculation when they make this request (see the illustration at the end of this section).
- Even if a suitable override of `__sizeof__` could be defined that always returned the deep calculation, doing so would not be desirable, since it would overload the standard meaning of the operator in Python.

Various higher-level Python libraries that leverage GPU libraries under the hood would benefit from a standardized approach to requesting the total GPU memory allocated for an object. For instance, Dask could leverage this calculation to determine when to spill memory to disk.
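A minimal sketch of the shallow behavior, alongside a naive deep recipe of the sort found online (the `deep_getsizeof` helper here is illustrative only, handles just a few builtin container types, and is not proposed API):

```python
import sys

# A nested list: sys.getsizeof counts only the outer list's own footprint
# (roughly the pointer array plus list metadata), not the inner lists.
nested = [[0] * 1_000 for _ in range(100)]
print(sys.getsizeof(nested))   # small: the outer list object only


def deep_getsizeof(obj, seen=None):
    """Naive deep size: recurse into a few builtin container types."""
    seen = set() if seen is None else seen
    if id(obj) in seen:          # avoid double-counting shared objects
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    elif isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    return size


print(deep_getsizeof(nested))  # much larger: includes every inner list
```

Neither number says anything about device memory, which is the gap this proposal targets.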
**Describe the solution you'd like**
It would be nice to define a standard protocol by which any Python library backed by GPU memory can expose the allocations underlying a Python object. The most obvious possibility would be a new protocol to correspond with `__cuda_array_interface__`, something like a `__cuda_sizeof__` method that returns a dictionary of allocated memory by type (a rough sketch of one possible shape appears at the end of this section), but other implementations are also possible. It would be important to consider whether this should always be a deep calculation or whether there are cases where a shallow calculation is appropriate, for instance with containers (like those that might live in cuCollections). It would also be important to consider how it should behave for slices: for instance, would `s = cudf.Series(range(1000)); s[::2].__cuda_sizeof__()` report the size of a column of 1000 elements or of 500?

It may also be necessary for users to have some way to account for host memory allocations, but I think it makes the most sense for that calculation to be entirely independent of this protocol. That does raise some questions about the appropriate way to treat pinned memory (host memory allocated via `cudaHostAlloc`).
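A rough sketch of the dictionary-returning shape described above; the `__cuda_sizeof__` name comes from this proposal, but the dictionary keys, the example classes, and the `gpu_sizeof` consumer helper are assumptions for illustration rather than a settled design:

```python
from typing import Dict


class DeviceBuffer:
    """Stand-in for an object that owns a single device allocation."""

    def __init__(self, nbytes: int):
        self.nbytes = nbytes

    def __cuda_sizeof__(self) -> Dict[str, int]:
        # Report allocations by memory type (the keys are assumptions).
        return {"device": self.nbytes, "pinned": 0, "managed": 0}


class DeviceTable:
    """Stand-in for a container of device-backed columns; reports deeply."""

    def __init__(self, columns):
        self.columns = columns

    def __cuda_sizeof__(self) -> Dict[str, int]:
        total: Dict[str, int] = {}
        for col in self.columns:
            for kind, nbytes in col.__cuda_sizeof__().items():
                total[kind] = total.get(kind, 0) + nbytes
        return total


def gpu_sizeof(obj) -> Dict[str, int]:
    """Hypothetical consumer-side helper, e.g. what a spilling heuristic might call."""
    sizeof = getattr(obj, "__cuda_sizeof__", None)
    if sizeof is None:
        raise TypeError(f"{type(obj).__name__} does not expose GPU allocations")
    return sizeof()


table = DeviceTable([DeviceBuffer(8 * 1000), DeviceBuffer(4 * 1000)])
print(gpu_sizeof(table))  # {'device': 12000, 'pinned': 0, 'managed': 0}
```

Any real design would still have to settle the slice question above, i.e. whether a zero-copy view such as `s[::2]` reports the bytes of the underlying allocation or only the bytes the view spans.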
**Describe alternatives you've considered**
None at this stage.
**Additional context**
This proposal comes out of a discussion precipitated in https://github.com/rapidsai/cudf/pull/9544#issuecomment-955011270. That PR removed the `__sizeof__` overrides in cuDF, which were likely to be more confusing than helpful, and standardized the `memory_usage` method of cuDF objects. `memory_usage` is a pandas method that we seek to mimic, but our goal of making cuDF objects pandas-compatible makes that method unsuitable for adaptation into a new "gpusizeof" protocol.
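For reference, the pandas `memory_usage` API that cuDF mirrors reports bytes per column as a Series and takes pandas-specific arguments such as `index` and `deep`; the snippet below shows pandas' behavior (not cuDF's) and is only meant to illustrate why a pandas-compatible method is an awkward fit for a library-agnostic protocol:

```python
import pandas as pd

df = pd.DataFrame({"a": range(1000), "b": ["x"] * 1000})

# Per-column byte counts as a Series, including an "Index" entry by default.
print(df.memory_usage())

# deep=True introspects object-dtype columns (e.g. Python strings).
print(df.memory_usage(deep=True))
```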