vyasr opened this issue 3 years ago (status: Open)
CC @leofang @jakirkham @gmarkall
cc: @kmaehashi @emcastillo @asi1024 for vis
**Is your feature request related to a problem? Please describe.**
Python's `sys` module provides the `sys.getsizeof` function to determine the size of a Python object. The behavior of `getsizeof` when applied to a user-defined class may be customized by overriding the `__sizeof__` method. For the purpose of computing the size of a Python object backed by GPU memory, however, `getsizeof` has a couple of major drawbacks:

- `getsizeof` is traditionally defined as a shallow calculation, so the `sizeof` of a container will not recursively traverse nested elements. The `sizeof` of a list of lists is essentially equivalent to (in pseudocode) `sizeof(PyObject *) * len(list)`, plus a few extra bytes for the overhead of the list's metadata. The internet is rife with recipes for performing a corresponding deep calculation, but they typically have sharp edges, and in practice users of GPU libraries usually want the deep calculation when they make this request (see the illustration at the end of this section).
- Even if a suitable override of `__sizeof__` could be defined that always returned the deep calculation, doing so would not be desirable, since it would overload the standard meaning of the operator in Python.

Various higher-level Python libraries that leverage GPU libraries under the hood would benefit from a standardized approach to requesting the total GPU memory allocated for an object. For instance, Dask could leverage this calculation to determine when to spill memory to disk.
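A minimal sketch of the shallow behavior, alongside a naive deep recipe of the sort found online (the `deep_getsizeof` helper here is illustrative only, handles just a few builtin container types, and is not proposed API):

```python
import sys

# A nested list: sys.getsizeof counts only the outer list's own footprint
# (roughly the pointer array plus list metadata), not the inner lists.
nested = [[0] * 1_000 for _ in range(100)]
print(sys.getsizeof(nested))   # small: the outer list object only


def deep_getsizeof(obj, seen=None):
    """Naive deep size: recurse into a few builtin container types."""
    seen = set() if seen is None else seen
    if id(obj) in seen:          # avoid double-counting shared objects
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(deep_getsizeof(item, seen) for item in obj)
    elif isinstance(obj, dict):
        size += sum(deep_getsizeof(k, seen) + deep_getsizeof(v, seen)
                    for k, v in obj.items())
    return size


print(deep_getsizeof(nested))  # much larger: includes every inner list
```

Neither number says anything about device memory, which is the gap this proposal targets.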
**Describe the solution you'd like**
It would be nice to define a standard protocol by which any Python library backed by GPU memory can expose the allocations underlying a Python object. The most obvious possibility would be a new protocol to correspond with `__cuda_array_interface__`, something like a `__cuda_sizeof__` method that returns a dictionary of allocated memory by type (a rough sketch of one possible shape appears at the end of this section), but other implementations are also possible. It would be important to consider whether this should always be a deep calculation or whether there are cases where a shallow calculation is appropriate, for instance with containers (like those that might live in cuCollections). It would also be important to consider how it should behave for slices: for instance, would `s = cudf.Series(range(1000)); s[::2].__cuda_sizeof__()` report the size of a column of 1000 elements or of 500?

It may also be necessary for users to have some way to account for host memory allocations, but I think it makes the most sense for that calculation to be entirely independent of this protocol. That does raise some questions about the appropriate way to treat pinned memory (host memory allocated via `cudaHostAlloc`).
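A rough sketch of the dictionary-returning shape described above; the `__cuda_sizeof__` name comes from this proposal, but the dictionary keys, the example classes, and the `gpu_sizeof` consumer helper are assumptions for illustration rather than a settled design:

```python
from typing import Dict


class DeviceBuffer:
    """Stand-in for an object that owns a single device allocation."""

    def __init__(self, nbytes: int):
        self.nbytes = nbytes

    def __cuda_sizeof__(self) -> Dict[str, int]:
        # Report allocations by memory type (the keys are assumptions).
        return {"device": self.nbytes, "pinned": 0, "managed": 0}


class DeviceTable:
    """Stand-in for a container of device-backed columns; reports deeply."""

    def __init__(self, columns):
        self.columns = columns

    def __cuda_sizeof__(self) -> Dict[str, int]:
        total: Dict[str, int] = {}
        for col in self.columns:
            for kind, nbytes in col.__cuda_sizeof__().items():
                total[kind] = total.get(kind, 0) + nbytes
        return total


def gpu_sizeof(obj) -> Dict[str, int]:
    """Hypothetical consumer-side helper, e.g. what a spilling heuristic might call."""
    sizeof = getattr(obj, "__cuda_sizeof__", None)
    if sizeof is None:
        raise TypeError(f"{type(obj).__name__} does not expose GPU allocations")
    return sizeof()


table = DeviceTable([DeviceBuffer(8 * 1000), DeviceBuffer(4 * 1000)])
print(gpu_sizeof(table))  # {'device': 12000, 'pinned': 0, 'managed': 0}
```

Any real design would still have to settle the slice question above, i.e. whether a zero-copy view such as `s[::2]` reports the bytes of the underlying allocation or only the bytes the view spans.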
**Describe alternatives you've considered**
None at this stage.
**Additional context**
This proposal comes out of a discussion precipitated in https://github.com/rapidsai/cudf/pull/9544#issuecomment-955011270. That PR removed the `__sizeof__` overrides in cuDF, which were likely to be more confusing than helpful, and standardized the `memory_usage` method of cuDF objects. `memory_usage` is a pandas method that we seek to mimic, but our goal of making cuDF objects pandas-compatible makes that method unsuitable for adaptation into a new "gpusizeof" protocol.
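For reference, the pandas `memory_usage` API that cuDF mirrors reports bytes per column as a Series and takes pandas-specific arguments such as `index` and `deep`; the snippet below shows pandas' behavior (not cuDF's) and is only meant to illustrate why a pandas-compatible method is an awkward fit for a library-agnostic protocol:

```python
import pandas as pd

df = pd.DataFrame({"a": range(1000), "b": ["x"] * 1000})

# Per-column byte counts as a Series, including an "Index" entry by default.
print(df.memory_usage())

# deep=True introspects object-dtype columns (e.g. Python strings).
print(df.memory_usage(deep=True))
```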