spcl / dace

DaCe - Data Centric Parallel Programming
http://dace.is/fast
BSD 3-Clause "New" or "Revised" License

Conserve Python object in callbacks #1188

Open FlorianDeconinck opened 1 year ago

FlorianDeconinck commented 1 year ago

Issue: Consider this code:

import dace

class MyObj_Is_An_Array:
    def __init__(self):
        self.a_data_to_access = 42

    def __descriptor__(self):  # Gives DaCe the possibility to use it as an array
        ...

# For the sake of argument, this function will _not_ be parsed by DaCe
# but triggers a callback
def callback(the_obj):
    the_obj.a_data_to_access  # <--- error: we get a dace.Array instead of a MyObj_Is_An_Array

@dace.program
def a_program(the_obj):
    ...
    callback(the_obj)

the_obj = MyObj_Is_An_Array()
a_program(the_obj)

Solution: Marshal the true type of the Python object in the callback system.

(We would also need it to keep working with literal lists, e.g. callback([obj, obj, obj]).)

Motivation

The Pace code uses callbacks to access an optimized halo exchange system. Ultimately, the halo exchange itself should be made readable by DaCe, but in the meantime callbacks are an effective (and actively used) way to get a working system in a large codebase.

FlorianDeconinck commented 1 year ago

Addendum: the type of the object might be an np.ndarray rather than a dace.Array; this would have to be tested.

Issue with this feature: while callback(myObj_Is_An_Array) is straightforward, what should be done with callback(myObj_Is_An_Array[10:24])?

alexnick83 commented 1 year ago

I suggest the following solution:

import ctypes
import dace
import numpy as np

def dace_blocker(f):
    return f

class MyObj_Is_An_Array(np.ndarray):

    def __init__(self, shape, dtype):

        self.a_data_to_access = 42
        self.shape=shape
        self.dtype=dtype

        super(MyObj_Is_An_Array, self).__init__()

    def __descriptor__(self):
        return dace.data.Array(dace.typeclass(self.dtype.type),
                               self.shape,
                               strides=tuple(s // self.itemsize for s in self.strides))

@dace_blocker
def callback(obj, obj_id):
    print(f'obj has type {type(obj)} and value {obj}')   # obj is numpy.ndarray
    actual_obj = ctypes.cast(obj_id, ctypes.py_object).value
    print(f'actual_obj has type {type(actual_obj)} and value {actual_obj}')  # actual_obj is MyObj_Is_An_Array
    print(f'actual_obj.a_data_to_access has type {type(actual_obj.a_data_to_access)} and value {actual_obj.a_data_to_access}')

@dace.program
def a_program(obj, obj_id):
    obj[:] = 5
    callback(obj, obj_id)

the_obj = MyObj_Is_An_Array(shape=[1], dtype=np.int32)
a_program(the_obj, id(the_obj))
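
The core of the workaround above is the round trip from id(obj) back to the object through ctypes. Below is a minimal sketch of just that step, without DaCe. It assumes CPython (where id() returns the object's memory address) and that the original object stays alive for as long as the ID is in use. Payload and recover are hypothetical names used only for illustration.

```python
import ctypes


class Payload:
    """Hypothetical stand-in for an object carrying custom attributes."""

    def __init__(self):
        self.a_data_to_access = 42


def recover(obj_id):
    # CPython-specific: id() is the object's address, so casting it back
    # to py_object yields the very same object. This is only safe while
    # the original object is still alive.
    return ctypes.cast(obj_id, ctypes.py_object).value


original = Payload()
recovered = recover(id(original))

print(recovered is original)
print(recovered.a_data_to_access)
```

Since the ID is a plain integer, it can be passed through the program as a scalar argument, which is what makes it usable next to the array argument. The caveat is lifetime: if the object has been garbage-collected, the cast dereferences a stale address.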

alexnick83 commented 1 year ago

When passing a slice of the object, obj will be the slice, while actual_obj will be the whole array (and will still have access to the custom attributes):

@dace.program
def a_program(obj, obj_id):
    obj[:] = 5
    callback(obj[2:4], obj_id)

the_obj = MyObj_Is_An_Array(shape=[5], dtype=np.int32)
a_program(the_obj, id(the_obj))

returns:

obj has type <class 'numpy.ndarray'> and value [5 5]
actual_obj has type <class '__main__.MyObj_Is_An_Array'> and value [5 5 5 5 5]
actual_obj.a_data_to_access has type <class 'int'> and value 42

alexnick83 commented 1 year ago

A modified program passing lists of objects (and lists of their IDs):

@dace_blocker
def callback(obj_list, obj_id_list):
    print(f'obj_list has type {type(obj_list)} and value {obj_list}')
    print(f'obj_id_list has type {type(obj_id_list)} and value {obj_id_list}')
    for obj_id in obj_id_list:
        actual_obj = ctypes.cast(obj_id, ctypes.py_object).value
        print(f'actual_obj has type {type(actual_obj)} and value {actual_obj}')
        print(f'actual_obj.a_data_to_access has type {type(actual_obj.a_data_to_access)} and value {actual_obj.a_data_to_access}')

@dace.program
def a_program(obj, obj_id):
    obj[:] = 5
    callback([obj, obj, obj], [obj_id, obj_id, obj_id])

the_obj = MyObj_Is_An_Array(shape=[5], dtype=np.int32)
a_program(the_obj, id(the_obj))

returns:

obj_list has type <class 'list'> and value [array([5, 5, 5, 5, 5], dtype=int32), array([5, 5, 5, 5, 5], dtype=int32), array([5, 5, 5, 5, 5], dtype=int32)]
obj_id_list has type <class 'list'> and value [139645202514256, 139645202514256, 139645202514256]
actual_obj has type <class '__main__.MyObj_Is_An_Array'> and value [5 5 5 5 5]
actual_obj.a_data_to_access has type <class 'int'> and value 42
actual_obj has type <class '__main__.MyObj_Is_An_Array'> and value [5 5 5 5 5]
actual_obj.a_data_to_access has type <class 'int'> and value 42
actual_obj has type <class '__main__.MyObj_Is_An_Array'> and value [5 5 5 5 5]
actual_obj.a_data_to_access has type <class 'int'> and value 42