pgiri / dispy

Distributed and Parallel Computing Framework with / for Python
https://dispy.org

class method as computation function #135

Open gobbedy opened 6 years ago

gobbedy commented 6 years ago

Hello again,

I have hierarchical code, and I would like my computation function to be a class method.

Does dispy support this?

I modified this example to show what I mean:


import random, dispy

class C(object):
    def __init__(self, n):
        self.n = n

    def show(self):
        print('%s: %.2f' % (self.i, self.n))

    def compute(self, i):
        # runs on a node; uses this C instance's state
        import time
        self.i = i
        time.sleep(self.n)
        self.show()  # the output is stored in job.stdout
        return self.n

    def distribute(self):

        # 'compute' needs definition of class C, so include in 'depends'
        cluster = dispy.JobCluster(self.compute, depends=[C], nodes=['node1', 'node2', ])
        jobs = []
        for i in range(10):
            job = cluster.submit(i) # it is sent to a node for executing 'compute'
            job.id = i # store this object for later use
            jobs.append(job)
        for job in jobs:
            job() # wait for job to finish
            print('%s: %.2f / %s' % (job.id, job.result, job.stdout))

if __name__ == '__main__':
    c = C(random.uniform(1, 3)) # create object of C
    c.distribute()

Unfortunately my modified example errors out:

    Traceback (most recent call last):
      File "dispy_example.py", line 106, in <module>
        c.distribute()
      File "dispy_example.py", line 95, in distribute
        nodes=['node1', 'node2', ])
      File "/home/gobbedy/.local/lib/python3.6/site-packages/dispy/__init__.py", line 2447, in __init__
        raise Exception('Invalid computation type: %s' % type(computation))
    Exception: Invalid computation type: <class 'method'>
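For reference, the exception comes from dispy checking the type of the computation: `self.compute` is a bound method, which has type `method`, not `function`. A quick stdlib illustration (the names here are just for demonstration):

```python
import types

class C:
    def compute(self, i):
        return i

# a module-level function, by contrast, is a plain function object
def compute(obj, i):
    return obj.compute(i)

c = C()
print(type(c.compute))   # <class 'method'> -- what dispy rejected
print(type(compute))     # <class 'function'>
```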

pgiri commented 6 years ago

I just committed support for object methods as computation. However, creating objects at the client and sending them to nodes is not always possible (objects must be serializable) and can be inefficient (if objects are large). It may be easier to build objects at the node (in the computation or in a setup function).
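To get a feel for the cost, you can measure what a serialized object looks like with the stdlib `pickle` module (the object and sizes here are illustrative only):

```python
import pickle

class C:
    def __init__(self, n):
        self.n = n
        self.data = list(range(100000))  # stand-in for a large array

# each submitted job ships a serialized copy of the object to a node,
# so a large attribute like 'data' is paid for on every submission
payload = pickle.dumps(C(2.5))
print('serialized size: %d bytes' % len(payload))
```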

Note also that you should call cluster.submit with the object as the first argument, i.e., job = cluster.submit(obj, i). You may also want to write "distribute" not as a class method but as a global function, e.g.:

def compute(obj):
    ...

for i in range(10):
    job = cluster.submit(obj, i)
    ...
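A fuller sketch of that restructuring, based on the example earlier in this thread (node names are placeholders and this hasn't been run against a cluster):

```python
import random

class C(object):
    def __init__(self, n):
        self.n = n

    def show(self):
        print('%s: %.2f' % (self.i, self.n))

# module-level function: dispy accepts plain functions as computations
def compute(obj, i):
    import time
    obj.i = i
    time.sleep(obj.n)
    obj.show()        # the output is stored in job.stdout
    return obj.n

def distribute(obj):
    import dispy      # imported here so the sketch stays importable without a cluster
    # 'compute' needs the definition of class C, so include it in 'depends'
    cluster = dispy.JobCluster(compute, depends=[C], nodes=['node1', 'node2'])
    jobs = []
    for i in range(10):
        job = cluster.submit(obj, i)  # object goes as the first argument
        job.id = i
        jobs.append(job)
    for job in jobs:
        job()         # wait for job to finish
        print('%s: %.2f / %s' % (job.id, job.result, job.stdout))

# usage: distribute(C(random.uniform(1, 3)))
```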
gobbedy commented 6 years ago

@pgiri wow, thanks for the fast turnaround. I've never installed a Python module from source, but I'll figure out how to grab your latest code tomorrow and will try the new flow.

Your point that sending objects may be inefficient if they are large is a relevant one, as my real code has very large arrays.

However, I'm not sure how I will avoid sending large amounts of data to the nodes, since the arrays are shared (read-only) and every node will need access to them. Wherever the arrays are created, they will have to be sent to the nodes.

I was hoping I could do that using the setup function (which you mentioned), while the "original" copy is held at the client. But then I'm not sure what that means for sending the class/object method.
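Roughly what I have in mind with the setup function is something like this (a sketch only; the array is a stand-in for my real data, and I haven't tried it against a cluster):

```python
def setup():
    # runs once on each node; globals created here are visible to 'compute'
    global data
    data = list(range(1000000))  # stand-in for loading a large read-only array
    return 0                     # dispy treats 0 as successful setup

def cleanup():
    # runs when the cluster is done with the node
    global data
    del data

def compute(i):
    # reads the node-local 'data' built by setup; nothing large shipped per job
    return data[i]

# hypothetical usage (node name is a placeholder):
# cluster = dispy.JobCluster(compute, nodes=['node1'],
#                            setup=setup, cleanup=cleanup)
```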

This is all very theoretical at this point since I haven't tried it yet, but will do in the coming days.

For now I'll close this bug since you more than answered my question!

gobbedy commented 6 years ago

Hi @pgiri I'm sorry for getting back to you so late. I took a vacation and I am only getting back to this now.

I installed the latest module with your added support for object methods. It worked smoothly!

But now I'm trying to use a static method and it fails. I created an example:


import random, dispy

class C(object):
    def __init__(self, n):
        self.n = n

    def show(self):
        print('%s: %.2f' % (self.i, self.n))

    @staticmethod
    def compute(obj):
        # obj is an instance of C
        import time
        time.sleep(obj.n)
        obj.show()  # the output is stored in job.stdout
        return obj.n

    def distribute(self):

        # 'compute' needs definition of class C, so include in 'depends'
        cluster = dispy.JobCluster(C.compute, depends=[C], nodes=['node1',])
        jobs = []
        for i in range(10):
            self.i = i
            job = cluster.submit(self) # it is sent to a node for executing 'compute'
            job.id = i # store this object for later use
            jobs.append(job)
        for job in jobs:
            job() # wait for job to finish
            print('%s: %.2f / %s' % (job.id, job.result, job.stdout))

if __name__ == '__main__':
    c = C(random.uniform(1, 3)) # create object of C
    c.distribute()

Would it be possible to add support for static compute methods?