pgiri / dispy

Distributed and Parallel Computing Framework with / for Python
https://dispy.org

cluster.submit is not able to transfer the files to server node #210

Open Kekushke opened 4 years ago

Kekushke commented 4 years ago

cluster.submit with the dispy_job_depends parameter does not transfer the files to the server node. The log shows the files being sent, but they never appear on the server. I used large files and inserted delays so that the job would stay alive long enough to confirm the problem, and I tried several directory layouts.

I may be missing something simple, since I am also unable to print to the dispynode logs from the compute function passed to JobCluster. Thanks.

Kekushke commented 4 years ago

To give more information, the general structure is shown below. The files 'a.txt' and 'b.txt' do not arrive on the cluster nodes, and the print statements produce no output. I would prefer that dispynode print them. I also tried `dispy.logger`. Am I missing something basic? Thank you.

import dispy

C = None  # global cluster handle, created on first submit

def setup_cluster():
    cobj = dispy.JobCluster(compute, callback=callback)
    return cobj

def callback(job):
    # callback here; receives the job whose status changed
    pass

def compute(id, alpha, beta):
    # work with a.txt b.txt
    print("in compute")

def submit_job(id):
    global C
    alpha = 1
    beta = 2
    if not C:
        C = setup_cluster()
    depends = ['a.txt', 'b.txt']
    C.submit(id, alpha, beta, dispy_job_depends=depends)

# main
submit_job(1)
submit_job(2)
submit_job(3)

pgiri commented 4 years ago

File transfers should work fine; I just tested with 4.12.2. You can simplify your example: e.g., try 'sample.py' with dispy_job_depends set to a single file, and in compute check os.path.exists("a.txt"). Note that the files are transferred only when the job is sent to a node, and they are removed as soon as that job finishes. That is, unlike files given with depends, which persist for the whole computation (all jobs), job depends are available only during that job's execution.
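
The check described above could be sketched roughly as follows. This is only an illustration, not code from the dispy distribution: the names `file_arrived` and `check_job_depends` are hypothetical, and it assumes the same dispy version is installed on the client and all nodes and that a file named a.txt exists in the client's current directory. It also relies on the output of print in a computation being captured and returned to the client as job.stdout rather than appearing in the dispynode console, which may explain the missing prints.

```python
def file_arrived(name):
    # Runs on a dispynode. Imports are kept inside the function because
    # only this function's source is sent to the node.
    import os
    import socket
    host = socket.gethostname()
    # Output printed here is captured per job; the client reads it as job.stdout.
    print("in compute on", host)
    return host, os.path.exists(name)


def check_job_depends(name="a.txt"):
    # Runs on the client (hypothetical helper).
    # Assumes dispy is installed and `name` exists in the current directory.
    import dispy
    cluster = dispy.JobCluster(file_arrived)
    try:
        # The file is sent along with this one job only.
        job = cluster.submit(name, dispy_job_depends=[name])
        result = job()  # waits for the job to finish and returns its result
        if result is None:
            print("job did not return a result:", job.exception)
        else:
            host, found = result
            print("%s: %s present during job: %s" % (host, name, found))
        print("captured stdout:", job.stdout)
    finally:
        cluster.close()
```

Calling `check_job_depends()` on the client should report whether the file was visible while the job ran. Looking for the file on the node after the job has finished will not find it, since job depends are removed at that point.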

Kekushke commented 4 years ago

Most likely I had different versions running on different nodes, and that probably caused the problem. I updated everything from scratch and all the code worked like a charm. Please mark this as resolved. This project is really awesome! Thank you!!