pgiri / dispy

Distributed and Parallel Computing Framework with / for Python
https://dispy.org

Error connecting two different laptops via WiFi #198

Open jaygala24 opened 5 years ago

jaygala24 commented 5 years ago

We are working on a project related to distributed computing using Python. We have set up dispy 4.11.0 and pycos on two laptops running Windows 10 (Python 3.6) and tested them with the sample code from the documentation page on SourceForge.net. If we use two terminals on the same laptop, sample.py runs without errors. We then tried using both laptops connected to the same WiFi network. We started the dispynode server on the first laptop with the -i option (set to that laptop's IP address) and ran sample.py on the other laptop, after adding nodes=['address of laptop 1'] and ip_addr="address of laptop 2 (client)" to sample.py. sample.py then got stuck and never finished, which we suspect is because it does not detect the dispynode (laptop 1) on our WiFi network.

I have attached the sample.py I used. The addresses are:

192.168.1.106 : dispynode server (runs dispynode.py)
192.168.1.107 : dispy client (runs sample.py)

On running sample.py on laptop 2:

2019-11-02 20:42:32 pycos - version 4.8.11 with epoll I/O notifier
2019-11-02 20:42:32 dispy - dispy client version: 4.11.0

NOTE: Using dispy port 61590 (was 51347 in earlier versions)

2019-11-02 20:42:32 dispy - Storing fault recovery information in "_dispy_20191102204232"
2019-11-02 20:42:32 dispy - dispy client at 192.168.1.107:61590

We also ran into another issue: running dispyadmin.py fails with a ValueError:

Traceback (most recent call last):
  File "dispyadmin.py", line 52, in <module>
    class DispyAdminServer(object):
  File "dispyadmin.py", line 403, in DispyAdminServer
    info_port=int(dispy.config.ClientPort), node_port=int(dispy.config.NodePort),
ValueError: invalid literal for int() with base 10: 'dispy.config.DispyPort'
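The last line of the traceback shows what went wrong: int() was handed the string 'dispy.config.DispyPort', i.e., the text of a setting's name rather than a port number. The sketch below reproduces that message and adds a hypothetical to_port helper (not part of dispy) that fails with a clearer error when a port setting is not numeric.

```python
def to_port(value):
    """Convert a configuration value to a TCP port number (hypothetical helper)."""
    try:
        port = int(value)
    except ValueError:
        # e.g. value == 'dispy.config.DispyPort': the text of a name, not its value
        raise ValueError(f"port setting is not numeric: {value!r}") from None
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port


if __name__ == "__main__":
    # Reproduces the message from the traceback: int() cannot parse a name.
    try:
        int('dispy.config.DispyPort')
    except ValueError as exc:
        print(exc)  # invalid literal for int() with base 10: 'dispy.config.DispyPort'
    print(to_port("61590"))  # 61590
```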

More importantly, we need help with the first issue: our dispy client does not detect the dispynode on the same WiFi network. It would be great if you could help us identify the cause and suggest how to go about fixing it.

# sample.py
import logging
# simple program that distributes the 'compute' function to each node running 'dispynode'
def compute(n):
    import time
    time.sleep(n)
    # dispy_node_name is name of server where this computation is being executed
    return (dispy_node_name, n)

if __name__ == '__main__':
    import dispy, random
    cluster = dispy.JobCluster(compute, nodes="192.168.1.*", ip_addr="192.168.1.107", loglevel=logging.DEBUG)
    jobs = []
    for i in range(10):
        # schedule execution of 'compute' on a node (running 'dispynode')
        # with a parameter (random number in this case)
        job = cluster.submit(random.randint(5,20))
        jobs.append(job)
    # cluster.wait() # wait for all scheduled jobs to finish
    for job in jobs:
        host, n = job() # waits for job to finish and returns results
        print('%s (%s) executed job %s at %s with %s' % (host, job.ip_addr, job.id,
                                                         job.start_time, n))
        # other fields of 'job' that may be useful:
        # print(job.stdout, job.stderr, job.exception, job.ip_addr, job.start_time, job.end_time)
    cluster.print_status()
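Before changing dispy settings, it can help to check whether laptop 2 can open a plain TCP connection to laptop 1 at all. The sketch below uses only the standard library; the node port 61591 is an assumption (it is not shown in the log above), so substitute the port your dispynode reports when it starts.

```python
import socket

# Assumption: dispynode listens on TCP port 61591; check the port printed
# by dispynode on laptop 1 and adjust if it differs.
NODE_ADDR = "192.168.1.106"
NODE_PORT = 61591


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # connection refused, timed out, no route, or blocked by a firewall
        return False


if __name__ == "__main__":
    ok = tcp_reachable(NODE_ADDR, NODE_PORT)
    print("node reachable" if ok else "node NOT reachable: check firewall / WiFi isolation")
```

If the check fails while both laptops are on the same network, a firewall on laptop 1 or client isolation on the WiFi access point is a likely cause. Running the same check in the opposite direction (from laptop 1 to 192.168.1.107:61590) is also worthwhile, since the node presumably has to reach the client's port to return results.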
pgiri commented 5 years ago

UDP, which dispy uses to discover nodes, is less reliable over WiFi than over a wired network. If you explicitly list the names or IP addresses of all the nodes, e.g., nodes=["192.168.1.21", "node2"], dispy will use TCP, which should detect the nodes. Note that in the example above, nodes is given as a string, which will not work.
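Following that advice, the nodes argument should be a list naming every node. A small, hypothetical helper (explicit_nodes is not part of dispy) can catch the two problems in sample.py before the cluster is created: passing a string, and relying on a wildcard. Rejecting wildcards outright is a deliberate choice in this sketch so that every node is listed explicitly; the addresses are the ones given in the question.

```python
import ipaddress


def explicit_nodes(addresses):
    """Return node names / IP addresses as a list for dispy's nodes= argument."""
    if isinstance(addresses, str):
        # nodes given as a single string will not work
        raise TypeError("nodes must be a list, e.g. ['192.168.1.106'], not a string")
    nodes = []
    for addr in addresses:
        if "*" in addr:
            # design choice of this sketch: require every node to be listed explicitly
            raise ValueError(f"list each node explicitly instead of a wildcard: {addr!r}")
        try:
            # normalise IP addresses; anything that is not one is kept as a host name
            nodes.append(str(ipaddress.ip_address(addr)))
        except ValueError:
            nodes.append(addr)
    return nodes


def make_cluster(compute):
    """Create the cluster from sample.py with explicitly listed nodes (sketch)."""
    import dispy  # imported lazily so this module loads without dispy installed

    return dispy.JobCluster(
        compute,
        nodes=explicit_nodes(["192.168.1.106"]),  # laptop 1, running dispynode
        ip_addr="192.168.1.107",                  # laptop 2, the client
    )
```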

The dispyadmin issue has been fixed in the current GitHub master.

jaygala24 commented 4 years ago

Thanks for your help with the above issue. We have one more question: could you explain what exactly the wall time and total time convey?
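One common reading, offered as a hedged sketch rather than dispy's documented definition: total time is the sum of each job's own run time, while wall time is the real time that elapsed for the whole batch, so their ratio indicates how much the jobs overlapped across nodes. The Job record below is a hypothetical stand-in that uses only the start_time and end_time attributes mentioned in sample.py.

```python
from collections import namedtuple

# Hypothetical stand-in for finished dispy jobs: only start_time / end_time
# (seconds, as floats) are used.
Job = namedtuple("Job", ["start_time", "end_time"])


def time_summary(jobs):
    """Return (total_time, wall_time, speedup) for a batch of finished jobs."""
    # sum of per-job run times, regardless of whether jobs overlapped
    total = sum(j.end_time - j.start_time for j in jobs)
    # elapsed span from the earliest start to the latest finish
    wall = max(j.end_time for j in jobs) - min(j.start_time for j in jobs)
    speedup = total / wall if wall > 0 else float("nan")
    return total, wall, speedup


if __name__ == "__main__":
    # Two jobs of 5 s and 8 s running in parallel on different nodes
    jobs = [Job(0.0, 5.0), Job(0.0, 8.0)]
    print(time_summary(jobs))  # (13.0, 8.0, 1.625)
```

With jobs running one after another, total and wall time would be equal and the ratio 1.0; the more the nodes work in parallel, the larger the ratio.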