uqfoundation / ppft

distributed and parallel Python
http://ppft.rtfd.io

cannot run on remote server, and runs on local machine when it shouldn't #15

Closed by gobbedy 6 years ago

gobbedy commented 6 years ago

Hello,

I'm trying to use ppft for the first time and it doesn't behave as expected. I'm hoping you can provide some guidance.

I'm running on a linux cluster with (for now) 2 nodes. I'll call my client my_client, and my remote server my_server.

On my_server I run this:
python -m ppft.server

On my_client I run this python script:


import pp

def print_platform():
    import platform
    print("name of remote machine: " + platform.node())

ppservers=("my_server",)
job_server = pp.Server(ppservers=ppservers)
f1 = job_server.submit(print_platform)
r1 = f1()

Since print_platform is executed on the server, I expect it to print my_server

Instead, it prints my_client

That means it a) executed on the client when it's not even in my ppservers list, and b) did not execute on the server, which is on the list.

Am I missing something to make it behave as expected?

mmckerns commented 6 years ago

The list of resources, by default, includes the localhost. The way to not run locally is to set the number of cpus to zero. Then the internal scheduler will run only using the available remote servers.

gobbedy commented 6 years ago

@mmckerns super, that worked!

I did have to specify a port, or else it just hung. Is this normal? According to the quick start guide, a port does not appear to be required.

For future reference if anyone else has the same issue, here is my flow that works (basically a copy of my first post, with the fixes).

I'm running on a linux cluster with (for now) 2 nodes. I'll call my client my_client, and my remote server my_server.

On my_server I run this:
python -m ppft.server -p 1234 (where 1234 is an arbitrarily chosen port)

On my_client I run this python script:


import pp

def print_platform():
    import platform
    print("name of remote machine: " + platform.node())

ppservers=("my_server:1234",)
job_server = pp.Server(ncpus=0, ppservers=ppservers)
f1 = job_server.submit(print_platform)
r1 = f1()

As expected (and contrary to my first post), this prints my_server
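Because a missing or wrong port shows up as a hang rather than an error, it can help to confirm that something is listening on the server's port before creating the job server. The following is a minimal sketch using only the standard library; `server_reachable` is a hypothetical helper, not part of ppft, and it only shows that the port accepts TCP connections, not that a ppserver is behind it.

```python
import socket

def server_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper (not a ppft function): it checks only that the port
    accepts connections, not that the listener is actually a ppserver.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False

# Possible use with the host and port from the flow above:
# if not server_reachable("my_server", 1234):
#     raise SystemExit("nothing is listening on my_server:1234")
```

Failing fast like this turns a silent hang into an explicit message, at the cost of one extra short-lived connection to the server.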

mmckerns commented 6 years ago

I think, if I remember correctly, that it is necessary to specify a port... and it can (or will) hang if you don't. I believe it's supposed to autodetect the ports that ppservers are connected to... but I don't remember ever using it without specifying a port (due to it hanging). pathos has a portpicker, and I probably could leverage that code to ensure that an open port is selected...

Is the above a satisfactory answer? If so, please close this issue.
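The idea of selecting an open port, mentioned above, can be sketched with the standard library by binding to port 0 and letting the operating system choose. This is not pathos's portpicker, only an illustration of the general technique; `pick_free_port` is a hypothetical name, and the snippet is assumed to run on the server machine before `python -m ppft.server -p <port>` is started.

```python
import socket

def pick_free_port(host=""):
    """Ask the OS for a currently unused TCP port and return its number.

    Hypothetical helper, not the pathos portpicker. The port is released
    when the socket closes, so another process could claim it before the
    server starts; treat the result as a good guess, not a reservation.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))  # port 0 lets the OS choose a free port
        return s.getsockname()[1]

port = pick_free_port()
# Print the server command to run with the chosen port.
print("python -m ppft.server -p %d" % port)
```

The client still needs to be told the chosen port, for example as `"my_server:<port>"` in `ppservers`, since nothing here announces it automatically.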

gobbedy commented 6 years ago

Hi @mmckerns, when delivering a product, if following the tutorial does not work, that is not satisfactory. You need documentation that matches the functionality. Otherwise, no matter how powerful the underlying code is, it won't meet users' needs.

That said, your answer is more than satisfactory for me since it works, and I'm grateful for your quick response -- saves me a lot of hacking and looking at source code.

I would usually leave this open as the need for correct documentation of basic features strikes me as a critical issue, but I will respect your request to close it.

mmckerns commented 6 years ago

The documentation you are referring to is not mine. I have no control over what it says. Note that the tutorial is for pp, and the module on this site is ppft -- which is a fork of the original.

gobbedy commented 6 years ago

@mmckerns my starting point was your github readme https://github.com/uqfoundation/ppft

There I found the python -m ppft.server command (with no "-p") and a reference to parallelpython.com for further information.

In the absence of other documentation, I went to the latter page to find out how to run the tool (that's where I found out about "-p").

If the fork has no documentation, then the user can only rely on parallelpython.com to make the tool work.

Either way basic functionality is either documented incorrectly or not documented at all.

I don't want to sound overly critical -- I think your code/tool is great, or else I wouldn't use it. You're also one of the few developers who actively maintain their code and respond to users.

It's just that great code without documentation is like a Ferrari without a key: exciting, very powerful, but ultimately very hard to use.

mmckerns commented 6 years ago

I was responding to your previous point that I should change the referenced documentation. As I said previously, I can't. On the other hand, the ppft code is fairly well documented, and the issue is not that there's no documentation... it's that either the documentation for the ppserver is incorrect with regard to remote ports, or there's a bug in the code. If you'd like to contribute to the documentation effort, please feel free to do so, or alternatively, help fix any bugs. You'll note that there are open tickets for (a) the missing port number https://github.com/uqfoundation/ppft/issues/3, and (b) the lack of fork-specific ppft documentation https://github.com/uqfoundation/ppft/issues/14.

gobbedy commented 6 years ago

@mmckerns fair enough. Again thanks for the great pathos package.