get/put/read_text/read_bytes/write_text/write_bytes/mkdir

mristin commented 6 years ago

I implemented the functions get and put (file upload/download) and read_text, read_bytes, write_text, write_bytes and mkdir to mimic pathlib.Path wrapped around spur.SshShell to allow simpler manipulation of remote files.

Would you be interested if I integrated them into spur and made a pull request?

For input arguments, I always used a union of str and pathlib.Path.

P.S. if you don't want this functionality in spur, would you mind if I created a pypi package spurplus and added a spurplus.SshShell which would inherit from spur.SshShell?

mwilliamson commented 6 years ago

I'd be interested in seeing the implementation. I'd hope that those use-cases were fairly easy to implement without any additional functions e.g.

def read_bytes(shell, path):
    with shell.open(path, "rb") as fileobj:
        return fileobj.read()

so my inclination is to avoid putting extra complexity into the API.

P.S. if you don't want this functionality in spur, would you mind if I created a pypi package spurplus and added a spurplus.SshShell which would inherit from spur.SshShell?

Go ahead!

mristin commented 6 years ago

It's not much more complex than that; mostly some wrapping and more informative error messages.

Let me create the module spurplus, so you can keep the spur module as simple as possible, while all the extra complexity and wrapping goes to spurplus. Once it's up, you can have a look and decide if you want to merge some of the functions back to spur and I'd remove them from spurplus.

mristin commented 6 years ago

Hi Michael,

I wrote the first draft of spur+. Could you please have a look and tell me what you think: https://bitbucket.org/parqueryopen/spurplus/pull-requests/1/initial-version/diff ?

I suppose the main benefit of spur+, the way I see it now, is typing and pathlib.Path support (which we both use extensively in our code base) & some minor helpers like md5 and mkdir.

The reason why I introduced check_call and check_output was also to ensure type safety (since allow_error=True introduces type ambiguity which would be a bit bothersome to resolve every time we call a remote command, and we do that often).

I have a few open questions:

what would be the best way to approach documentation in your opinion? Should I duplicate spur documentation or reference it somehow (if so, do you know what would be the most intuitive way to reference it)?
I open a sftp client and close it after every operation -- do you know if sftp client was meant to be used that way or should I use some kind of pool?
I couldn't figure out the type of shell_type from SshShell constructor -- do you know what it is?
Could you please check if I got the return types of run() and spawn() correctly?

Thanks a lot for your help in advance!

mwilliamson commented 6 years ago

The reason why I introduced check_call and check_output was also to ensure type safety (since allow_error=True introduces type ambiguity which would be a bit bothersome to resolve every time we call a remote command, and we do that often).

I haven't played around with static typing in python -- how does allow_error=True introduce type ambiguity?

what would be the best way to approach documentation in your opinion? Should I duplicate spur documentation or reference it somehow (if so, do you know what would be the most intuitive way to reference it)?

If you want to reference the docs for Spur, the easiest way would probably be a link with a relevant fragment (if you're referencing a specific part). It's probably best to make sure you're pointing at a specific version.

I open a sftp client and close it after every operation -- do you know if sftp client was meant to be used that way or should I use some kind of pool?

As in a Paramiko SFTP client? Their docs are the best place to check, I don't know off the top of my head.

I couldn't figure out the type of shell_type from SshShell constructor -- do you know what it is?

It needs a Boolean attribute supports_which, and a method generate_run_command() -- looking at the specific implementations should help you work out what the exact argument and return types are.

Could you please check if I got the return types of run() and spawn() correctly?

I'm not familiar with the typing rules, but I'd expect the return type of run() to be spur.results.ExecutionResult, since spur.results.RunProcessError gets raised, not returned.

mristin commented 6 years ago

Thanks a lot for the answers, I appreciate them!

I haven't played around with static typing in python -- how does allow_error=True introduce type ambiguity?

Sorry, I was tired and misread the code. It is actually raise, not return in results.result() function. There is no ambiguity, the type is always spur.results.ExecutionResult.

Regarding the paramiko's SFTP client: I have read http://docs.paramiko.org/en/2.4/api/sftp.html (and some blogs on internet), but I was not able to find out about the expected longevity of the sftp client.

In your code, you create a SFTP client whenever you call SshShell.open() in spur/ssh.py, line 251:

class SshShell(object)

    # line 251
    def open(self, name, mode="r"):
        sftp = self._open_sftp_client()
        sftp_file = SftpFile(sftp, sftp.open(name, mode), mode)
        # ...
        return sftp_file

    # line 299
    def _open_sftp_client(self):
        return self._get_ssh_transport().open_sftp_client()

Since we never observed any problems with the performance, I suppose it's OK to always create a new SFTP client on every operation instead of keeping one connected all the time tying down the resources. Do you happen to remember -- was there any particular reason why you opened it on every open() and did not create it at the initialization of SshClient?

Regarding the documentation: I'll fix the version of spur dependency to spur==0.3.20 and copy the documentation starting by stating a reference to the original spur function.

mwilliamson commented 6 years ago

Since we never observed any problems with the performance, I suppose it's OK to always create a new SFTP client on every operation instead of keeping one connected all the time tying down the resources. Do you happen to remember -- was there any particular reason why you opened it on every open() and did not create it at the initialization of SshClient?

I have no idea, I'm afraid -- I wrote it quite a long time ago.

mristin commented 6 years ago

Hi Michael,

Thanks again for all the help. If you want, you can have a look at the final implementation: https://bitbucket.org/parqueryopen/spurplus/pull-requests/1/initial-version/diff

Please let me know if you have any comments, ideas etc. I would be very grateful if you could review the README and see if that's OK with you what I wrote there. Is it OK if I use MIT license or should I use BSD-2?

I did a small benchmark to measure the overhead of opening/closing SFTP client on each operation. It's substantial (25x when using SSH server which runs on the same machine).

As soon as you give me a go, I'd publish it to pypi.

mwilliamson commented 6 years ago

I assume you have spur as a dependency rather than copying the code? If so, then you can use whatever licence you like.

mristin commented 6 years ago

Yes, it is only a dependency (apart from copy/pasting parts of the method documentation with clear references to spur.py).

Is there nothing to be added to the readme? Should we then publish it to pypi?

mristin commented 6 years ago

Assuming you have no comments regarding the readme, I published the library to Pypi: https://pypi.org/project/spurplus/ https://bitbucket.org/parqueryopen/spurplus/

I would be very thankful if you could put a link to spurplus somewhere in your docs.

mwilliamson / spur.py

get/put/read_text/read_bytes/write_text/write_bytes/mkdir #77