samirelanduk / atomium

Python macromolecular parsing (with .pdb/.cif/.mmtf parsing and production)
https://atomium.bio
MIT License

Get PDB files from remote machines #16

Closed: gf712 closed this issue 6 years ago

gf712 commented 6 years ago

Description of Feature

Hey, I was wondering whether you've considered implementing a helper function to fetch data over SSH. This could be done easily with a helper function that accepts file objects, because there is a library (paramiko) that provides the same Python file API for files on a remote machine. This would be really useful in our lab, because we mirror the PDB and accessing these files is quicker than downloading them from RCSB. However, the mirror is on a remote machine, so the files cannot just be opened with open() from the Python standard library.

Proposed Example Code

The implementation could be something like this:

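def fetch_io(file):
    # read the whole file as one string; SFTP file objects return bytes
    filestring = file.read()
    if isinstance(filestring, bytes):
        filestring = filestring.decode()
    return pdb_string_to_pdb_dict(filestring)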

It could also check that the file is open, with the user remaining responsible for closing it.

This could be used like this:

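import paramiko

# connect to the remote machine (host and user names are placeholders)
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect('remote-machine', username='user123')

# open an SFTP session; sftp.open() returns a file-like object for a remote file
sftp = ssh.open_sftp()
with sftp.open('/path/to/my_pdb.pdb') as remote_file:  # hypothetical remote path
    pdb = fetch_io(remote_file)

# the caller is responsible for closing the session and the connection
sftp.close()
ssh.close()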

The same function would also accept ordinary local file objects:

with open('my_pdb.pdb', 'r') as f:
  pdb = fetch_io(f)
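The open-file check suggested in the proposal could look something like the minimal sketch below. read_pdb_text() is a hypothetical name, not part of atomium; it only returns the file's text, leaving parsing (e.g. with the pdb_string_to_pdb_dict() function from the proposal) and closing the file to the caller.

```python
def read_pdb_text(file):
    """Return the full contents of an already-open file object as text.

    Hypothetical helper: the caller opens the file, passes it in, and is
    responsible for closing it afterwards.
    """
    # Fail early with a clear message; getattr() keeps this working for
    # file-like objects that have no `closed` attribute at all.
    if getattr(file, "closed", False):
        raise ValueError("file object is closed")
    contents = file.read()
    # Remote (e.g. SFTP) file objects usually return bytes, local text files str
    if isinstance(contents, bytes):
        contents = contents.decode("utf-8")
    return contents
```

Because it only relies on read() (and optionally closed), the same helper works for local files, io.StringIO/io.BytesIO buffers, and remote file objects alike.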

samirelanduk commented 6 years ago

Pretty great idea!

I've written a solution which partly implements this: a pdb_over_ssh() function, which works similarly to fetch(), lets you grab a PDB file over SSH. It doesn't take an SSH connection as an argument as you suggest; instead, it creates and closes an SSH connection every time the function is called. I might implement the full version some day.
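Taking an existing connection, as originally suggested, could look roughly like the sketch below. read_remote_text() is a hypothetical helper, not atomium's pdb_over_ssh(); it assumes ssh is an already-connected paramiko.SSHClient, whose open_sftp() method returns an SFTP client. Only the SFTP session is closed here, so one SSH connection can be reused for many files.

```python
def read_remote_text(ssh, path):
    """Read a remote file's text over an existing SSH connection.

    Hypothetical helper: `ssh` is assumed to be a connected
    paramiko.SSHClient (any object with an open_sftp() method works).
    The SSH connection is left open for the caller to reuse and close.
    """
    sftp = ssh.open_sftp()
    try:
        # SFTPClient.open() defaults to read mode and yields bytes
        with sftp.open(path) as remote_file:
            contents = remote_file.read()
    finally:
        # close the SFTP session even if reading fails
        sftp.close()
    return contents.decode("utf-8") if isinstance(contents, bytes) else contents
```

For example, read_remote_text(ssh, '/path/to/1abc.pdb') (a placeholder path) could be called in a loop over several files before a single ssh.close(), avoiding a new SSH handshake per file.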

Is there any chance you could try this out to see if it's useful before I release it? I believe you can install the new commit with pip: pip install git+https://github.com/samirelanduk/atomium.git@0.10.2.

There are two ways to call it. atomium.pdb_over_ssh('hostname', 'username', '/path/to/pdb/on/remote/') uses your local private key to SSH into the remote machine. To use a password instead of a key, pass it as a keyword argument: atomium.pdb_over_ssh('hostname', 'username', '/path/to/pdb/on/remote/', password='password').

Hopefully this works! I've gotten it to work for me, but it's hard to write tests for!

gf712 commented 6 years ago

Hey, I just tried it out and it seems to work fine! On my machine it's also a bit faster than fetch, which is great!

As for writing tests, I'm not quite sure how to do it.
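One common way to test SSH code like this without a real server is to substitute a fake paramiko module using unittest.mock. The sketch below assumes a hypothetical fetch_over_ssh() helper that, like the pdb_over_ssh() described above, opens a connection, reads one file and closes the connection; it is not atomium's actual implementation. Because paramiko is imported inside the function and the fake is placed in sys.modules, the test runs even where paramiko is not installed.

```python
import sys
import unittest
from unittest import mock


def fetch_over_ssh(hostname, username, path, password=None):
    """Hypothetical helper: connect, read one remote file, disconnect."""
    import paramiko  # imported lazily so a test can substitute a fake module
    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    try:
        # with password=None, paramiko falls back to key-based authentication
        ssh.connect(hostname, username=username, password=password)
        sftp = ssh.open_sftp()
        try:
            with sftp.open(path) as remote_file:
                data = remote_file.read()
        finally:
            sftp.close()
    finally:
        ssh.close()
    return data.decode() if isinstance(data, bytes) else data


class FetchOverSshTest(unittest.TestCase):
    def test_reads_remote_file_and_closes_connection(self):
        # MagicMock records every call, so no network access ever happens
        fake_paramiko = mock.MagicMock()
        ssh = fake_paramiko.SSHClient.return_value
        sftp = ssh.open_sftp.return_value
        remote_file = sftp.open.return_value.__enter__.return_value
        remote_file.read.return_value = b"HEADER    TEST\n"

        # patch.dict restores sys.modules when the block exits
        with mock.patch.dict(sys.modules, {"paramiko": fake_paramiko}):
            text = fetch_over_ssh("host", "user", "/data/1abc.pdb")

        self.assertEqual(text, "HEADER    TEST\n")
        ssh.connect.assert_called_once_with("host", username="user", password=None)
        sftp.open.assert_called_once_with("/data/1abc.pdb")
        ssh.close.assert_called_once()
```

Saved as a test module, this can be run with python -m unittest. The test checks behaviour at the boundary (which host and path were requested, and that the connection was closed) rather than paramiko's internals.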