insilichem / pychimera

Use UCSF Chimera Python API in a standard interpreter
http://pychimera.readthedocs.io
GNU Lesser General Public License v3.0
57 stars 10 forks source link

Loading chimera module #6

Closed jrjhealey closed 7 years ago

jrjhealey commented 8 years ago

Hi Jaime, thanks for your help so far over on the Chimera users forum. I've decided to try and implement what I wished to do via pychimera as you suggested but I think I might be in need of some help to get it going!

wms_joe@Esau:~/dev/strucfit$ pychimera strucfit.py
No module named chimeraInit
ERROR: Chimera could not be loaded!

I'm trying to load the modules and patch the environment as you say but at coming up against the above. I'm guessing my syntax for the import or the patch is wrong somewhere.

I've exported the environment variable for the headless chimera bin already.

export CHIMERADIR="/home/wms_joe/Applications/CHIMERAHL1.11/bin/"

The script I'm working on is below, I'm pretty new to python and it's still very rough (and mostly probably non functional), but I include it in case you can point out the exact issue...

#!/usr/bin/python

# This script pulls in homologs of proteins from PDB
# as determined by HHSuite. It then employs UCSF Chimera to
# structurally match them and get an indication of how well
# they score (RMSD) in order to pick the best simulation.

from __future__ import print_function
import os
import subprocess
import sys
import argparse
import traceback
import warnings

import pychimera

from glob import glob

os.environ['CHIMERADIR'] = '/home/wms_joe/Application/CHIMERAHL1.11/bin'
pychimera.patch_environ()
pychimera.enable_chimera()
import chimera

from chimera import runCommand as rc

# Step 1, parse the output of HHpred to get the nearest homolog.

def main():

    # Acquire HHPred Database

    db_cmd = 'find ~/Applications/HHSuite/databases/pdb70 -type f -name "pdb70_hhm.ffdata"'
    db_path = subprocess.Popen(
        db_cmd,\
        shell=True,\
        stdin=subprocess.PIPE,\
        stdout=subprocess.PIPE,\
        stderr=subprocess.PIPE)

    (stdout,stderr) = db_path.communicate()
    filelist = stdout.decode().split()
    db = stdout
    # Parse command line arguments
    def_cpus = 12
    try:
        parser = argparse.ArgumentParser(description='This script compares protein structural homologs as determined with HHpred, to PDB models using UCSF CHIMERA, to gather metrics of structural similarity.')

        parser.add_argument(
            '-d',\
            '--database',\
            action='store',
            default=db,\
            help='You can specify a different HHpred database (filepath) to use if you offer this parameter. Otherwise it defaults to PDB.')

        parser.add_argument(
            '-f',\
            '--fasta',\
            action='store',\
            help='The fasta amino acid sequence that corresponds to the simulated structure and the sequence you wish to query.')

        parser.add_argument(
            '-s',\
            '--simulations',\
            action='store',\
            help='The directory where the protein structure simulations are stored. They must be in PDB format and be named logically.')

        parser.add_argument(
            '-r',\
            '--rmsd',\
            action='store',\
            help='The file to store the RMSD values for each matching.')

        parser.add_argument(
            '-t',\
            '--threads',\
            action='store',\
            default=def_cpus,\
            help='The number of threads that HHsearch can execute on.')

        args = parser.parse_args()

    except:
        print("An exception occured with argument parsing. Check your provided options.")
        traceback.print_exc()

    print(args)

    print("Database in use is:")
    print(args.database)

    if args.fasta is not None:
        split = os.path.splitext(args.fasta)
        basename = os.path.basename(split[0])
    else:
        print('No input fasta was provided. Check your input parameters')
        sys.exit(0)

    # Run the HHsearch
    if args.fasta is not None and args.database is not None:
        hhresult_file = '{0}.hhr.fasta'.format(basename)
        search_cmd = 'hhsearch -cpu {0} -i {1} -d {2} -dbstrlen 50 -B 1 -b 1 -p 60 -Z 1 -E 1E-03 -nocons -nopred -nodssp -Ofas {3}'.format(args.threads,args.fasta, args.database, hhresult_file)
        print("Executing HHsearch with the command:")
        print(search_cmd)
        subprocess.Popen(
            search_cmd,\
            shell=True,\
            stdin=subprocess.PIPE,\
            stdout=subprocess.PIPE,\
            stderr=subprocess.PIPE)
    else:
        print("No fasta or database has been detected. Check your input parameters.")
        sys.exit(1)

    # Acquire the best hit
    hhresult_handle = open(hhresult_file,"r")
    hhresult_lines = hresult_handle.readlines()
    hhresult_handle.close()
    fasta_headerlist = []

    if hhresult_line.startswith("#"):
        pass
    elif hhresult_line.startswith(">"):
        headerline = hhresult_line
        fasta_headerlist.append(headerline)

    best_header = fasta_headerlist[1] # HHpred puts the query seq in index 0, so 1 is your top hit.
    best_PDB = best_header.replace(">", "")
    print("Your best hit was the PDB id:")
    print(best_hit)

    # Acquire the protein simulations
#   model_list = []
#   for files in os.walk(args.simulations):
#       for f in files:
#           if f.endswith('.pdb') and f.startswith(basename):
#               file_paths.append(os.path.abspath(os.path.join(folder,filename)))

    # Open all models etc. in Chimera, and prepare to calculate RMSD
#   print(model_list)
    # Get reference structure from PDB
#   chimera.open(best_PDB)
    # Open model structures
#   for model in model_list:
#       chimera.openModels.open(model,"type=PDB")

    # Calculate RMSD for each model vs the identified reference.
#   print(model + \t + best_pdb + \t + \t + rmsd)
jaimergp commented 8 years ago

Hi Joe!

I don't know if you are running this from a Mac. I've run through several issues with Macs during the development so, as of now, is currently unsupported. I can only guarantee success when running from Linux...

That said, we can try! A few things:

1 - If you're running the script with the pychimera binary, you don't need to call pychimera.patch_environ() nor pychimera.enable_chimera(). That's handled by the binary! However, you do need them if you're using the standard python binary. Bottomline:

$ python yourscript.py # This requires the pychimera calls inside the script
$ pychimera yourscript.py # PyChimera must NOT be called now!

Since you are using argparse for your own needs, I recommend using the python call. This is, don't change anything in your script, but call it with python (2.7).

2 - The envvar should point to the main chimera directory. This is, the one you get when running chimera --root. Just remove the bin/ out of it and it will be fine.

3 - The error you are getting means that the module chimeraInit.py could not be found in the path. This is most probably due to the previous point. Anyway, if that's not the case, pychimera.patch_environ() deals with this, and should include the CHIMERADIR/share location in it. To check if that's right, please, include the following line before AND after the pychimera calls:

print(sys.path)
jrjhealey commented 8 years ago

Thanks Jaime, this is a huge help! You're saving a considerable chunk of one of my thesis chapters ;)

I am running a Mac but all of the script execution is actually on a Linux (Ubuntu) server I'm SSHd to, so everything will work I hope?!

I see the distinction! I'll make the amendments you suggest when I'm back at my computer and see how I get on.

jaimergp commented 8 years ago

Yep, as long as the host running the script is Linux, you're fine. I've checked several Ubuntu installations and they all run fine. I think that the main issue you have is the CHIMERADIR envvar. Once that's solved, I think it will run OK ;)

I am also doing a PhD, so I know how you feel. Let me know if you need anything else, especially Python related! If you host it in a gist, I can fork it and PR the changes, since I am seeing a couple of issues at first sight.

jrjhealey commented 7 years ago

Hi Jaime,

Thanks once again for your help here and over on Chimera users. I finished the script (though there are certainly still some bugs in it!). I may do as you say and put together a gist if I get around to it (I've posted a pastebin link over on the users forum in the meantime anyway if you're interested).

Pychimera is excellent (and I followed your suggestion of installing everything inside a miniconda environment which worked flawlessly).

Feel free to close this as I'm aware it's not really an 'issue'!

Thanks again,

Joe

jaimergp commented 7 years ago

It's a pleasure to see my coding efforts being helpful to others, so thanks to you!