madscatt / zazmol

Library for defining molecular objects to create simulation and analysis programs. To install: python setup.py install. Dependencies: numpy, mocker.
GNU General Public License v3.0

Think through and refactor `merge_two_molecules` method #21

Open StevenCHowell opened 7 years ago

StevenCHowell commented 7 years ago

As mentioned in issue #6, thought should be given to merge_two_molecules. It likely needs refactoring.

madscatt commented 7 years ago

@stvn66 : I wrote a @classmethod that copies a molecule using a mask. It carries less detail and loops over the attributes in an abstract way. We could use something like this to merge two molecules, but I am not sure it is any easier, as one will always have to keep a list of attributes to copy.

Feel free to continue the discussion.

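# imports this snippet relies on (module path taken from the docstring example)
import numpy
import sasmol.system as system

class Copy_Using_Mask():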

    @classmethod
    def from_sasmol(class_instance,  mask, **kwargs):
        ''' 

        Parameters
        ----------
        class_instance
            system object

        mask
            integer list

        kwargs
            optional future keyword arguments

        Returns
        -------
            molecule
                new system object

        Examples
        --------
            >>> import sasmol.system as system 
            >>> import sasmol.util as utilities
            >>> molecule = system.Molecule('hiv1_gag.pdb')
            >>> molecule.name()[0] 
            'N'
            >>> molecule.name()[4] 
            'CA'
            >>> molecule.name()[14] 
            'HB1'
            >>> mask = [0, 4, 14]
            >>> new_molecule = utilities.Copy_Using_Mask.from_sasmol(molecule, mask) 
            >>> new_molecule.name()
            ['N', 'CA', 'HB1']
            >>> new_molecule.mass()
            array([ 14.00672 ,  12.01078 ,   1.007947])

        Note
        ----

            if more attributes are added in system.Atom() then the key lists below need
                to be updated

            currently only list_keys and numpy_keys are returned

            int_keys would have to be recalculated based on mask

            short_keys would have to be re-initialized (init_children)

            may want to consider passing a list of specific attributes to extract if memory is an issue

            copies all frames (untested)

            why can't this method be in subset?

        ''' 

        list_keys = ['_residue_flag', '_occupancy', '_charge', '_atom', '_chain', '_segname', '_beta', '_loc', '_element', '_name', '_rescode', '_moltype', '_resname']

        numpy_keys = ['_original_index', '_original_resid', '_index', '_resid', '_mass', '_coor'] 

        short_keys = ['_resnames', '_resids', '_elements', '_segnames', '_betas', '_names', '_moltypes', '_occupancies' ]      

        int_keys = ['_number_of_chains', '_number_of_betas', '_number_of_resids', '_number_of_names', '_number_of_moltypes', '_number_of_resnames', '_number_of_segnames', '_number_of_elements', '_id', '_number_of_occupancies' ]  

        other_keys = ['_header', '_conect', '_debug']  

        new_dict = {}

        number_of_frames = len(class_instance.coor())
        mask_length = len(mask) 

        # Copy each per-atom list attribute in mask order so the lists stay
        # aligned with the numpy.take() calls below, which also follow mask order.
        for key in list_keys:
            if key in class_instance.__dict__:
                new_dict[key] = [class_instance.__dict__[key][i] for i in mask]

        molecule = system.Molecule()
        molecule.__dict__ = new_dict  

        molecule.setId(0)

        molecule.setNatoms(mask_length) 

        molecule.setOriginal_index(numpy.take(class_instance.original_index(),mask))        
        molecule.setOriginal_resid(numpy.take(class_instance.original_resid(),mask))        
        molecule.setIndex(numpy.take(class_instance.index(),mask))        
        molecule.setResid(numpy.take(class_instance.resid(),mask))

        molecule.setMass(numpy.take(class_instance.mass(),mask))        

        molecule.setCoor(numpy.take(class_instance.coor(),mask,axis=1))

        return molecule
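
The same idea can be tried without sasmol at all. The following is a minimal sketch, assuming the per-atom data lives in a plain dict and that coordinates are shaped (frames, atoms, 3); `copy_using_mask` and the dict layout are hypothetical and are not part of the library.

```python
import numpy as np


def copy_using_mask(attributes, mask, coor_key="coor"):
    """Return a new dict holding only the atoms listed in mask (hypothetical helper)."""
    subset = {}
    for key, value in attributes.items():
        if isinstance(value, list):
            # Per-atom Python lists: pick entries in mask order.
            subset[key] = [value[i] for i in mask]
        elif isinstance(value, np.ndarray):
            # Assumption: coordinates are (frames, atoms, 3); other arrays are per-atom.
            axis = 1 if key == coor_key else 0
            subset[key] = np.take(value, mask, axis=axis)
    return subset


atoms = {
    "name": ["N", "H", "CA", "C"],
    "resid": np.array([1, 1, 1, 1]),
    "coor": np.arange(24, dtype=float).reshape(2, 4, 3),  # 2 frames, 4 atoms
}
subset = copy_using_mask(atoms, [0, 2])
```

Dispatching on the value type removes the need for separate key lists for list and array attributes, although derived quantities (counts, unique-value lists) would still have to be recomputed afterwards.
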
StevenCHowell commented 7 years ago

Usually classes are named using just CamelCase, not both Camel_Case_And_Underscores. So this would be CopyUsingMask.

StevenCHowell commented 7 years ago

My primary interest is making larger molecules. I'm not sure of the details of how this differs from the old copy_mol_using_mask. I'm familiar with using that to get sub-molecules of an original molecule. I never saw a method in sasmol for building a large molecule from subunits. Does this do that? Could I combine several repeated subunits, after transforming the coordinates of each, to make a large conglomerate molecule?
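
To make the question concrete, here is a rough sketch of what combining repeated subunits could look like. It assumes each subunit is a plain dict of per-atom data with coordinates shaped (frames, atoms, 3). `translated_copy` and `merge_subunits` are hypothetical helpers, not sasmol functions.

```python
import numpy as np


def translated_copy(unit, shift):
    """Copy a subunit and shift every atom by the same vector (hypothetical helper)."""
    return {
        "name": list(unit["name"]),
        "resid": unit["resid"].copy(),
        "coor": unit["coor"] + np.asarray(shift, dtype=float),
    }


def merge_subunits(subunits):
    """Concatenate per-atom data from several subunits (hypothetical helper)."""
    names = []
    for unit in subunits:
        names.extend(unit["name"])
    resid = np.concatenate([unit["resid"] for unit in subunits])
    # Assumption: coordinates are (frames, atoms, 3), so atoms join along axis 1.
    coor = np.concatenate([unit["coor"] for unit in subunits], axis=1)
    # Atom indices are renumbered from 1 for the merged molecule.
    index = np.arange(1, len(names) + 1)
    return {"name": names, "resid": resid, "index": index, "coor": coor}


monomer = {
    "name": ["N", "CA", "C"],
    "resid": np.array([1, 1, 1]),
    "coor": np.zeros((1, 3, 3)),
}
# Three copies spaced 10 units apart along x.
trimer = merge_subunits([translated_copy(monomer, (10.0 * i, 0.0, 0.0)) for i in range(3)])
```

Residue numbers are carried over unchanged here, so the merged result has repeated resids; a real merge would probably also need to renumber residues or assign distinct segment names.
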

madscatt commented 7 years ago

Hi, I wrote this experimental class that will allow you to merge one molecule into another pretty simply.

I am trying to define a set of "essential" instance attributes that, if defined, make a molecule "whole" (i.e. you can write a PDB with it). I am "struggling" with all of the default instance variables in the Atom() class in sasmol (not the test bit below). They have evolved since SASSIE needed such abilities to handle & parse all of the information.

Comments are welcome.
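
One possible way to pin down "essential" is a small check that reports which attributes are missing or have the wrong length. This is only a sketch: the `ESSENTIAL` tuple simply reuses the six constructor arguments of the experimental class, and whether those six are actually sufficient to write a PDB is an assumption. `missing_or_mismatched` is a hypothetical name.

```python
import numpy as np

# Assumption: these six attributes are treated as the "essential" set.
ESSENTIAL = ("atom", "index", "name", "resname", "resid", "coor")


def missing_or_mismatched(attributes, essential=ESSENTIAL):
    """Return a list of problems that keep a molecule from being 'whole'."""
    missing = [key for key in essential if attributes.get(key) is None]
    if missing:
        return ["missing: " + key for key in missing]
    # Assumption: coor is shaped (frames, atoms, 3).
    natoms = attributes["coor"].shape[1]
    return [
        "length mismatch: " + key
        for key in essential
        if key != "coor" and len(attributes[key]) != natoms
    ]


good = {
    "atom": ["ATOM", "ATOM"],
    "index": np.array([1, 2]),
    "name": ["N", "CA"],
    "resname": ["ARG", "ARG"],
    "resid": np.array([1, 1]),
    "coor": np.zeros((1, 2, 3)),
}
no_resid = dict(good, resid=None)
short_resname = dict(good, resname=["ARG"])
```

An empty list means every essential attribute is present and consistent with the number of atoms in `coor`.
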

import numpy

class Atom():

    '''

    Experimental class to play with objects

    TODO: slicing & length checks for essential attributes

    Examples
    --------

    >>> import numpy
    >>> import a
    >>> coor0 = numpy.zeros([3,4,3], float)
    >>> coor1 = numpy.ones([3,4,3], float)
    >>> b = a.Atom(name=['ARG'],atom=['ATOM'],resid=numpy.array([1,2,3]),coor=coor0)
    >>> c = a.Atom(name=['ARG'],atom=['ATOM'],resid=numpy.array([4,5,6]),coor=coor1)
    >>> b + c
    >>> b.resid()
    array([1, 2, 3, 4, 5, 6])
    >>> b.coor()[-1]
    array([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.]])

    '''

    def __init__(self, atom=None, index=None, name=None, resname=None, resid=None, coor=None):
        try:
            if type(atom) is list:
                self.__atom = atom
            else:
                self.__atom = None
        except:
            self.__atom = None

        try:
            if type(index) is numpy.ndarray:
                # builtin int: the numpy.int alias was removed in NumPy 1.24
                self.__index = numpy.array(index, int)
            else:
                self.__index = None
        except:
            self.__index = None

        try:
            if type(name) is list:
                self.__name = name
            else:
                self.__name = None
        except:
            self.__name = None

        try:
            if type(resname) is list:
                self.__resname = resname
            else: 
                self.__resname = None
        except:
            self.__resname = None

        try: 
            if type(resid) is numpy.ndarray:
                # builtin int: the numpy.int alias was removed in NumPy 1.24
                self.__resid = numpy.array(resid, int)
            else:
                self.__resid = None
        except: 
            self.__resid = None

        try:

            if type(coor) is numpy.ndarray:
                # builtin float: the numpy.float alias was removed in NumPy 1.24
                self.__coor = numpy.array(coor, float)
            else:
                self.__coor = None
        except: 
            self.__coor = None

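    def __add__(self, other):
        # Merges other into self in place and returns None, so `b + c` modifies b.
        # items() works on both Python 2 and 3 (iteritems() is Python 2 only).
        for key, value in self.__dict__.items():
            try:
                if type(value) is list:
                    self.__dict__[key].extend(other.__dict__[key])
                elif type(value) is numpy.ndarray:
                    self.__dict__[key] = numpy.concatenate((self.__dict__[key], other.__dict__[key]))
            except Exception:
                # Skip attributes that other lacks or that cannot be concatenated.
                pass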

    def setAtom(self, atom):
        self.__atom = atom  

    def atom(self):
        return self.__atom 

    def setIndex(self, index):
        self.__index = index  

    def index(self):
        return self.__index 

    def setResname(self, resname):
        self.__resname = resname  

    def resname(self):
        return self.__resname 

    def setName(self, name):
        self.__name = name  

    def name(self):
        return self.__name 

    def setResid(self, resid):
        self.__resid = resid  

    def resid(self):
        return self.__resid 

    def setCoor(self, coor):
        self.__coor = coor  

    def coor(self):
        return self.__coor 

    ## can access directly via class_instance._Atom__name  
    ## but class_instance.__name does NOT return name

    ## INSIDE the class you can assign directly

    ## OUTSIDE the class you should always use setter / getter
        # although you could assign by class_instance._Atom__name = value
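
The name-mangling behaviour described in the comments above can be checked with a stripped-down class. This is a small illustration using only standard Python; the single-attribute `Atom` below is a stand-in for the experimental class, not a copy of it.

```python
class Atom(object):
    def __init__(self, name=None):
        # Inside the class, __name is stored under the mangled key _Atom__name.
        self.__name = name

    def name(self):
        return self.__name

    def setName(self, name):
        self.__name = name


atom = Atom(name=["CA"])
mangled_keys = sorted(atom.__dict__)

# Outside the class the unmangled name does not exist.
try:
    atom.__name
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

# Assigning from outside creates a separate attribute literally called '__name';
# the value seen by the getter is left untouched.
atom.__name = ["XX"]
```

Because the keys in `__dict__` carry the class name, an `__add__` that loops over `self.__dict__` and looks up the same keys in `other.__dict__` only finds matches when both objects are instances of the same class.
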