pierrepo / PBxplore

A suite of tools to explore protein structures with Protein Blocks :snake:
https://pbxplore.readthedocs.org/en/latest/
MIT License

[WIP] Modularization of PBstat.py #77

Closed HubLot closed 8 years ago

HubLot commented 8 years ago

This pull request aims to modularize the code of PBstat.py and create an API (as proposed in #25) to use it as a library. This is a Work In Progress so don't merge it until the work is done.

The goal is to create visualization functions (neq, map and weblogo) that take the output of count_matrix in PBlib.py as input, so that a user can write:

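    # An example
    import PBlib as PB
    # PDB was not imported in the original snippet; assuming it comes from PDBlib
    import PDBlib as PDB

    chains = PDB.chains_from_files(["demo2_tmp/psi_md_traj_1.pdb"])
    seqs = []
    for comment, chain in chains:
        # Assign a PB sequence to each chain from its dihedral angles
        dihedrals = chain.get_phi_psi_angles()
        sequence = PB.assign(dihedrals)
        seqs.append(sequence)
    # Count PB occurrences per position, then derive and plot Neq
    pb_count = PB.count_matrix(seqs)
    neq = PB.compute_neq(pb_count)
    PB.plot_neq(neq, "psi_md_neq.png")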

What do you think ?

So far, I have only split the module into functions to avoid floating code.

ping @jbarnoud

pierrepo commented 8 years ago

Seems good to me @HubLot !

HubLot commented 8 years ago

Quite a huge commit here. The details are in the commit message, but here are a few more explanations:

All the visualization functions (neq, map and weblogo) have now been moved to PBlib.py and take the output of count_matrix in PBlib.py as input, so that a user can write (as stated previously):

    # An example
    import PBlib as PB
    # PDB was not imported in the original snippet; assuming it comes from PDBlib
    import PDBlib as PDB

    chains = PDB.chains_from_files(["demo2_tmp/psi_md_traj_1.pdb"])
    seqs = []
    for comment, chain in chains:
        # Assign a PB sequence to each chain from its dihedral angles
        dihedrals = chain.get_phi_psi_angles()
        sequence = PB.assign(dihedrals)
        seqs.append(sequence)
    pb_count = PB.count_matrix(seqs)
    neq = PB.compute_neq(pb_count)
    # The parameter order has changed: the file name now comes first
    PB.plot_neq("psi_md_neq.png", neq)
    # For a sub-range of residues
    PB.plot_neq("psi_md_neq_1-10.png", neq, residue_min=1, residue_max=10)

I have also added two functions to the library:

compute_freq_matrix, which computes the frequency matrix needed for neq and map from an occurrence matrix. It is called from within the visualization functions so that it stays transparent to the API user.
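To make the idea concrete, here is a minimal sketch of how a frequency matrix and Neq could be derived from an occurrence matrix. It is not the code from this PR: the function names are hypothetical, and the layout (one row per position, one column per Protein Block) is an assumption. Neq is computed with the usual definition, exp(-sum(f * ln f)), over the PB frequencies at each position.

```python
import numpy as np


def freq_matrix_sketch(count_matrix):
    """Turn per-position PB counts into per-position PB frequencies.

    Assumed layout: one row per position, one column per Protein Block.
    """
    counts = np.asarray(count_matrix, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Positions without any observation keep all-zero frequencies instead of NaN.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)


def neq_sketch(freq_matrix):
    """Neq = exp(-sum(f * ln f)) for each position, with 0 * ln 0 taken as 0."""
    freq = np.asarray(freq_matrix, dtype=float)
    # Replace zeros by 1 before the log: ln(1) = 0, so empty cells add nothing.
    safe = np.where(freq > 0, freq, 1.0)
    entropy = -(freq * np.log(safe)).sum(axis=1)
    return np.exp(entropy)
```

With this definition, a position where a single block is always observed has a Neq of 1, and a position split evenly between two blocks has a Neq of 2.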

The main issue I encountered was how to handle the residue-min and residue-max options of the PBstat.py CLI, and how to slice the values correctly for the visualization functions, because from the API user's point of view PB.count_matrix() does not deal with residue indexes. The new function _slice_matrix(matrix, residue_min=1, residue_max=None) aims to solve this problem by slicing a given matrix with the boundaries passed as parameters. It checks that the boundaries are valid and returns the sliced matrix. This function is only called by the visualization functions (write_neq, plot_neq, plot_map) because it is not worth handling sliced matrices (with correct offsets and maximum boundaries) throughout the whole API.
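The slicing logic described above could look roughly like the sketch below. Only the signature comes from this thread; the body is an illustration. In particular, it assumes that the first row of the matrix is residue 1, that both boundaries are inclusive, and that a single-residue range is allowed. The real _slice_matrix may apply different checks.

```python
import numpy as np


def slice_matrix_sketch(matrix, residue_min=1, residue_max=None):
    """Return the rows for residues residue_min..residue_max (1-based, inclusive).

    Assumption: row 0 of the matrix corresponds to residue 1.
    """
    matrix = np.asarray(matrix)
    n_residues = matrix.shape[0]
    if residue_max is None:
        # No upper boundary given: go up to the last residue.
        residue_max = n_residues
    if residue_min < 1 or residue_max < 1:
        raise ValueError("residue_min and residue_max must be >= 1")
    if residue_min > residue_max:
        raise ValueError("residue_min must not be greater than residue_max")
    if residue_max > n_residues:
        raise ValueError("residue_max exceeds the number of residues")
    # 1-based inclusive bounds map onto a 0-based half-open slice.
    return matrix[residue_min - 1:residue_max]
```

Keeping the validation next to the slicing means every visualization function rejects a bad range in the same way, while the rest of the API keeps working on full matrices.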

Now PBstat.py is shorter and does not differ from the API. However, the weblogo image is still created the old way (that is why there is a check_residue_range function inside the module). The reason is that I would like to use the weblogo API instead of the binary to generate images (see #78).

jbarnoud commented 8 years ago

:+1:

At some point we should split PBlib into several files. I do not like requiring matplotlib just to do a PB assignment. This is out of the scope of this PR, though.
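Until the library is split, one possible way to avoid a hard matplotlib dependency is to import it only inside the plotting functions. The sketch below illustrates that pattern under stated assumptions: require_optional and plot_neq_sketch are hypothetical names, not functions of PBlib, and the code targets Python 3.

```python
import importlib


def require_optional(module_name, feature):
    """Import module_name on demand, or fail with a readable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            "{0} is required for {1}; install it to use this feature".format(
                module_name, feature
            )
        ) from exc


def plot_neq_sketch(fname, neq_values):
    """Hypothetical plotting entry point: matplotlib is imported only here."""
    plt = require_optional("matplotlib.pyplot", "plotting Neq")
    fig, ax = plt.subplots()
    # Residues are numbered from 1 on the x axis.
    ax.plot(range(1, len(neq_values) + 1), neq_values)
    ax.set_xlabel("Residue number")
    ax.set_ylabel("Neq")
    fig.savefig(fname)
    plt.close(fig)
```

With this layout, importing the module and running an assignment never touches matplotlib; the dependency only matters once a plot is requested.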

HubLot commented 8 years ago

The last commit solves issue #78. The weblogo images are now generated with the weblogo API. The call to the generate_weblogo function is no different from the API. This function and read_occurence_matrix have been moved to PBlib.py.

With this last commit, the PR can now be reviewed and merged if it's ok.

I agree with @jbarnoud: now that almost all functions are in PBlib.py, we need to split it into several files, as proposed in #25.

pierrepo commented 8 years ago

Thanks guys! Indeed PBlib is becoming bigger and bigger...