Closed HubLot closed 8 years ago
Hi @HubLot. You're on fire today! I propose visualization.py goes into the analysis directory since map or neq are kinds of analysis. One way to make things easier would be to expose to the main API all functions inside core and format.
Yeah @HubLot ! Great !
I do not know how things should be on the file side, but on the API side I would avoid having to many levels. I would suggest to remove the penultimate level of your hierarchy, then pbxplore.analysis.neq.compute_neq
would become pbxplore.analysis.compute_neq
. It can be done with your file organization by importing the content of pbxplore.analysis.neq
in pbxplore.analysis.__init__
.
I would prefer to have all function related to file (excepted the visualization stuff) in pbxplore.formats
that may be rename pbxplore.fileio
or pbxplore.io
. This include functions such as write_neq
and write_phipsi
.
I suggest to rename pbxplore.core.PB
into pbxplore.data
, to move compute_score_by_position
, and substitution_score
in pbxplore.analysis
, and to move write_flat
in pbxplore.formats
.
pbxplore.formats.PDB
and pbxplore.formats.structure
could be merged in pbxplore.structure
. I do not know what to do with get_dihedral
, though.
count.py
could go in analysis
, therefore core
could become assign
.
It may look like a lot of change, but then the package is organized the same way I see PB used: from a structure or a trajectory (pbxplore.structure
), PB sequences are assigned (pbxplore.assign
), then analyzed (pbxplore.analysis
); the output of the analysis is written into a file (pbxplore.formats
) or plotted in some way (pbxplore.vizualise
). If somebody wants to use an other PDB parser, then it will not use pbxplore.structure
. If the PB sequences are already assigned, then pbxplore.structure
and pbxplore.assign
are not useful. pbxplore.data
limits the need for imports across the module, it also allows a user to write new applications of the PB (like PB prediction of structure building) without touching the rest of pbxplore.
Finally, this really needs a shortcut:
chains = pbxplore.chains_from_files(pdb_name_lst)
all_comments = []
all_sequences = []
all_dihedrals = []
for comment, chain in chains:
dihedrals = chain.get_phi_psi_angles()
sequence = pbxplore.assign(dihedrals)
all_comments.append(comment)
all_dihedrals.append(dihedrals)
all_sequences.append(sequence)
Thanks guys for your inputs.
I do not know how things should be on the file side, but on the API side I would avoid having to many >levels. I would suggest to remove the penultimate level of your hierarchy, then >pbxplore.analysis.neq.compute_neq would become pbxplore.analysis.compute_neq. It can be done >with your file organization by importing the content of pbxplore.analysis.neq in >pbxplore.analysis.init.
It's already done in this PR ;)
I agree with the view of @jbarnoud about the changes with minor revisions (ahah). I think some "core" functions need to be directly accessible from the main module (i.e chains_from_files
, chains_from_trajectory
and assign
). Then the user will load analysis
and/or io
to perform what he wants.
I agree also with @pierrepo about visualization. I think I can move it to analysis (to reduce the number of sub-modules to load) and access the functions like this pbxplore.analysis.plot_neq
OK. But try to be KISS.
I made changes following our discussion. I also updated the scripts (PBassign.py
, etc) to give you an idea.
analysis
(pbxplore.analysis.compute_neq
for example)PBassign.py
, PBcount.py
, PBclust.py
and PBstat.py
chains_from_files
, chains_from_trajectory
, assign
and PB
module)I rename the folder formats by structure but I kept the structure and the files in it. In fact, it's mainly internal parser and the user should only know about chains_from_files
and chains_from_trajectory
.
I followed the suggestions for PB.py
but I kept its name because this one is more recognizable (PB = Protein Blocks) than data
(data is a good name for variable, not so much for modules).
Overall, the changes made the architecture and the use of the module clearer and easier.
But, if you look into the scripts, I'm not satisfied with the import needed. Right now, we need 3 imports and the length to the function calls are long. Maybe @pierrepo was right by exposing all functions to the main module :)
Well well well. Can we have the week-end to think about it?
I looked at PBassign.py and liked it much. I do not see a problem with the 3 lines of import. It is fairly common, and I find it neater: you know right from the imports what the script is about. Also, having things split that way should make things easier with optional dependencies.
Are the function call really much longer? pbxplore.io.write_fasta
is 3 characters longer than pbxplore.write_fasta
The names that are a bit long are those of the constants in pbxplore.structure
. They are not used that often, if they are they can be imported as from pbxplore.structure import THE_LONG_CONSTANT_NAME
.
I looked at PBassign.py and liked it much. I do not see a problem with the 3 lines of import. It is fairly > common, and I find it neater: you know right from the imports what the script is about. Also, having > things split that way should make things easier with optional dependencies.
I could agree for the "analysis" sub-modules but not so much for the others (structure, io). 3 imports is huge imho. Numpy, matplotlib, mdtraj have mainly one import (of course, they are counter examples :) ) and they are pretty clear.
Are the function call really much longer?
pbxplore.io.write_fasta
is 3 characters longer than >pbxplore.write_fasta
Indeed but with one main import, we could do:
import pbxplore as pbx
pbx.write_fasta()
@HubLot If you want to reduce the number of imports, you can add some imports in pbxplore.__init__
. If you have import .io
in the __init__
you should be able to access pbxplore.io
from the import pbxplore
.
I agree that the most common functions should have shortcuts, but not all functions or it will quickly become a mess.
@HubLot If you want to reduce the number of imports, you can add some imports in
pbxplore.__init__
. If you haveimport .io
in the__init__
you should be able to accesspbxplore.io
from theimport pbxplore
.
Indeed, this is already the case for the sub modules PB
and we could add io
and structure
too. Then, the shortcut import pbxplore as pbx
will work fine ( i.e pbx.io.write_fasta()
)
I made few changes based on the discussion. As suggested by @jbarnoud, the submodules are exposed directly into the main pbxplore module through the pbxplore/__init__.py
. I also exposed the submodule analysis
to keep consistency between submodules.
Now, only one import is required (import pbxplore as pbx
) but the hierarchy/level of the submodules are kept:
import pbxplore as pbx
pbx.assign()
pbx.chain_from_files()
pbx.io.write_fasta()
pbx.analysis.compute_neq()
I have also reflects those changes into the tests to pass travis checks
OK. I merge the beast.
Like said previously, with the most of the refactoring done, the
PBlib.py
file has grown very large. I propose a separation of it into sub-modules based on #25. Don't merge it yet because the scripts have not been modified.Here, a suggestion of architecture. The spitting was not so obvious so don't hesitate to bring your opinions to improve it. (bold state for folders and italic for functions/classes/constants)
PBassign.py
,PBcount.py
,PBclust.py
andPBstat.py
The idea was to separate:
loader.py
should be directly accessible from pbxplore (pbxplore.chains_from_files()
). Maybe also it lakes of proper API functions.__init__.py
for the exposed functions.From this architecture, the main question/issue I struggle is how to expose the functions inside the submodules
core
andformats
to the API user and I need your opinions. That's why I didn't change the scripts (hence the fail on tests). I think foranalysis
andvisualization
, it's pretty clear: the user will import the submodule to use the functions. For example, functionschains_from_files
,chains_from_trajectory
,assign
should be directly accessible from pbxplore main module (pbxplore.my_function
) but what about the other ones ? Should all be exposed to the main module ? If not, which ones ?Again, maybe it's not clear because either I split those functions in wrong way or either it lakes of "global" API functions. Also, the number of read/write functions on different objects (count ,fasta, phipsi, flat) does not ease it and for this case, it's already API functions.
I'm all ears.