This pull request divides PBcount into functions. It also moves the generic functions to PBlib. Thus, this pull request improves PBxplore modularity, and makes the calculation of an occurrence matrix available from the library. This improved modularity is a step toward the #25 proposal.
Note that this pull request expects pull request #48 to be merged first, even though there is no strict dependency. Indeed, #48 defines regression tests for PBcount, and these tests should pass with the changes introduced here.
Four new functions appear in PBlib:
read_several_fasta reads several fasta files, whose paths are given as arguments;
assert_same_size raises a SizeError exception when the sequences in a list do not all have the same length;
count_matrix computes an occurrence matrix from a list of sequences;
write_count_matrix writes an occurrence matrix to a file.
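To illustrate what reading several fasta files involves, here is a minimal standalone sketch. It is not the PBlib implementation: the names read_fasta and read_fasta_files, and the choice to return two parallel lists, are assumptions made for this example only.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from one fasta file (hypothetical helper)."""
    header = None
    chunks = []
    with open(path) as infile:
        for line in infile:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header = line[1:]
                chunks = []
            elif header is not None:
                # A sequence may span several lines.
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fasta_files(paths):
    """Collect headers and sequences from several files, in the order given."""
    headers = []
    sequences = []
    for path in paths:
        for header, sequence in read_fasta(path):
            headers.append(header)
            sequences.append(sequence)
    return headers, sequences
```

Keeping the per-file parser separate from the multi-file wrapper means the wrapper stays a plain loop, which is the kind of split this pull request aims for.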
In addition to these functions, the pull request introduces two new exceptions:
SizeError is raised when a sequence does not have the expected length;
InvalidBlockError is raised when a function encounters an unexpected block.
It is now possible to read a PDB file, to assign the corresponding PB sequence, and to calculate an occurrence matrix using the PBxplore Python library:
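The sketch below shows one way checks of this kind can be written. The two classes are stand-ins that only share their names with the PBlib exceptions; their base class, their messages, the helper names check_same_size and check_blocks, and the block alphabet (the letters a to p) are all assumptions of this example.

```python
class SizeError(ValueError):
    """Stand-in: a sequence does not have the expected length."""


class InvalidBlockError(ValueError):
    """Stand-in: a sequence contains an unexpected block."""


# Assumption: blocks are named with the 16 letters a to p.
BLOCK_NAMES = "abcdefghijklmnop"


def check_same_size(sequences):
    """Raise SizeError unless all the sequences share one length."""
    lengths = {len(sequence) for sequence in sequences}
    if len(lengths) > 1:
        raise SizeError(
            "sequences have different lengths: %s" % sorted(lengths)
        )


def check_blocks(sequence, names=BLOCK_NAMES):
    """Raise InvalidBlockError at the first block not listed in names."""
    for position, block in enumerate(sequence):
        if block not in names:
            raise InvalidBlockError(
                "unexpected block %r at position %d" % (block, position)
            )
```

Dedicated exception types let a caller such as PBcount catch each failure separately and report a precise message, instead of inspecting a generic error.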
    import PDBlib as PDB
    import PBlib as PB

    pb_seq = []
    pdb = PDB.PDB('demo1/2LFU.pdb')
    for chain in pdb.get_chains():
        # Assign a PB sequence from the backbone dihedral angles
        dihedrals = chain.get_phi_psi_angles()
        pb_seq.append(PB.assign(dihedrals,
                                PB.REFERENCES))
    # Count the occurrences of each PB at each position
    pb_count = PB.count_matrix(pb_seq)
In the example above, pb_seq is the list of PB sequences, one for each model in 2LFU.pdb, and pb_count is the corresponding occurrence matrix. In the occurrence matrix, each row corresponds to a position, and each column corresponds to a PB (in alphabetical order).
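The layout of the matrix can be shown with a small pure-Python sketch. This is not the code of PB.count_matrix: the function name occurrence_matrix, the block alphabet (the letters a to p), the use of nested lists, and the ValueError raised on bad input are assumptions of this example.

```python
# Assumption: blocks are named a to p, which fixes the column order.
BLOCK_NAMES = "abcdefghijklmnop"


def occurrence_matrix(sequences, names=BLOCK_NAMES):
    """Count how often each block occurs at each position.

    The result has one row per position and one column per block,
    with the columns in the order given by names.
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(sequence) != length for sequence in sequences):
        raise ValueError("all sequences must have the same length")
    column = {name: index for index, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in range(length)]
    for sequence in sequences:
        for position, block in enumerate(sequence):
            if block not in column:
                raise ValueError("unexpected block %r" % block)
            matrix[position][column[block]] += 1
    return matrix


# Three toy sequences of length 3.
toy_matrix = occurrence_matrix(["abc", "abd", "bbc"])
# Row 0 describes the first position: 'a' occurs twice and 'b' once.
print(toy_matrix[0][:4])
```

Each row sums to the number of sequences, which gives a quick sanity check on any matrix built this way.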