ericmjl closed this issue 2 years ago
This is a great idea. Implementing the ABC is a good first step. There are a number of abstract classes in https://docs.python.org/3/library/collections.abc.html that we might be able to inherit from, or use as inspiration.
On the SeqLike side, I think all we would need to do is implement:
@property
def sequence(self):
    return self._seqrecord.seq
The big difference would be that in the base class, .sequence would be what was passed into the constructor along with an alphabet.
Methods to pull out/implement in abstract base class might include:
- to_str(): probably now ''.join(self.sequence), because it could very well be a list
- to_index(): almost as is
- to_onehot(): almost as is
- apply(): almost as is; relies on __deepcopy__()
- count(): as is
- find(): this could be implemented with a while loop and self.sequence
- __len__(): len(self.sequence)
- __contains__(x): x in self.sequence
- __iter__(): iter(self.sequence)
Note that all these methods would depend on self.sequence and potentially self.alphabet.
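To make the list above concrete for discussion, here is a rough sketch of what the base class could look like. The names AbstractSequence and ListSequence are placeholders I made up, not anything that exists in SeqLike today, and the method bodies are my guess at a minimal implementation rather than the current SeqLike code. apply() is left out because its behavior depends on how SeqLike implements __deepcopy__().

```python
from abc import ABC, abstractmethod


class AbstractSequence(ABC):
    """Hypothetical base class; the name is a placeholder for discussion."""

    @property
    @abstractmethod
    def sequence(self):
        """Ordered collection of symbols (str, list, tuple, ...)."""

    @property
    @abstractmethod
    def alphabet(self):
        """Ordered collection of the allowed symbols."""

    def to_str(self):
        # join() works whether .sequence is a str or a list of str
        return "".join(self.sequence)

    def to_index(self):
        # Map each symbol to its position in the alphabet
        lookup = {symbol: i for i, symbol in enumerate(self.alphabet)}
        return [lookup[symbol] for symbol in self.sequence]

    def to_onehot(self):
        # One row per position, one column per alphabet symbol
        n = len(self.alphabet)
        return [[1 if j == i else 0 for j in range(n)] for i in self.to_index()]

    def count(self, symbol):
        return sum(1 for s in self.sequence if s == symbol)

    def find(self, symbol):
        # While-loop scan; returns -1 when absent, mirroring str.find
        i = 0
        while i < len(self.sequence):
            if self.sequence[i] == symbol:
                return i
            i += 1
        return -1

    def __len__(self):
        return len(self.sequence)

    def __contains__(self, x):
        return x in self.sequence

    def __iter__(self):
        return iter(self.sequence)


class ListSequence(AbstractSequence):
    """Hypothetical concrete subclass that just stores what it is given."""

    def __init__(self, sequence, alphabet):
        self._sequence = list(sequence)
        self._alphabet = list(alphabet)

    @property
    def sequence(self):
        return self._sequence

    @property
    def alphabet(self):
        return self._alphabet
```

Every method only touches self.sequence and self.alphabet, so any subclass that provides those two properties gets the rest for free.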
One open question is what "form" to take as input in the base class (essentially, sequence, index, or one-hot). I think we need to support all three, which means dispatching in the constructor to build .sequence if the input is in index or one-hot form.
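A possible shape for that dispatch, as a sketch only. EncodedSequence, infer_form, and the form keyword are all hypothetical names. The inference rule (symbol in alphabet, then int, then one-hot row) is an assumption of mine and is ambiguous when the alphabet itself contains integers, which is why the sketch also accepts an explicit form.

```python
def infer_form(data, alphabet):
    """Guess whether data is in 'sequence', 'index', or 'onehot' form.

    Hypothetical helper; the rule is a simple heuristic, not SeqLike logic.
    """
    if len(data) == 0:
        return "sequence"
    first = data[0]
    if first in alphabet:
        return "sequence"
    if isinstance(first, int):
        return "index"
    return "onehot"


class EncodedSequence:
    """Hypothetical class that always ends up with a symbol-level .sequence."""

    def __init__(self, data, alphabet, form=None):
        # Store the alphabet as a list so membership tests work for any symbol type
        self.alphabet = list(alphabet)
        form = form or infer_form(data, self.alphabet)
        if form == "sequence":
            self.sequence = list(data)
        elif form == "index":
            # Each integer is a position in the alphabet
            self.sequence = [self.alphabet[i] for i in data]
        elif form == "onehot":
            # Each row has a single 1 marking the symbol's position
            self.sequence = [self.alphabet[list(row).index(1)] for row in data]
        else:
            raise ValueError(f"Unknown form: {form!r}")
```

Usage would look like EncodedSequence([0, 1, 0], "ACGT"), which builds the same .sequence as EncodedSequence("ACA", "ACGT").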
This constructor logic would not be used in SeqLike (i.e., no super().__init__() call). In fact, all we'd use from the inheritance is the interface (having .sequence and .alphabet) and the methods that use them.
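Roughly what I have in mind on the SeqLike side, as a sketch. BaseSequence, StubSeqRecord, and SeqLikeSketch are made-up names; StubSeqRecord only stands in for a BioPython SeqRecord so the example runs without BioPython installed.

```python
class BaseSequence:
    """Hypothetical base class that stores what the constructor receives."""

    def __init__(self, sequence, alphabet):
        self._sequence = sequence
        self._alphabet = alphabet

    @property
    def sequence(self):
        return self._sequence

    @property
    def alphabet(self):
        return self._alphabet

    def to_str(self):
        return "".join(self.sequence)

    def __len__(self):
        return len(self.sequence)


class StubSeqRecord:
    """Stand-in for a SeqRecord: only the .seq attribute is modeled."""

    def __init__(self, seq):
        self.seq = seq


class SeqLikeSketch(BaseSequence):
    """Reuses the base methods but not the base constructor."""

    def __init__(self, seqrecord, alphabet):
        # Deliberately no super().__init__() call
        self._seqrecord = seqrecord
        self._alphabet = alphabet

    @property
    def sequence(self):
        # The record stays the single source of truth
        return self._seqrecord.seq
```

Because the inherited methods only go through the .sequence property, SeqLikeSketch(StubSeqRecord("ACGT"), "ACGT").to_str() works without the subclass ever setting _sequence.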
It'd be nice to support arbitrary alphabets for sequences that are not necessarily string-type. For example, we may want a sequence of codons, or a sequence of other entities.
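As an illustration of the codon case, a small sketch in plain Python. The helper names and the choice of alphabet ordering (all 64 triplets over "ACGT" in itertools.product order) are assumptions for the example, not an existing SeqLike alphabet.

```python
from itertools import product

# Hypothetical codon alphabet: all 64 triplets, ordered AAA, AAC, ..., TTT
CODON_ALPHABET = ["".join(triplet) for triplet in product("ACGT", repeat=3)]


def codons_to_index(codons, alphabet=CODON_ALPHABET):
    """Map each codon to its position in the alphabet."""
    lookup = {symbol: i for i, symbol in enumerate(alphabet)}
    return [lookup[codon] for codon in codons]


def codons_to_onehot(codons, alphabet=CODON_ALPHABET):
    """One row per codon, one column per alphabet entry."""
    n = len(alphabet)
    return [
        [1 if j == i else 0 for j in range(n)]
        for i in codons_to_index(codons, alphabet)
    ]


# The sequence is a list of multi-character symbols, not a single string
codon_sequence = ["ATG", "GCC", "TAA"]
codon_index = codons_to_index(codon_sequence)
codon_onehot = codons_to_onehot(codon_sequence)
```

The point is that the sequence is a list of symbols rather than a str, so the encoders have to iterate over symbols instead of characters.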
Doing so would allow us to access the to_onehot() or to_index() capabilities of SeqLike objects without necessarily being bound to BioPython SeqRecord/Seq objects.
Potential challenges:
- .sequence and .alphabet, which the encoder functions expect (?).
- .to_*() functions.
A good concrete first step here is to create an Abstract Base Class for discussion purposes.