Closed pagpires closed 2 years ago
Also a quick note: when dispatching for SeqLike, I guess we can use SeqLikeType.__args__
as the type instead of hardcoding a list of potential types, otherwise it's hard to keep them consistent (since I saw there's a plan to include torch.tensor)
@pagpires thank you for chiming in! Yes, it looks like we have something wrong in the docs - by design, SeqLike()
requires a sequence of letters and a seq_type
to be specified. (I'm looking at the API reference here.) However, it appears that the examples provided here don't show this.
In a recent release, we decided to make 'factory functions', aaSeqLike()
and ntSeqLike()
, which return proper SeqLike objects but don't require specification of seq_type
as an argument. Perhaps we could use some help making those two factory functions a bit more prominent. What are your thoughts on how this could be done better?
Also a quick note: when dispatching for SeqLike, I guess we can use SeqLikeType.args as the type instead of hardcoding a list of potential types, otherwise it's hard to keep them consistent (since I saw there's a plan to include torch.tensor)
I think this is a great idea. I'm going to make a new issue for this one as it feels like a unit of work that can be made as a PR.
UPDATE: See issue #42 for further discussion on dispatching.
In a recent release, we decided to make 'factory functions', aaSeqLike() and ntSeqLike()
That makes sense. Then I think it's just that SeqLike
's docstring needs update. It might also help to explicitly mention this api design in the development page in the doc, like "nt/aaSeqlike
(user interface for instantiation) - SeqLike
(data representation, discouraged to be used for instantiation) - _construct_seqlike
(logic under the hood)". Just speaking for myself though.
Happy to handle this one. It's a small enough change and I know where to make the changes. :smile:.
I was playing around with the library (it was really nice and great job!) and hit this bug: instantiation like
SeqLike('ATCGATC')
orSeqLike('ATCGATC', None)
will fail. From the doc: 1)seq_like
is supposedly optional but is required in the implementation (code) and 2) logic regardingseq_like==None
only occurs whensequence
is aSeqLike
(code) as opposed to a native type and thus seems not very useful? (There's adef determine__type()
(code) that I think is intended for the job but it's not used.)In addition, some general comments regarding dispatch, feel free to ignore them if they are out of context:
def _construct_seqlike
share the same signature foralphabet
andcodon_map
, anddef determine__type_and_alphabet
share same signature forseq_type
andsequence
. Thus it seems clearer to extract the only signature that varies? E.g. for the latter case we only dispatch functions based onalphabet
(or even directly useif alphabet==None: alphabet=determine_alphabet(_type, sequence)
rather than dispatching). Current setup for these two functions seems to create duplicate logics and is error prone (types forsequence
here should be the same?)Again, happy to file a PR for anything specific above.