Closed caetera closed 10 months ago
Thank you, this looks great. I made a couple of minor changes, if you don't mind.
I also looked into adding this via a metaclass so that this is injected in every FlavoredMixin
subclass automatically, but do we have a universal way to add a proper metaclass? It looks like the add_subclass
add_metaclass
routine actually modifies the class itself and does not affect its descendants, unless I'm missing something, which is totally possible since I don't deal with metaclasses often.
@levitsky I'm not sure what the add_subclass
name you're referring to is. By "universal way to add a proper metaclass", do you mean syntactically to deal with the differences between Py2 and Py3? There's six.add_metaclass
, which produces functionally the same thing.
@mobiusklein thanks for chiming in. I mean the function auxiliary.utils.add_metaclass
, and yes, I was referring to a way to make it work on Py2 and Py3. We've been avoiding adding six
as a dependency and I thought utils.add_subclass
was functionally equivalent?
Yes, pyteomics.auxiliary.utils.add_metaclass
is functionally equivalent to six.add_metaclass
. I forgot I added that (probably when I wrote the mztab
module). It does affect subclasses, because the object bound to the name of the class you decorate isn't the type
you wrote out syntactically. add_metaclass
creates a copy of that type object's internal data structures, and then creates a new type object that is an instance of the metaclass
instead of the built-in type
.
So to accomplish what you're trying to do with _add_raw_field
with a metaclass, you'd write something like this:
class _FastaParserFlavorMeta(type):
def __new__(mcs, name, bases, namespace):
if "parse" in namespace:
namespace["parse"] = _add_raw_field(namespace["parse"])
return super().__new__(mcs, name, bases, namespace)
@add_metaclass(_FastaParserFlavorMeta)
class FlavorMixin(...):
...
Here, namespace
is a dict
of all the names bound to the class definition, e.g. methods, class attributes, annotations, docstring, and a few other things. If the class defines a parse
method, it'll be in the namespace
. Otherwise, that'll be resolved by the usual MRO from bases
. If every class you create that defines a parse
method is an instance of _FastaParserFlavorMeta
, they'll always have this wrapped in _add_raw_field
. If you were to use a mixin type that is not a _FastaParserFlavorMeta
to inject your parse method won't be wrapped unless you explicitly write a forwarding method like so:
class ExternalParseMixin:
def parse(self, *args, **kwargs):
...
# Inherit `_FastaParserFlavorMeta` from `FlavorMixin`,
class NewFlavorMixin(FlavorMixin, ExternalParseMixin):
def parse(self, *args, **kwargs):
return ExternalParseMixin.parse(self, *args, **kwargs)
There are a few other things we did to the FASTA parsers over the years we might be able to make less repetitive with a metaclass, like _add_init
, but I'd need to re-read it to remember what problem was.
@mobiusklein Thank you for clearing this up. What I was apparently missing in regard to the effect of metaclasses on the subclasses is that it's only there if the metaclass returns an instance of a type other than type
. I didn't realize this distinction was important, seeing the examples of metaclasses that just used type()
to return an altered class definition.
For instance, this metaclass:
class NotGreatMeta(type):
def __new__(cls, name, bases, attrs):
print('called for', name)
return type(name, bases, attrs)
does not print anything for a subclass of a class, but this does:
class SeriousMeta(type):
def __new__(cls, name, bases, attrs):
print('called for', name)
return super().__new__(cls, name, bases, attrs)
while both work for the very class to which they are applied. The reason is, apparently, that the fist one produces instances of type
and the second one produces instances of SeriousMeta
.
Now, trying to apply your example, I am running into another issue:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[3], line 1
----> 1 from pyteomics import fasta
File ~/py/pyteomics/pyteomics/fasta.py:516
511 _intify(info, ('n',))
512 return info
515 @_add_init
--> 516 class UniRef(UniRefMixin, FASTA):
517 pass
520 @_add_init
521 class IndexedUniRef(UniRefMixin, TwoLayerIndexedFASTA):
TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
This happens with all the concrete classes that inherit both from FlavoredMixin
and from FASTA
or IndexedFASTA
, as both of the latter are subclasses of FileReader
, which in turn has abc.ABCMeta
for metaclass. Apparently, since _FastaParserFlavorMeta
is not a subclass of abc.ABCMeta
, we have a metaclass conflict. As far as I can tell, ABCMeta
was added to FileReader
in order to register MzMLb
as a subclass. Would you have any suggestions on resolving this? I imagine we could make _FastaParserFlavorMeta
inherit from abc.ABCMeta
, unless it has any undesirable consequences?
It's safe to derive _FastaParserFlavorMeta
from ABCMeta
, with the caveat that if you decorate parser
with abstractmethod
, I think the current trick of patching the namespace
will cause trouble, though this can be countered by only running that code when the type is not the base FlavorMixin
.
Since that's not currently the design, that shouldn't be an issue, although semantically parser
is treated like an abstract method, we just don't enforce it, nor do we need to make things more complicated.
Thank you all! Looks ready to me.
This PR adds an entry to FASTA file parsers that contains raw (i.e. non-parsed header). The latter is used by the
write
to write parsed protein entries.Useful, for the fast filtering of FASTA files, for example: