ENH: adds CFF translator from catalog base class; sets entrypoint

jsheunis commented 1 year ago

To be used for testing of https://github.com/datalad/datalad-catalog/pull/246

mslw commented 1 year ago

Thanks, having an example translator that goes along the catalog PR is very helpful!

It looks to me that, compared to the "old" generation of translators, the "new" class needs to:

allow instantiating without metadata
provide the three get_supported_()* methods (these would be used by match() inherited from TranslatorBase)
provide translate() method which takes metadata as an input

I looked at the proposed Translate implementation in the catalog, and wonder about two things. I'm putting them here just to share my thoughts:

Would it be better to expect the translator class to implement the match() method directly, instead of having the methods that report supported name, version etc? (BTW, we should probably also consider passing extractor ID, translators may use this info or not). This way, the TranslatorBase class wouldn't need impose any particular logic. So if the extractor uses semver, PEP440, or any other versioning scheme, the translator could rely on available comparison methods from e.g. packaging or looseversion and implement any matching logic that's needed. Of course, already there is nothing that prevents the translator class from overriding match().

Would it be better to have the match() method be a @classmethod, so the translator wouldn't need to be instantiated until we know that it matches? This way metadata could still go to __init__, although maybe it isn't a pattern worth preserving.

Expressed in code rather than prose, TranslatorBase would have something like:

@classmethod
@abstractmethod
def match(self, source_name: str, source_version: str, source_uuid: str|None = None) -> bool:

jsheunis commented 1 year ago

Thanks for this feedback.

Would it be better to expect the translator class to implement the match() method directly, instead of having the methods that report supported name, version etc? (BTW, we should probably also consider passing extractor ID, translators may use this info or not). This way, the TranslatorBase class wouldn't need impose any particular logic. So if the extractor uses semver, PEP440, or any other versioning scheme, the translator could rely on available comparison methods from e.g. packaging or looseversion and implement any matching logic that's needed. Of course, already there is nothing that prevents the translator class from overriding match().

Perhaps it's useful to instruct developers that they can (or are encouraged to?) provide a custom match() implementation that overrides the method from the base class, but we still keep the current implementation on the base class so that there's something to fall back on?

Would it be better to have the match() method be a @classmethod, so the translator wouldn't need to be instantiated until we know that it matches? This way metadata could still go to init, although maybe it isn't a pattern worth preserving.

Good point! I agree.

mslw commented 1 year ago

I updated the PR with changes to the catalog's TranslatorBase that were made since. I will either merge it now, or branch off of that, and work on updating other extractors. Thanks again for providing the reference implementation!

mslw / datalad-wackyextra

ENH: adds CFF translator from catalog base class; sets entrypoint #1