toumorokoshi / jsonschema-extractor

Extract jsonschema from various Python objects
MIT License
26 stars, 9 forks

Registered hooks in TypingExtractor should overwrite builtins #9

Closed. jpsnyder closed this issue 2 years ago.

jpsnyder commented 2 years ago

While trying to register my own hook so that enums are represented by an "enum" schema listing the member names instead of as an integer, I ran into a problem.

If you register a hook for a type that a built-in handler already covers, the hook is ignored, because the built-in handler (here, the int handler, since IntEnum subclasses int) is tried first.

from enum import IntEnum
import jsonschema_extractor

typing_extractor = jsonschema_extractor.TypingExtractor()
typing_extractor.register(IntEnum, lambda extractor, typ: {
    "enum": [c.name for c in typ]
})
extractor = jsonschema_extractor.SchemaExtractorSet([typing_extractor])

class Test(IntEnum):
    A = 1
    B = 2
    C = 3

print(extractor.extract(Test))

outputs the following, ignoring our hook:

{'type': 'integer'}

This can be worked around by forcing the hook to the front of the extractor's private _extractor_list:

from enum import IntEnum
import jsonschema_extractor

typing_extractor = jsonschema_extractor.TypingExtractor()
typing_extractor._extractor_list.insert(0, (IntEnum, lambda extractor, typ: {
    "enum": [c.name for c in typ]
}))
extractor = jsonschema_extractor.SchemaExtractorSet([typing_extractor])

class Test(IntEnum):
    A = 1
    B = 2
    C = 3

print(extractor.extract(Test))

outputs:

{'enum': ['A', 'B', 'C']}

But this is obviously a hacky solution. Should registered hooks be inserted in front of the built-ins, so that users can override types as desired?
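A minimal sketch of the proposed behavior. PrependingRegistry and its methods are hypothetical, not jsonschema_extractor's API, and its handlers take only the type (unlike the library's (extractor, typ) hooks). Its register() prepends, so a user hook is tried before any built-in it overlaps:

```python
from enum import IntEnum
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical handler signature: takes the type, returns a schema dict.
Handler = Callable[[type], Dict[str, Any]]


class PrependingRegistry:
    """Hypothetical first-match registry where user hooks precede built-ins."""

    def __init__(self) -> None:
        # Built-in handlers, tried in order with issubclass().
        # bool comes before int because bool is a subclass of int.
        self._handlers: List[Tuple[type, Handler]] = [
            (bool, lambda typ: {"type": "boolean"}),
            (int, lambda typ: {"type": "integer"}),
            (str, lambda typ: {"type": "string"}),
        ]

    def register(self, base: type, handler: Handler) -> None:
        # Insert at the front so the hook shadows any built-in it overlaps.
        self._handlers.insert(0, (base, handler))

    def extract(self, typ: type) -> Dict[str, Any]:
        for base, handler in self._handlers:
            if issubclass(typ, base):
                return handler(typ)
        raise TypeError(f"no handler for {typ!r}")


class Test(IntEnum):
    A = 1
    B = 2
    C = 3


registry = PrependingRegistry()
print(registry.extract(Test))  # {'type': 'integer'}: built-in int handler wins
registry.register(IntEnum, lambda typ: {"enum": [c.name for c in typ]})
print(registry.extract(Test))  # {'enum': ['A', 'B', 'C']}
```

The catch is that matching still depends on list order: a hook registered later for a broad type such as int would shadow an earlier, more specific IntEnum hook.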

toumorokoshi commented 2 years ago

thanks for the issue report!

Perhaps inserting the hooks in front of the built-ins would fix most cases, but I think the crux of the issue is that TypingExtractor currently uses the first extractor in its list whose type the object is a subclass of. So in this case, the int handler wins because IntEnum is a subclass of int.
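To make the first-match behavior concrete, here is a minimal lookup over an ordered handler list. The handlers list and the first_match helper are hypothetical stand-ins for TypingExtractor's internals, with the built-in int handler placed before the user's IntEnum hook as in the report:

```python
from enum import IntEnum


class Test(IntEnum):
    A = 1
    B = 2
    C = 3


# Hypothetical handler list: built-in first, user hook appended after it.
handlers = [
    (int, lambda typ: {"type": "integer"}),
    (IntEnum, lambda typ: {"enum": [c.name for c in typ]}),
]


def first_match(typ):
    # Return the result of the first handler whose type typ subclasses.
    for base, handler in handlers:
        if issubclass(typ, base):
            return handler(typ)
    raise TypeError(f"no handler for {typ!r}")


# Both entries match, so list order alone decides the result.
matches = [base.__name__ for base, _ in handlers if issubclass(Test, base)]
print(matches)            # ['int', 'IntEnum']
print(first_match(Test))  # {'type': 'integer'}
```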

I think it's better to pick the most specific handler instead (perhaps the one whose type is the closest ancestor in the MRO).

The interesting thing about IntEnum is that calling type() on a class that inherits from it returns the metaclass EnumMeta rather than IntEnum: such classes are instances of EnumMeta, while still being genuine subclasses of IntEnum (and therefore of int).
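A quick check of how type() and issubclass() see such a class, using only the standard enum module. On Python 3.11+ the metaclass is named EnumType, with EnumMeta kept as an alias, so the identity check below still holds:

```python
from enum import EnumMeta, IntEnum


class Test(IntEnum):
    A = 1
    B = 2
    C = 3


# type() reports the metaclass, so it can't distinguish int from IntEnum.
print(type(Test) is EnumMeta)                            # True
print(isinstance(Test, EnumMeta))                        # True
# Test is nonetheless a real subclass of both IntEnum and int.
print(issubclass(Test, IntEnum), issubclass(Test, int))  # True True
```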

type() doesn't give any discerning information, but we can read the MRO:

>>> Test.__mro__
(<enum 'Test'>, <enum 'IntEnum'>, <class 'int'>, <enum 'Enum'>, <class 'object'>)

so I think this bug can be resolved by walking the MRO instead of using issubclass: hand-code a "distance from root" calculation on the MRO, checking classes in MRO order and taking the first handler whose type exactly matches one of them.
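A minimal sketch of that MRO-based lookup, assuming a hypothetical dict of handlers keyed by exact type; the register and extract functions are illustrative, not the code shipped in 1.0. Walking typ.__mro__ from the class itself toward object and stopping at the first class with a handler picks the closest ancestor, regardless of registration order. It only covers ordinary classes; typing generics such as List[int] would need separate handling.

```python
from enum import IntEnum
from typing import Any, Callable, Dict

# Hypothetical handler signature: takes the type, returns a schema dict.
Handler = Callable[[type], Dict[str, Any]]

# Hypothetical registry keyed by exact type, so insertion order is irrelevant.
handlers: Dict[type, Handler] = {
    int: lambda typ: {"type": "integer"},
    str: lambda typ: {"type": "string"},
}


def register(base: type, handler: Handler) -> None:
    handlers[base] = handler


def extract(typ: type) -> Dict[str, Any]:
    # __mro__ runs from typ itself toward object, so the first class with a
    # registered handler is the one at the smallest MRO distance.
    for klass in typ.__mro__:
        if klass in handlers:
            return handlers[klass](typ)
    raise TypeError(f"no handler for {typ!r}")


class Test(IntEnum):
    A = 1
    B = 2
    C = 3


register(IntEnum, lambda typ: {"enum": [c.name for c in typ]})
print(extract(Test))  # {'enum': ['A', 'B', 'C']}
print(extract(bool))  # {'type': 'integer'}: bool's MRO is (bool, int, object)
```

Because each MRO entry is matched exactly, registering the IntEnum hook before or after the int handler no longer changes the result for Test.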

I'll work on a fix.

jpsnyder commented 2 years ago

That looks like a better idea. Thanks!

toumorokoshi commented 2 years ago

fixed and shipped in 1.0. Thanks!