marieai / marie-ai

Integrate AI-powered Document Analysis Pipelines
MIT License

PydanticInvalidForJsonSchema: Cannot generate a JsonSchema for core_schema.PlainValidatorFunctionSchema #105

Closed Rithsek99 closed 7 months ago

Rithsek99 commented 7 months ago

Describe the bug

When running https://github.com/marieai/marie-ai/blob/81b274e2e52579f662c28a9f9d3331c50c1c6c08/examples/hello-marie/README.md, I got the following error:

    pydantic.errors.PydanticInvalidForJsonSchema: Cannot generate a JsonSchema for core_schema.PlainValidatorFunctionSchema ({'type': 'with-info', 'function': <bound method AbstractType.validate of <class 'docarray.array.any_array.DocList[LegacyDocument]'>>})

    For further information visit https://errors.pydantic.dev/2.5/u/invalid-for-json-schema

ERROR : Flow@53378 An exception occurred: [01/26/24 16:28:23]
ERROR : Flow@53378 Flow is aborted due to ['ExecutorAA', 'ExecutorXX', 'gateway'] can not be started.
Traceback (most recent call last):
  File "/home/rngem/environment/marie-ai/bin/marie", line 8, in <module>
    sys.exit(main())
  File "/home/rngem/Desktop/rms/marie-ai/marie_cli/__init__.py", line 145, in main
    getattr(api, args.cli.replace('-', '_'))(args)
  File "/home/rngem/Desktop/rms/marie-ai/marie_cli/api.py", line 160, in flow
    with f:
  File "/home/rngem/Desktop/rms/marie-ai/marie/orchestrate/orchestrator.py", line 14, in __enter__
    return self.start()
  File "/home/rngem/Desktop/rms/marie-ai/marie/orchestrate/flow/builder.py", line 33, in arg_wrapper
    return func(self, *args, **kwargs)
  File "/home/rngem/Desktop/rms/marie-ai/marie/orchestrate/flow/base.py", line 1856, in start
    self._wait_until_all_ready()
  File "/home/rngem/Desktop/rms/marie-ai/marie/orchestrate/flow/base.py", line 2038, in _wait_until_all_ready
    raise RuntimeFailToStart
marie.excepts.RuntimeFailToStart
Rithsek99 commented 7 months ago

Updating Pydantic resolved the issue.

JoanFM commented 6 months ago

Hey @Rithsek99, which version of Pydantic resolves the issue? I cannot resolve it on my end.

tteguayco commented 4 months ago

Hey @JoanFM,

It may not help, but this is happening to me with version 2.7.1 of pydantic, which is the latest 🤷‍♂️

gregbugaj commented 4 months ago

First, to get it working you can use the following versions:

docarray        0.40.0
pydantic        1.10.0
pydantic_core   2.0.1
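
As a quick sanity check (illustrative only, standard library), you can confirm what is actually installed:

import importlib.metadata

# Print the installed versions of the packages pinned above.
for pkg in ("docarray", "pydantic", "pydantic_core"):
    try:
        print(pkg, importlib.metadata.version(pkg))
    except importlib.metadata.PackageNotFoundError:
        print(pkg, "not installed")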

The issue is with DocArray's LegacyDocument having a validate function (via DocList) combined with the new JSON schema generation behavior in Pydantic 2.x.
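
For illustration, a minimal sketch of how the failure surfaces on its own, outside of the Flow (assuming pydantic 2.x is installed alongside docarray 0.40.x); it should raise the same PydanticInvalidForJsonSchema as in the traceback above:

from docarray.documents.legacy import LegacyDocument

try:
    # Generating the JSON schema walks into the DocList plain validator.
    LegacyDocument.model_json_schema(mode="validation")
except Exception as exc:
    print(type(exc).__name__, exc)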

I have a patch that will handle this, but there are other changes that need to be made in the app before rolling it out. To test it you can use the following:

import pydantic

major_version = int(pydantic.__version__.split('.')[0])

def patch_pydantic_schema(cls):
    # Fallback when Pydantic 1.x is installed; the real implementation is
    # defined below for Pydantic 2.x.
    raise NotImplementedError

if major_version >= 2:
    from pydantic.json_schema import GenerateJsonSchema, JsonSchemaValue
    from pydantic_core import PydanticOmit, core_schema

    class PydanticJsonSchema(GenerateJsonSchema):
        def handle_invalid_for_json_schema(
                self, schema: core_schema.CoreSchema, error_info: str
        ) -> JsonSchemaValue:
            # Omit fields backed by a plain validator (e.g. DocList) instead
            # of raising PydanticInvalidForJsonSchema.
            if "core_schema.PlainValidatorFunctionSchema" in error_info:
                raise PydanticOmit
            return super().handle_invalid_for_json_schema(schema, error_info)

    def patch_pydantic_schema(cls):
        major_version = int(pydantic.__version__.split('.')[0])
        # Check if the major version is 2 or higher
        if major_version < 2:
            schema = cls.model_json_schema(mode="validation")
        else:
            # Use the lenient generator so schema generation does not abort.
            schema = cls.model_json_schema(
                mode="validation", schema_generator=PydanticJsonSchema
            )
        return schema

patch_pydantic_schema_2x = patch_pydantic_schema

from docarray.documents.legacy import LegacyDocument

from marie.utils.pydantic import patch_pydantic_schema_2x

def test_legacy_schema():
    # Monkeypatch schema generation so the offending DocList plain-validator
    # fields are omitted instead of raising.
    LegacyDocument.schema = classmethod(patch_pydantic_schema_2x)
    legacy_doc_schema = LegacyDocument.schema()
    print(legacy_doc_schema)

If you run this you should get:

{'description': "This Document is the LegacyDocument. It follows the same schema as in DocArray <=0.21.\nIt can be useful to start migrating a codebase from v1 to v2.\n\nNevertheless, the API is not totally compatible with DocArray <=0.21 `Document`.\nIndeed, none of the method associated with `Document` are present. Only the schema\nof the data is similar.\n\n```python\nfrom docarray import DocList\nfrom docarray.documents.legacy import LegacyDocument\nimport numpy as np\n\ndoc = LegacyDocument(text='hello')\ndoc.url = 'http://myimg.png'\ndoc.tensor = np.zeros((3, 224, 224))\ndoc.embedding = np.zeros((100, 1))\n\ndoc.tags['price'] = 10\n\ndoc.chunks = DocList[Document]([Document() for _ in range(10)])\n\ndoc.chunks = DocList[Document]([Document() for _ in range(10)])\n```", 'properties': {'id': {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'description': 'The ID of the BaseDoc. This is useful for indexing in vector stores. If not set by user, it will automatically be assigned a random value', 'example': 'ea8d9cee4038f5967f7912e83f617b9f', 'title': 'Id'}, 'tensor': {'anyOf': [{'items': {'type': 'number'}, 'tensor/array shape': 'not specified', 'type': 'array'}, {'type': 'null'}], 'default': None, 'title': 'Tensor'}, 'blob': {'anyOf': [{'format': 'binary', 'type': 'string'}, {'type': 'null'}], 'default': None, 'title': 'Blob'}, 'text': {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'default': None, 'title': 'Text'}, 'url': {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'default': None, 'title': 'Url'}, 'embedding': {'anyOf': [{'items': {'type': 'number'}, 'tensor/array shape': 'not specified', 'type': 'array'}, {'type': 'null'}], 'default': None, 'title': 'Embedding'}, 'tags': {'default': {}, 'title': 'Tags', 'type': 'object'}, 'scores': {'anyOf': [{'type': 'object'}, {'type': 'null'}], 'default': None, 'title': 'Scores'}}, 'title': 'LegacyDocument', 'type': 'object'}

Effectively I am waiting on the upstream jina project to make the updates before I go for it.

JoanFM commented 4 months ago

Hey @gregbugaj,

Could you try to contribute the changes to allow jina to work with Pydantic v2? It would indeed be very helpful to the project.

gregbugaj commented 4 months ago

Yeah, for sure. Currently my plan would be to do some type of schema coercion (quick solution; a rough sketch follows below). However, the real solution is to change _get_field_from_type.

The schemas are similar, but there are a few differences there.

schema-1.x.json schema-2.x.json
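
For illustration only, a rough sketch of what such a coercion could look like, assuming the rule is to flatten Pydantic 2.x anyOf-with-null optionals back into the 1.x-style shape (a hypothetical helper, not the actual implementation):

from typing import Any

def coerce_v2_schema_to_v1(node: Any) -> Any:
    # Hypothetical helper: recursively rewrite `anyOf: [X, {"type": "null"}]`
    # into plain X, approximating how optional fields look in 1.x schemas.
    if isinstance(node, dict):
        any_of = node.get("anyOf")
        if isinstance(any_of, list):
            non_null = [s for s in any_of if s != {"type": "null"}]
            if len(non_null) == 1:
                merged = {k: v for k, v in node.items() if k != "anyOf"}
                merged.update(coerce_v2_schema_to_v1(non_null[0]))
                return merged
        return {k: coerce_v2_schema_to_v1(v) for k, v in node.items()}
    if isinstance(node, list):
        return [coerce_v2_schema_to_v1(v) for v in node]
    return node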

JoanFM commented 4 months ago

You can directly open a draft PR where discussion can be more direct.

JoanFM commented 3 months ago

Hey @gregbugaj,

Would you be keen on opening such a PR to the original jina repo?

gregbugaj commented 3 months ago

Yes I am, I just haven't gotten to it yet.