Rithsek99 closed this issue 7 months ago
Updating Pydantic resolved the issue.
Hey @Rithsek99, what version of Pydantic resolves the issue? I cannot solve this issue.
Hey @JoanFM,
It may not help, but this is happening to me with version 2.7.1 of pydantic, which is the latest 🤷♂️
First, to get it working you can use the following versions:
docarray 0.40.0
pydantic 1.10.0
pydantic_core 2.0.1
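To confirm what is actually installed before and after pinning, a small check along these lines may help. This is only a sketch: the `PINNED` values are copied from the list above, and `installed_version` and `check_pins` are hypothetical helper names, not part of docarray, jina, or marie.

```python
from importlib import metadata

# Pins copied from the comment above; adjust them if your setup differs.
PINNED = {"docarray": "0.40.0", "pydantic": "1.10.0", "pydantic_core": "2.0.1"}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Map each package name to a tuple (expected, found, matches)."""
    report = {}
    for name, expected in pins.items():
        found = lookup(name)
        report[name] = (expected, found, found == expected)
    return report


if __name__ == "__main__":
    for name, (expected, found, ok) in check_pins(PINNED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {found} [{status}]")
```

The `lookup` parameter exists only so the comparison can be exercised without touching the real environment.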
The issue is that DocArray and LegacyDocument have a validate function, combined with new behavior in Pydantic 2.x.
I have a patch that handles this, but there are other changes that need to be made in the app before rolling it out. To test it, you can use the following:
import pydantic

# Major version of the installed Pydantic (e.g. 2 for "2.7.1").
major_version = int(pydantic.__version__.split('.')[0])


def patch_pydantic_schema(cls):
    # Fallback used on Pydantic 1.x, where no patch is implemented.
    raise NotImplementedError


if major_version >= 2:
    from pydantic.json_schema import GenerateJsonSchema, JsonSchemaValue
    from pydantic_core import PydanticOmit, core_schema

    class PydanticJsonSchema(GenerateJsonSchema):
        def handle_invalid_for_json_schema(
            self, schema: core_schema.CoreSchema, error_info: str
        ) -> JsonSchemaValue:
            # Omit fields backed by a plain validator function instead of failing.
            if "core_schema.PlainValidatorFunctionSchema" in error_info:
                raise PydanticOmit
            return super().handle_invalid_for_json_schema(schema, error_info)

    def patch_pydantic_schema(cls):
        major_version = int(pydantic.__version__.split('.')[0])
        # Check if the major version is 2 or higher
        if major_version < 2:
            schema = cls.model_json_schema(mode="validation")
        else:
            schema = cls.model_json_schema(
                mode="validation", schema_generator=PydanticJsonSchema
            )
        return schema


patch_pydantic_schema_2x = patch_pydantic_schema
from docarray.documents.legacy import LegacyDocument
from marie.utils.pydantic import patch_pydantic_schema_2x


def test_legacy_schema():
    LegacyDocument.schema = classmethod(patch_pydantic_schema_2x)
    legacy_doc_schema = LegacyDocument.schema()
    print(legacy_doc_schema)

If you run this you should get:
{'description': "This Document is the LegacyDocument. It follows the same schema as in DocArray <=0.21.\nIt can be useful to start migrating a codebase from v1 to v2.\n\nNevertheless, the API is not totally compatible with DocArray <=0.21 `Document`.\nIndeed, none of the method associated with `Document` are present. Only the schema\nof the data is similar.\n\n```python\nfrom docarray import DocList\nfrom docarray.documents.legacy import LegacyDocument\nimport numpy as np\n\ndoc = LegacyDocument(text='hello')\ndoc.url = 'http://myimg.png'\ndoc.tensor = np.zeros((3, 224, 224))\ndoc.embedding = np.zeros((100, 1))\n\ndoc.tags['price'] = 10\n\ndoc.chunks = DocList[Document]([Document() for _ in range(10)])\n\ndoc.chunks = DocList[Document]([Document() for _ in range(10)])\n```", 'properties': {'id': {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'description': 'The ID of the BaseDoc. This is useful for indexing in vector stores. If not set by user, it will automatically be assigned a random value', 'example': 'ea8d9cee4038f5967f7912e83f617b9f', 'title': 'Id'}, 'tensor': {'anyOf': [{'items': {'type': 'number'}, 'tensor/array shape': 'not specified', 'type': 'array'}, {'type': 'null'}], 'default': None, 'title': 'Tensor'}, 'blob': {'anyOf': [{'format': 'binary', 'type': 'string'}, {'type': 'null'}], 'default': None, 'title': 'Blob'}, 'text': {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'default': None, 'title': 'Text'}, 'url': {'anyOf': [{'type': 'string'}, {'type': 'null'}], 'default': None, 'title': 'Url'}, 'embedding': {'anyOf': [{'items': {'type': 'number'}, 'tensor/array shape': 'not specified', 'type': 'array'}, {'type': 'null'}], 'default': None, 'title': 'Embedding'}, 'tags': {'default': {}, 'title': 'Tags', 'type': 'object'}, 'scores': {'anyOf': [{'type': 'object'}, {'type': 'null'}], 'default': None, 'title': 'Scores'}}, 'title': 'LegacyDocument', 'type': 'object'}
Effectively, I am waiting on the upstream jina project to make the updates before I go for it.
Hey @gregbugaj,
Could you try to contribute the changes to allow jina to work with Pydantic v2? It would indeed be very helpful to the project.
Yeah, for sure. Currently my plan would be to do some type of schema coercion (quick solution). However, the real solution is to change _get_field_from_type.
The schemas are similar, but there are a few differences.
You can directly open a draft PR, where discussions can be more direct.
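As an illustration of what a schema coercion could look like, here is a minimal sketch. The thread does not say which coercion is intended, so this is an assumption: it targets only one visible difference, namely that the Pydantic 2.x output above renders optional fields as `anyOf` with a `null` branch, whereas Pydantic 1.x emits just the underlying type. The helper name `collapse_nullable` is hypothetical.

```python
def collapse_nullable(schema):
    """Recursively rewrite {'anyOf': [X, {'type': 'null'}]} as X.

    Returns a new structure; the input schema is not modified.
    """
    if isinstance(schema, list):
        return [collapse_nullable(item) for item in schema]
    if not isinstance(schema, dict):
        return schema

    out = {key: collapse_nullable(value) for key, value in schema.items()}
    options = out.get("anyOf")
    if isinstance(options, list):
        non_null = [opt for opt in options if opt != {"type": "null"}]
        # Collapse only when exactly one real type remains and a null branch was dropped.
        if len(non_null) == 1 and len(non_null) < len(options):
            merged = {k: v for k, v in out.items() if k != "anyOf"}
            merged.update(non_null[0])
            return merged
    return out


if __name__ == "__main__":
    # Field schema shaped like the 'text' property in the output above.
    text_field = {
        "anyOf": [{"type": "string"}, {"type": "null"}],
        "default": None,
        "title": "Text",
    }
    print(collapse_nullable(text_field))
```

Unions of several non-null types are left untouched, since there is no single type to collapse them to.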
Hey @gregbugaj,
Would you be keen on opening such a PR to the original jina repo?
Yes, I am; I haven't gotten to it yet.
Describe the bug
When running https://github.com/marieai/marie-ai/blob/81b274e2e52579f662c28a9f9d3331c50c1c6c08/examples/hello-marie/README.md, I got the following error.