zilliztech / spark-milvus

Apache License 2.0

[Bug]: Spark Milvus Connector DataType.ARRAY #15

Open juanandreas opened 4 months ago

juanandreas commented 4 months ago

Is there an existing issue for this?

Describe the bug

When I write data into a collection with a predefined schema, my Spark write job aborts with:

scala.MatchError: Array (of class io.milvus.grpc.DataType)

When I write with the spark-milvus connector without predefining a schema, it fails with:

java.lang.Exception: Unsupported data type array

Expected Behavior

Shouldn't the connector recognize the ARRAY data type defined through pymilvus? Shouldn't writing into a collection with a predefined schema also work? Is this a bug in the Spark Milvus connector?

Steps/Code To Reproduce behavior

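```python
from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

# MILVUS_HOST, MILVUS_PORT, COLLECTION_NAME and the Spark DataFrame `df`
# are defined elsewhere in the job.
connections.connect(host=MILVUS_HOST, port=MILVUS_PORT)

fields = [
    FieldSchema(name="id", is_primary=True, dtype=DataType.VARCHAR, max_length=100),
    # ARRAY of VARCHAR: max_length limits each element, max_capacity the number of elements
    FieldSchema(name="countries", dtype=DataType.ARRAY, element_type=DataType.VARCHAR,
                max_length=100, max_capacity=100),
    # FieldSchema(name="vector_field", dtype=DataType.FLOAT_VECTOR, dim=705),
    FieldSchema(name="vector_field", dtype=DataType.SPARSE_FLOAT_VECTOR),
]
schema = CollectionSchema(
    fields,
    description="collection",
    enable_dynamic_field=True,
)

# Create the collection with the predefined schema
collection = Collection(COLLECTION_NAME, schema)

# With the predefined schema, this write aborts with scala.MatchError
df.write \
    .mode("append") \
    .option("milvus.host", MILVUS_HOST) \
    .option("milvus.port", MILVUS_PORT) \
    .option("milvus.collection.name", COLLECTION_NAME) \
    .option("milvus.collection.vectorField", "vector_field") \
    .option("milvus.collection.vectorDim", "705") \
    .option("milvus.collection.primaryKeyField", "id") \
    .option("milvus.database.name", "default") \
    .format("milvus") \
    .save()
```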
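Until the connector handles these types, a quick pre-flight check can flag the fields of a target schema that it is likely to reject, before `df.write` runs. This is a minimal sketch, not part of spark-milvus: `unsupported_fields` is a hypothetical helper, and the set of unsupported types is an assumption taken from the maintainers' comments later in this thread (JSON, array, sparse_vector), so it may go stale as support lands.

```python
# Assumption: Milvus field types the Spark connector cannot write yet, based on
# the maintainers' comments in this issue. Names match pymilvus DataType members.
UNSUPPORTED_BY_CONNECTOR = {"JSON", "ARRAY", "SPARSE_FLOAT_VECTOR"}


def unsupported_fields(fields):
    """Return (field_name, type_name) pairs the connector is expected to reject.

    `fields` is any iterable of objects exposing `.name` and `.dtype`, such as
    pymilvus FieldSchema instances, whose `.dtype` is a DataType enum member.
    Plain strings are also accepted as dtype values.
    """
    flagged = []
    for field in fields:
        # Enum members expose `.name`; fall back to str() for plain strings.
        type_name = getattr(field.dtype, "name", str(field.dtype))
        if type_name in UNSUPPORTED_BY_CONNECTOR:
            flagged.append((field.name, type_name))
    return flagged
```

Applied to the `fields` list from the issue, this should flag `countries` (ARRAY) and `vector_field` (SPARSE_FLOAT_VECTOR), matching the two types the maintainers say still need support.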

Environment details

- Hardware/Software conditions (OS, CPU, GPU, Memory):
- Method of installation (Docker, or from source):
- Milvus version (v0.3.1, or v0.4.0): 2.4.1
- Milvus configuration (Settings you made in `server_config.yaml`):

Anything else?

No response

CauchyLion commented 3 months ago

Have you managed to solve this problem?

wayblink commented 3 months ago

Some newer data types are not supported yet. Hopefully someone can take on this issue; contributions are welcome. If no one takes it, we may have time to fix it in the next few months.

xiaofan-luan commented 3 months ago

Array and sparse vector types need to be supported.

wayblink commented 2 months ago

Support for advanced data types (including JSON, array, and sparse_vector) is in progress.
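While that support is in progress, one possible workaround is to bypass the connector and insert each Spark partition with pymilvus directly. This is a sketch under assumptions, not a connector feature: it assumes pymilvus 2.4's `MilvusClient` with row-based `insert`, a DataFrame whose columns match the schema from the issue, and a `vector_field` column holding `pyspark.ml.linalg.SparseVector` values (anything exposing `indices` and `values`). The helper names `to_milvus_row`, `batches`, and `write_partition` are hypothetical.

```python
def to_milvus_row(row):
    """Convert one record (a dict, e.g. from Row.asDict()) into an insert payload.

    Assumption: `vector_field` exposes parallel `indices` and `values`
    sequences, as pyspark.ml.linalg.SparseVector does.
    """
    vec = row["vector_field"]
    return {
        "id": row["id"],
        # ARRAY<VARCHAR> is sent as a plain Python list of strings
        "countries": list(row["countries"]),
        # SPARSE_FLOAT_VECTOR is sent as {dimension_index: value}
        "vector_field": {int(i): float(v) for i, v in zip(vec.indices, vec.values)},
    }


def batches(items, batch_size):
    """Yield lists of at most `batch_size` items from any iterable."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def write_partition(rows, uri, collection_name, batch_size=500):
    """Insert one Spark partition into Milvus; intended for df.foreachPartition."""
    # Imported here so each Spark executor imports pymilvus itself.
    from pymilvus import MilvusClient

    client = MilvusClient(uri=uri)
    try:
        converted = (to_milvus_row(row.asDict()) for row in rows)
        for batch in batches(converted, batch_size):
            client.insert(collection_name=collection_name, data=batch)
    finally:
        client.close()
```

Usage would then look like `df.foreachPartition(lambda rows: write_partition(rows, f"http://{MILVUS_HOST}:{MILVUS_PORT}", COLLECTION_NAME))`. This gives up the connector's convenience in exchange for explicit control over type conversion; `batch_size=500` is an arbitrary starting point, not a tuned value.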

juanandreas commented 2 months ago

Is there an update on this?