dgoldenberg-ias opened this issue 4 months ago
Version info:
Hi, any word on this one?
It's a gap that nobody has picked up. If you're willing to take a stab at it, I don't think there would be any objection.
Hi @wbeckler, could you provide any pointers as to where in opensearch-hadoop this could be plugged in? Would we also need to add code to OpenSearch itself for this?
I don't personally have any experience adding support for a new field type. However, https://github.com/opensearch-project/opensearch-hadoop/commit/c651e79c620f5d55fb91e908a7ba54b978632502 looks to be the most recent commit that adds a new field type, so it may be a good starting point for figuring out which files are likely to need changes. Note that this commit predates the fork from Elasticsearch, so the paths will be somewhat different. The file names should generally be the same, except that elasticsearch becomes opensearch and es becomes os.
Thanks @Xtansia. The issue is that it's not clear what the type should be. Presumably knn_vector? But I don't see it being handled in the stock OpenSearch or Elastic code, so I wonder whether it's handled in a custom module or something similar. And if so, would following c651e79 just work? I don't grok the architecture well enough to tell how to approach this.
It would be knn_vector, yes. The mapping is defined in the k-NN plugin: https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/mapper/KNNVectorFieldMapper.java
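To make the mapping side concrete, here is a minimal sketch of a create-index body with a single knn_vector field, based on the k-NN plugin's documented mapping format. A field like this is what opensearch-hadoop would see in the index's _mapping: its type is knn_vector rather than one of the core field types. The field name `my_vector`, the dimension 4, and the helper name `knn_index_body` are hypothetical placeholders. Optional settings such as `method` and `engine` are deliberately left out.

```python
import json
from typing import Any, Dict


def knn_index_body(field_name: str, dimension: int) -> Dict[str, Any]:
    """Build a create-index request body containing one knn_vector field.

    Hypothetical helper: field_name and dimension are illustrative only.
    """
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return {
        # index.knn tells the k-NN plugin to build native k-NN structures
        # for knn_vector fields (needed for approximate k-NN search).
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # The field type a connector would have to recognize.
                field_name: {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(knn_index_body("my_vector", 4), indent=2))
```

In the source documents, the values of such a field are plain JSON arrays of numbers (for example `[0.1, 0.2, 0.3, 0.4]`). An array of floats is therefore a natural candidate for the Spark-side type. That is an assumption, not something settled in this thread.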
Is your feature request related to a problem?
I've tried to load index data that includes a vector field into a Spark DataFrame. The other fields are loaded, but the vector field(s) are not.
What solution would you like?
Please add support for handling OpenSearch vector fields so that they can be loaded into a DataFrame.
What alternatives have you considered?
My workaround is to fetch the data with the scan/scroll API and then build the DataFrame from that data in my own code. It would be great to have vector fields supported in opensearch-hadoop out of the box.
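The workaround above can be sketched as follows: take the hits returned by a scan/scroll and flatten them into plain row dicts, with the vector field coerced to a list of floats. This is an illustrative sketch, not the reporter's actual code. The function name `hits_to_rows` and the choice to keep `_id` as an extra column are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional


def hits_to_rows(hits: Iterable[Dict[str, Any]], vector_field: str) -> List[Dict[str, Any]]:
    """Flatten scan/scroll hits into plain dicts suitable for a DataFrame.

    Each hit is expected to look like {"_id": ..., "_source": {...}}.
    The vector field is coerced to a list of floats, or None when absent,
    so every row has the same column with a consistent element type.
    """
    rows: List[Dict[str, Any]] = []
    for hit in hits:
        # Copy _source so the original hit is not mutated.
        row = dict(hit.get("_source", {}))
        vector: Optional[Iterable[Any]] = row.get(vector_field)
        row[vector_field] = [float(x) for x in vector] if vector is not None else None
        # Optional: keep the document id as an extra column.
        row["_id"] = hit.get("_id")
        rows.append(row)
    return rows
```

With opensearch-py, the hits could come from `opensearchpy.helpers.scan(client, index=..., query={"query": {"match_all": {}}})`. The resulting rows can then be passed to `spark.createDataFrame(rows, schema)`. An explicit `StructType` schema that declares the vector column as `ArrayType(FloatType())` (and includes `_id` if you keep it) avoids relying on schema inference.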
Do you have any additional context?
Please take a look at the code samples below.
Scenario 1: The Successful Working Scenario
Scenario 1: A Successful Load of all Fields into a Dataframe
Scenario 1: Successful Result
Scenario 2: Create an Index with a Vector Field
Scenario 2: Populate the Index with Vectors
Result: Number of documents indexed: 100
Scenario 2: Failing to Load the Vector Field
Scenario 2: Resulting in an Error
Stack trace: