opensearch-project / opensearch-spark

Spark Accelerator framework ; It enables secondary indices to remote data stores.
Apache License 2.0
22 stars 33 forks source link

Fix bug for not able to get sourceTables from metadata #883

Closed seankao-az closed 1 week ago

seankao-az commented 1 week ago

Description

Using custom implementation for FlintIndexMetadataService exposes a bug where we incorrectly get empty sourceTables from FlintMetadata. Depending on how the FlintMetadata is generated, the metadata.properties.get("sourceTables") could either be of type java array or scala array. This PR fixes the bug, now considering both cases.

Also added some logs for troubleshooting in production environment.

Related Issues

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

seankao-az commented 1 week ago

Is it possible to enforce a single and language-independent format in metadata, e.g. JSON array?

Not quite sure.. Since when reading from OpenSearch index mappings, we use jackson library to deserialize into FlintMetadata, and jackson library will use java.util.ArrayList. I'll try using this type instead.

Or do you mean using json string? Like

{"sourceTables": "[\"a.b.c\"], [\"d.e.f\"]"}
dai-chen commented 1 week ago

Is it possible to enforce a single and language-independent format in metadata, e.g. JSON array?

Not quite sure.. Since when reading from OpenSearch index mappings, we use jackson library to deserialize into FlintMetadata, and jackson library will use java.util.ArrayList. I'll try using this type instead.

Or do you mean using json string? Like

{"sourceTables": "[\"a.b.c\"], [\"d.e.f\"]"}

Yeah, just trying to understand where this problem comes from? Is Scala or Java array stored in metadata?

seankao-az commented 1 week ago

mainly comes from this line of scala code. I was putting scala Array[String] here. https://github.com/opensearch-project/opensearch-spark/pull/883/files#diff-1e6cbb4cde65f1142862f3b539a438d534fd70a229499efd671ef9aae67ff29fR75

Now I've changed it to use ArrayList. so for both cases (mv.metadata() and deserialized metadata from index mapping) will have ArrayList. Do you think this is better? @dai-chen