spring-projects / spring-data-elasticsearch

Provide support to increase developer productivity in Java when using Elasticsearch. Uses familiar Spring concepts such as a template classes for core API usage and lightweight repository style data access.
https://spring.io/projects/spring-data-elasticsearch/
Apache License 2.0
2.9k stars 1.33k forks source link

Support for DynamicTemplates. #2969

Open youssef3wi opened 4 weeks ago

youssef3wi commented 4 weeks ago

Closes #2774

I'm uncertain if this will be merged, so the best approach is to use converters.

sothawo commented 3 weeks ago

Thanks for that PR, I finally had some time to have a look.

Restrictions I see in this current draft:

Besides that this would need some documentation not only in the javadoc, but in the reference documentation as well.

youssef3wi commented 3 weeks ago

Honestly I believe it's not good to support this or even improving this draft

sothawo commented 3 weeks ago

Another thought that I just had: the dynamic templates definition must not necessarily come from a @DynamicTemplates annotation on the Java entity. It could have been written to the index mapping from outside ot could be contained in a json that is referred to with the @Mappings annotation. So it would be necessary to retrieve the current mappping from the cluster to find out the relevant settings here. This would have to be done once and not on every write/read and stored somehow on the entity - doing stuff like this in the ractive world easily gets nasty..

Seems quite some effort to add this; if you don't want to go on with this, just drop it.

youssef3wi commented 3 weeks ago

@sothawo I'm not convinced by this draft at the moment.

First, the metadata from the cluster is not in the correct location. I just placed it there for testing purposes, but I haven't found a secure place to retrieve the metadata from the cluster.

We should focus on the types and the conversion after that.

youssef3wi commented 1 week ago

@sothawo

sothawo commented 1 week ago

We cannot guarantee to have the information/mapping of an index when an entity is written. The index name that the document is written to might not exist yet, it might get created by Elasticsearch when the document comes in and the mapping can then come from an index template, but there would be no way for Spring Data Elasticsearch to convert to entity to a document using that mapping.

Neither can we guarantee to have this information when reading documents from Elasticsearch. We can try to get the information for all indices (assuming we have the access rights to do so), but there can be indices created from outside the application in between the time this information is retrieved and a search result needs converting. Users read indices with Spring Data Elasticsearch that is created for example by log ingestion frameworks. Even if we can notice in the process of mapping a document to an entity that this document comes from an index where we have no information about, it's not possible to get the information at that time. This would mean a call to the cluster from within the mapping process. At this place we have no access to any (Reactive)ElasticsearchOperations instance and we shouldn't. And even if we would add such a dependency, we would then need to provide a new reactive Mapper for that.

I don't see that we can provide a reliable and performant solution here by using dynamic templates to solve the problem from #2774

youssef3wi commented 1 week ago

I don't see any issues here. This draft can resolve the problem, but it remains a draft due to the types of conversions.

Are you suggesting closing this draft?

sothawo commented 1 week ago

I still see the issues already mentioned:

public ClusterMapping getClusterMapping() {
        GetMappingRequest getMappingRequest = requestConverter.indicesGetMappingRequest(IndexCoordinates.of("*"));

This call might fail as the user might not have access to all indices in a cluster, for example when the cluster serves multiple users/tenants, when kibana is installed which stores data in it's own indices etc. So we cannot guarantee to have the information about an index that we need.

We need to have the index mapping information when converting an entity to a document. But it can be that the index where the document is to be stored does not exist yet. Assume we have an index template for index names like log-* which contains a mapping with a dynamic template, and the user saves some data to log-2024-09-10 and this index does not exist yet. We cannot do the mapping resepcting the dynamic template because Elasticsearch will only create that index when the document is sent to the cluster, but that's to0 late for our conversion.

We do not necessarily have the information about an index when reading response data. Let's assume we start the application and read all the index information that we can and that does not fail. To stick with the example we get the index mappings for all the log-* indices until the current date like log-2024-09-10. On the next day, the user searches for some data in log-*. Now it can be that some log ingestion mechanism has inserted something into log-2024-09-11 before the search started. This index now exists, but our application does not know about this. We then could get documents from log-2024-09-10 that we can convert and some from log-2024-09-11 where we don't have the index mapping in the cache.

As long as these issues are not resolved we cannot reliably offer this as a solution