Open spring-projects-issues opened 7 years ago
Oliver Drotbohm commented
Can you elaborate on what the use case is for this as this is mostly considered internal API. That said, what you're suggesting exactly works the other way around: query methods follow the domain model with a PersistentPropertyPath
obtained from a MappingContext
representing the full path including the ability to translate that into a store-specific path that might include renames, just like MongoDB uses property.getFieldName()
.
Does that help?
Ales Justin commented
The problem is, if we go with fields, and not properties, PropertyPath ctor fails, as it cannot find matching property (since we pass-in a field). Converting property to a field is not a problem -- we could do it in different places (and I already have this kind of impl in place). But the problem is, Hibernate Search is field agnostic. And going with fields, it also takes care of multi-field properties, which are not rare in Hibernate Search.
My goal was to workaround the PropertyPath ctor validation in the best possible way ...
Oliver Drotbohm commented
Okay, for one: things work as they work. We can't change very core abstractions of Spring Data or extend their responsibilities, just because one store has an arbitrary requirement. Second, we have to become way more concrete if it comes to requiring changes to Spring Data. Please point to your module's code that's overly complex or describe what you'd like to achieve. Query methods are supposed to be defined based on the domain model. This not something we're going to change at any time
Yoann Rodiere commented
Hello,
The thing is, translating property names will not work in our case for two reasons:
Maybe a bit of context would help... Hibernate Search is a mapping tool for full-text search on Java entities (generally JPA entities). It allows to transparently update a de-normalized, full-text view of your original model. When querying, you usually refer to the de-normalized view (the full-text index), because each "field" in the index has a dedicated purpose, is meant to be used for a specific search query. For instance you may want to perform full-text search for auto-completion on the name of an entity in one part of your app, and for a classic search form in another part of the app. Those will require two different kinds of pre-processing when indexing, and thus two different fields. Full-text search is such that mapping two fields to a given entity property is not a corner case, either. In particular, you need two different fields in the full-text index if you want to both sort and search on a text property, because the data will have to be pre-processed differently for sorts than for full-text search (in particular, you don't want to tokenize sort keys)
Oliver Drotbohm commented
Such requirements are best implemented using manually declared queries. Be aware that the query derivation mechanism is a tool to get started quickly. It is not even intended to satisfy advanced querying facilities. Take JPA for example. There's no way we support all the joining capabilities JPQL exposes in query derivation. Also, everything that goes beyond a simple keyword on one or at max two properties just makes the method name ridiculously complex and unreadable.
Again, we simply can't change the PropertyPath
implementation as it currently implements a guarantee that code using it relies on: if a PropertyPath
instance can be successfully created, it means that the path given is valid, i.e. that all properties contained in it actually exist. Changing this would completely break this guarantee
Yoann Rodiere commented
Be aware that the query derivation mechanism is a tool to get started quickly. It is not even intended to satisfy advanced querying facilities. [...]
Ok. I guess that rules out most full-text queries, unfortunately.
Again, we simply can't change the PropertyPath implementation as it currently implements a guarantee that code using it relies on:
I'm curious, are we only talking about query code, or also code that would get/set data on actual instances?
I suppose we could introduce new interfaces to abstract the validation about "this property exists" while still preserving the current behavior for existing constructors (we'd just use an implementation of the "validator" that does the exact same thing as today). Would a PR preserving existing methods/constructors/behaviors while adding new ones be acceptable, or is this a strict no?
Oliver Drotbohm commented
Ok. I guess that rules out most full-text queries, unfortunately. Care to elaborate why?
I'm curious, are we only talking about query code, or also code that would get/set data on actual instances?
There's is metamodel API in the mapping
package. PersistentEntity
exposes a PersistentPropertyAccessor
that allows setting and getting values from entity instances. Entity creation is abstracted via the EntityInstantiator
API.
As now indicated a couple of times before, this is not how we work. We don't just arbitrarily add methods to types just because a single store implementation (or their maintainers) thinks they're needed. Also, don't forget that Spring Data Commons query execution model is an opt in model. The core interface you need to implement is RepositoryQuery
. You're free to virtually do anything you want here. We usually let store implementation just build what they need in their own. We then look at those implementations and decide what parts of that might be candidates for transfer into Spring Data Commons. But that's a last step, not a first one.
Did you have a look at other Spring Data implementations fronting search technologies like Spring Data Elasticsearch or Solr? I guess they're solving similar challenges as you do, right?
Yoann Rodiere commented
Care to elaborate why?
As mentioned earlier:
Full-text search is such that mapping two fields to a given entity property is not a corner case, either. In particular, you need two different fields in the full-text index if you want to both sort and search on a text property, because the data will have to be pre-processed differently for sorts than for full-text search (in particular, you don't want to tokenize sort keys).
So as soon as you want to be able to search and sort on the same property (even in two different queries), you'll have to use two fields, and then as you said you'll have to use manually declared queries. That's some pretty strong limitations on sorting. But yes, I suppose users could at least take advantage of derived queries when they want to use the default sort, or when they sort on a field that isn't also used in filtering clauses.
Did you have a look at other Spring Data implementations fronting search technologies like Spring Data Elasticsearch or Solr? I guess they're solving similar challenges as you do, right?
Not exactly the same challenges, actually, since they seem to assume that their entity models follow the index structure very closely, which is not our case (since the entities are typically also mapped in JPA). Also, we don't need to support writes, while they do. Regardless, I had a look at the Elasticsearch integration earlier and I couldn't find an example of "derived" queries using multi-fields, for instance (it's what you typically use to do both sorting and searching on a text field in Elasticsearch). Actually Multi-field support, while available, does not seem to be documented. But yes, I didn't spend too much time on this. I'll look harder.
Anyway, we'll have to find solutions that do not involve changing the core, that much is clear. Thanks for your patience
Ales Justin commented
Again, we simply can't change the PropertyPath implementation
What about if we just changed what Part takes as propertyPath. So that instead of PropertyPath, we would have some simple interface, e.g.:
intefrace PathHandle {
String toDotPath();
String getSegment();
Class<?> getType();
}
plus a PathHandleCreator::createPathHandle(...), which would be pushed into Part(Tree).
This would definitely work for us, and for the query creation code, while I would have to check in more details what this means for the mapping metamodel, but I think it could also easily work out -- as the model itself is already abstracted enough.
I can prepare a PR for this -- if it makes sense
Oliver Drotbohm commented
Again, please avoid prematurely creating PRs for Spring Data Commons and focus on the store implementation first. I'd love to see actual code within some context before arbitrarily adding methods or abstractions to core Spring Data abstractions. Please assume that things are in place and work a certain way for a reason. Spring Data itself has a domain model for the domain of implementing repositories, query methods etc. That domain model implements domain constraints and deliberately weakening those constraints or adding new model elements is not something we take lightly ("Hey, let's add this interface!") as there already is a lot of complexity involved and every addition and tweak increases that.
Regarding the suggested interface: how are you planning to obtain the type if there's no (canonical) property on a reference type you could derive that type from?
What's actually wrong with creating a dedicated (read-only) model to map the search index on? That's very much in line with CQRS semantics where a primary (JPA) command model is accompanied by an eventually consistent read model that can have a different structure optimized for particular queries. This sounds way more matching your problem space than trying to unify everything into a canonical model
Ales Justin commented
I'd love to see actual code within some context before arbitrarily adding methods or abstractions to core Spring Data abstractions.
https://github.com/snowdrop/spring-data-hibernate-search
But this currently maps against properties, and not (yet) against fields.
Regarding the suggested interface: how are you planning to obtain the type if there's no (canonical) property on a reference type you could derive that type from?
In most cases we could find the property that owns that field. If not, I think we would still be OK.
What's actually wrong with creating a dedicated (read-only) model to map the search index on? That's very much in line with CQRS semantics where a primary (JPA) command model is accompanied by an eventually consistent read model that can have a different structure optimized for particular queries. This sounds way more matching your problem space than trying to unify everything into a canonical model.
Which model are we talking about here?
My initial problem is that I don't want to re-write / copy-paste PartTree's logic. And with current impl, I cannot work around PropertyPath validation w/o changing the code.
Oliver Drotbohm commented
In most cases we could find the property that owns that field. If not, I think we would still be OK.
That feels like a very week abstraction then as you basically don't add any real guarantees on top of a simple dot-separated String
do you.
Which model are we talking about here?
The user's code. With a search index in place next to a primary backing store, I'd definitely recommend to have a dedicated entity for the primary store and a completely separate domain type and repository to access the index data so that that domain type can actually be mapped much closed to the structure of the index data
Ales Justin commented
That feels like a very week abstraction then as you basically don't add any real guarantees on top of a simple dot-separated String do you.
Well, you need to find the field from the Part on your entity model.
Wrt the type, I don't see how that is relevant to the query? Do you match parameters against it?
The user's code. With a search index in place next to a primary backing store, I'd definitely recommend to have a dedicated entity for the primary store and a completely separate domain type and repository to access the index data so that that domain type can actually be mapped much closed to the structure of the index data.
I don't see why you would need two? You have one, which has both info; e.g. JPA and HS.
It's just that you in HS repository case, you map queries against HS info (fields), whereas in JPA, you map it against props.
I do agree with you that this should only handle simple cases (for the good reasons you mention in some previous comment), where the complex mappings should be handled differently -- yet to think of a good way ...
Yoann Rodiere commented
That feels like a very week abstraction then as you basically don't add any real guarantees on top of a simple dot-separated String do you.
We could have such guarantees, we have our own internal model. The model just isn't a java Class, so the current PropertyPath constraints cannot work.
The user's code. With a search index in place next to a primary backing store, I'd definitely recommend to have a dedicated entity for the primary store and a completely separate domain type and repository to access the index data so that that domain type can actually be mapped much closed to the structure of the index data.
Sure, it could work that way. It's just not how Hibernate Search works. Users just annotate their primary entity, build queries targeting the target model, and get results in their primary model. Having separate class definitions for the index model would just be overhead, since those classes would never be instantiated.
In any case... I'm explaining that since we're talking about it, but you already said nothing will change in the core unless we provide a concrete, separate implementation. So I guess we can close this ticket... ?
Ales Justin commented
I quickly hacked this:
And it works for me. (OK, JPA tests fail for unknown reason, but others work :-))
Ales Justin commented
So I guess we can close this ticket... ?
We're discussing things, so "don't throw your gun in the corn yet", as we say back home. :-)
Oliver Drotbohm commented
Well, you need to find the field from the Part on your entity model. Wrt the type, I don't see how that is relevant to the query? Do you match parameters against it?
My point is: we usually introduce such abstractions to be able to implement guarantees the using code can then rely on. If you create an instance of PartTree
you either get an exception or you can be sure that every segment your source string contained is really available. That allows you to detect errors. That allows you to prevent other code being executed that actually doesn't make sense to be executed unless that guarantee is met. Code that doesn't produce these guarantees is usually way less helpful and leads to much more complex code using it in turn as every client will again have to verify: Does that segment actually match to something reasonable? Is the null
or does it actually return something usable? I can only recommend to watch this talk about how proper and strong domain abstractions lead to more maintainable and secure code. That's what I was referring to with my "Spring Data has a domain model as well" before.
I don't see why you would need two?
Because that would dramatically simplify the developers life and avoid us having to have to have this discussion :). You're trying to make one thing fit two different purposes. Not only this creates complexity, it also usually leads to wrong common abstractions. It's SRP in practice, basically. Yes, you can overload a class with responsibilities but it will bite you in the end.
It might also be worth taking a step back and think about what a repository is supposed to be in the first place. It's not some arbitrary data access API, it's about simulating a collection of aggregates and the ability to select subsets of those. That however requires aggregates in the first place and a way to properly model them. You've spent quite a couple of comments to explain to me, why things are different with HS and the entity HS derives its data from. Concluding those explanations with "I don't see why a singular approach doesn't work" bears a bit of irony then ;). If it's different, then properly model a separate view
Ales Justin commented
That allows you to detect errors ... Code that doesn't produce these guarantees ...
I'm still able to detect errors and produce guarantees -- if Part's source is not found in domain's HS fields.
If it's different, then properly model a separate view.
That's exactly not the point -- as you want to have the same model / domain class, so it can be shared between different contexts; e.g. JPA and HS --> both in same transactional / is-dirty context, mix-n-match how you use found entities Or Infinispan (data store) and HS --> either cached or queried
Ales Justin commented
if Part's source is not found in domain's HS fields.
That's with my latest patch -- not with current Part code.
Michael Simons commented
I find this discussion as a user of both Spring Data JPA and Hibernate Search pretty interesting. As a user I understood that I have a write model (command model) for both a relational database and Lucene (or Elastic Search) through one type of entity classes. The write model is apparently not always the same, as a property can be mapped to different fields at the same time.
Using the write model through Spring Data JPA executes the command in both systems and I only have one interface: For me, the "selling point".
What I didn't understand from the discussion: A Spring Data Hibernate Search modul, should it be used in addition to Data JPA? If so, it would add burden, at least I get the impression.
Separate read models like described in the comments, i.e. sorting and searching on different fields, can easily be added to a Spring Data JPA repository as well, like I do here. Also that would be "my" extension point of adding projections and stuff
Yoann Rodiere commented
What I didn't understand from the discussion: A Spring Data Hibernate Search modul, should it be used in addition to Data JPA? If so, it would add burden, at least I get the impression.
You would handle writes as usual, caring only about JPA entities (using Spring Data JPA or anything else), and indexes would still be updated transparently. But when querying, you would have the additional option to make Spring Data generate the implementation of a read-only, Hibernate Search repository from an interface with properly named methods (the feature is called "derived queries").
Separate read models like described in the comments, i.e. sorting and searching on different fields, can easily be added to a Spring Data JPA repository as well, like I do here. Also that would be "my" extension point of adding projections and stuff.
Yes, I think that's what Oliver is suggesting: apparently we shouldn't try to implement complex stuff with the derived queries, and rather implement "complex" queries (basically anything targeting a field that isn't named after its source property) like you did, using "manually declared queries". Otherwise we'd have to rewrite a good part of Spring Data's Core in our own integration, because Spring Data currently assumes that the "query" model (the one filtering clauses refer to) is the same as the "result" model (the one for returned entities), which isn't our case in Hibernate Search.
Oliver Drotbohm commented
I like Michael's solution quite a lot as it keeps the advanced functionality where it belongs: inside the repository. It also shows that even a simple search query doesn't actually nicely align with the query derivation feature. And as those queries apparently don't make up the majority of queries of the repo, it's okay to manually implement them. You still gain a lot by not having to implement boilerplate persistence.
Another aspect that I just thought of was the following: if you're suggesting to create a second repository interface also tied to a domain type already managed by a primary data store, that's going to cause quite a bit of trouble with downstream technology like Spring Data REST, as it basically needs to find a canonical repository by domain type.
So maybe Michael's approach is just fine and it doesn't actually need a dedicated module?
Ales Justin commented
I like Michael's solution quite a lot as it keeps the advanced functionality where it belongs: inside the repository.And as those queries apparently don't make up the majority of queries of the repo, it's okay to manually implement them. You still gain a lot by not having to implement boilerplate persistence.
I agree with this for the (very) complex queries. But for the simple ones, I shouldn't and I don't care about HS / Lucene, I just care that my queries do full-text search. Otherwise you could say the same for every simple query / repo out there, either JPA, etc
It also shows that even a simple search query doesn't actually nicely align with the query derivation feature
It's the Part(Tree) API that doesn't (nicely) align with this derivation feature. Imo, this is still a valid derivation feature == hide full-text search functionality behind simple query method naming rules
And as those queries apparently don't make up the majority of queries of the repo, it's okay to manually implement them.
I would say just the opposite, imo, you normally do full-text search on a few fields (1-2), hence the queries are still simple. Combine this with "I shouldn't and I don't care about HS / Lucene", and you have a valid request for such a derivation feature.
Another aspect that I just thought of was the following: if you're suggesting to create a second repository interface also tied to a domain type already managed by a primary data store, that's going to cause quite a bit of trouble with downstream technology like Spring Data REST, as it basically needs to find a canonical repository by domain type.
This is already a problem with current Spring Data repo injection. (where this could be made a bit more configurable / abstracted) But that's a topic for another thread ...
So maybe Michael's approach is just fine and it doesn't actually need a dedicated module?
So you're saying that every user should have its own version of such code, with all sorts of different queries:
ALL = Arrays.asList(IS_NOT_NULL, IS_NULL, BETWEEN, LESS_THAN, LESS_THAN_EQUAL,
GREATER_THAN, GREATER_THAN_EQUAL, BEFORE, AFTER, NOT_LIKE, LIKE, STARTING_WITH, ENDING_WITH, IS_NOT_EMPTY,
IS_EMPTY, NOT_CONTAINING, CONTAINING, NOT_IN, IN, NEAR, WITHIN, REGEX, EXISTS, TRUE, FALSE,
NEGATING_SIMPLE_PROPERTY, SIMPLE_PROPERTY)
instead of existing module that already handles all of this ...
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.
Ales Justin opened DATACMNS-1176 and commented
PropertyPath from Part(Tree) doesn't always directly map class's structure. e.g. in Hibernate Search (HS) Spring Data implementation we map class's HS fields
@Field
(name = "foo") String bar;public List\ findByFoo(String foo);
In current impl, this use case fails. Adding a mapper, we could be less restrictive on the actual "Part" usage. e.g. not just strictly class's properties / hierarchy
No further details from DATACMNS-1176