moqui / moqui-framework

Use Moqui Framework to build enterprise applications based on Java. It includes tools for databases (relational, graph, document), local and web services, web and other UI with screens and forms, security, file/resource access, scripts, templates, l10n, caching, logging, search, rules, workflow, multi-instance, and integration.
http://www.moqui.org

Proposal: Data document parent-child support #244

Closed shendepu closed 6 years ago

shendepu commented 7 years ago

This is related to Elasticsearch, but it requires changes to the DataDocument entity.

I have a use case with two documents:

  1. Request
  2. Party

One request is related to only one Party (the party that created it). The request search list also shows the party info, which includes some statistics. It is not good to include the party info in the Request index, since its statistics data updates constantly, so it is better to index it separately.

This matches the Elasticsearch parent-child relationship. Currently I manually recreate the Request index mapping to include a _parent mapping to Party.
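
For illustration, the manual mapping described above might look roughly like the following under Elasticsearch 5.x, where a child type declares its parent through `_parent`. This is only a sketch: the helper name `parent_child_mappings` is hypothetical, and the type names `Request` and `Party` come from the use case, not from an actual Moqui index definition.

```python
import json


def parent_child_mappings(child_type: str, parent_type: str) -> dict:
    """Build an index-creation body in which child_type points at parent_type.

    Assumes Elasticsearch 5.x, where both types live in the same index and
    the child type carries a `_parent` entry naming the parent type.
    """
    return {
        "mappings": {
            parent_type: {},  # parent type needs no special setting
            child_type: {"_parent": {"type": parent_type}},
        }
    }


if __name__ == "__main__":
    # Body that would be sent when (re)creating the index
    print(json.dumps(parent_child_mappings("Request", "Party"), indent=2))
```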

If this is a common use case, I would propose:

  1. Add two fields to the DataDocument entity: parentDataDocumentId, which indicates which data document is the parent, and parentIdField, which indicates which field in the child document holds the id of the parent document. That field must be a top-level field of the child document.
  2. moqui-elasticsearch creates the parent and child document types in the same index, which Elasticsearch 5 requires for a parent-child relationship.
  3. Add parentType and parentIdField to the document map returned by EntityDataDocument.getDataDocuments().
  4. In org.moqui.search.SearchServices.index#DataDocuments, index the document with its parent if document.parentIdField is not empty.
  5. Add an includeParent parameter to org.moqui.search.SearchServices.search#DataDocuments to indicate whether to include the parent document with each child document in the result.
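
Step 4 could be sketched as below. This is not the moqui-elasticsearch implementation, only an illustration of the idea: the function `build_index_request` and the sample field names are hypothetical, and it assumes the Elasticsearch 5.x index API, where the parent id is passed as the `parent` URL parameter.

```python
from urllib.parse import quote, urlencode


def build_index_request(index, doc_type, doc_id, document, parent_id_field=None):
    """Return (method, path, body) for indexing one document.

    If parent_id_field is set, its value is read from the top level of the
    document and passed as the `parent` parameter so that the child is routed
    to the same shard as its parent.
    """
    path = "/" + "/".join(quote(str(p), safe="") for p in (index, doc_type, doc_id))
    if parent_id_field:
        parent_id = document.get(parent_id_field)
        if parent_id is None:
            raise ValueError("document has no top-level field " + parent_id_field)
        path += "?" + urlencode({"parent": parent_id})
    return "PUT", path, document


if __name__ == "__main__":
    doc = {"requestId": "R100", "partyId": "P7"}  # made-up sample document
    print(build_index_request("requests", "Request", "R100", doc, "partyId"))
```

Documents without a parentIdField go through the same function and are indexed as they are today, which keeps the change backward compatible.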

This proposal does not break the current data document definitions and services; it only adds a new feature.

I have the implementation done. If this proposal is accepted, I can extract the code and make the PR quickly.

jonesde commented 7 years ago

This might be helpful, but it is sort of like doing a join, even if a very limited one, and that goes against the best pattern for ElasticSearch and other 'NoSQL' databases, especially document databases like ES, both in the case you described and in general.

The general idea is to look at your query (or search) needs and put everything you need for that into a single document. If you need very different data for different searches or reports then use different documents, even if there is overlap between them.

Getting a little closer to relational databases, the various graph databases offer a sort of join, or at least very efficient ways to walk the graph with constraints using queries. Maybe someday ElasticSearch will get into that with some sort of document reference field type and extensions to the query API/DSL. Until then the simple parent/child relationship seems really limiting and I'm not sure how helpful it will be. As I read about this feature it became clear there are some serious limitations in what you can do with it in ElasticSearch, and both the initial mappings and server/cluster maintenance are more difficult with it:

https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-parent-field.html

In this case why are you including the Party fields you mentioned that have statistics data? I'm guessing those are custom fields you have added to Party or related entities, and then included in your DataDocument definition.

As a general point of best practice with NoSQL databases like ElasticSearch and their support for aggregations, it is better to calculate statistics and derived numbers on the fly than it is to re-index every time something happens that changes those statistics. With modern tools like ElasticSearch this runs fast and is way more flexible because you can define a bunch of different aggregations to run on the same data.

In the OLAP context, where star schemas are common in relational databases, you can do much better in NoSQL databases with aggregation support like ElasticSearch. For example, instead of using things like date/time based dimension tables and joining them to fact tables, you just define the date/time based aggregations on the fly at query time instead of at data feed time. It is way more flexible.
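
As a rough sketch of a query-time date aggregation, the search body below buckets documents by calendar period and sums an amount per bucket. The helper name and the field names `placedDate` and `itemTotal` are made up for the example and are not taken from any real DataDocument; the `interval` key is the spelling used by the ElasticSearch 5.x `date_histogram` aggregation.

```python
import json


def totals_by_period(date_field, amount_field, interval="month"):
    """Build a search body that buckets by date and sums an amount per bucket."""
    return {
        "size": 0,  # only the aggregation results are wanted, not the hits
        "aggs": {
            "by_period": {
                "date_histogram": {"field": date_field, "interval": interval},
                "aggs": {"total": {"sum": {"field": amount_field}}},
            }
        },
    }


if __name__ == "__main__":
    # Switching from monthly to weekly buckets needs no re-indexing,
    # only a different argument at query time.
    print(json.dumps(totals_by_period("placedDate", "itemTotal", "week"), indent=2))
```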

The new MantleSalesOrderItem DataDocument is a good example of this. The general idea is to pull in everything related to a sales order item that you might want to group/bucket by or aggregate and then build any reports you need based on this one set of documents.

In other words, you calculate statistics (or aggregations in general) on the fly in this more flexible way with a bunch of detailed documents that have all relevant data included for them. Then you have more flexibility about the statistics you might want over time, and support for time based and other buckets so you can slice the data any way you like without changing the underlying data structure (the ElasticSearch document in this case).

shendepu commented 7 years ago

@jonesde thanks for the detailed explanation. I have been using the dynamic calculation definition of fieldPath in DataDocument, like amount = unitPrice * quantity.

In my case, I need to search the child document Request with filters on the parent document Party. One of the filters is the rating in PartyRating (a new entity), which is calculated by a complex algorithm using data beyond what is in Request and Party. This is why I need to reindex the parent document Party. Since the rating is calculated daily in the background, if I put the rating in the Request document then every Request would be re-indexed every day, which is what I want to avoid.

I understand the limitations of parent-child in Elasticsearch, and it is unknown when or whether Elasticsearch will get join-like support in the Query DSL. But this simple parent-child join solves my case, since I only need one parent Party on each Request document. (Note: this Party document is specialized for Request; it is not a general Party document like MantleParty.) If someone wants to have multiple parents, that can't be solved by this simple parent-child join.
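
The kind of search I mean would look roughly like this `has_parent` query in the Elasticsearch 5.x Query DSL. It is only a sketch: the helper name, the `rating` field name, and the threshold are made up for illustration, and the optional `inner_hits` is just one possible way to return the parent alongside each child.

```python
import json


def children_by_parent_rating(parent_type, rating_field, min_rating, include_parent=False):
    """Build a search body matching child docs whose parent passes a rating filter."""
    has_parent = {
        "parent_type": parent_type,
        "query": {"range": {rating_field: {"gte": min_rating}}},
    }
    if include_parent:
        # inner_hits returns the matching parent document with each child hit
        has_parent["inner_hits"] = {}
    return {"query": {"has_parent": has_parent}}


if __name__ == "__main__":
    # Search Request documents whose parent Party has rating >= 4
    print(json.dumps(children_by_parent_rating("Party", "rating", 4, True), indent=2))
```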

jonesde commented 6 years ago

Closing as part of general cleanup for the move to HiveMind and for no recent activity on this issue. If you want to pursue or continue this feel free to create a request on moqui.org. For more information see:

https://www.moqui.org/m/docs/moqui/Community+Guide