marklogic / node-client-api

The MarkLogic Node.js Client API
https://docs.marklogic.com/jsdoc/index.html
Apache License 2.0
51 stars 45 forks source link

Retrieving element values without range indexes in MarkLogic #690

Closed dalbir closed 1 year ago

dalbir commented 2 years ago

I have the following situation in MarkLogic. I have a database with a large number of documents which each hold hundreds of fields, And I want to provide a feature where user need to sort with any attributes. I don't want to add all attributes in range index. So could I achieve this.

Important things is we required pagination.

rjrudin commented 2 years ago

I've had plenty of customers say, "We want to sort on any field". In practice though, sorting on most fields didn't make any sense. For example, if you have 1 million customers and you sort on last name, that's only going to be helpful for finding customers with last names starting with A or Z. Free text search is usually a far better approach.

Thus, I've found that sorting on date/dateTime fields to be very helpful; sorting on numeric fields to sometimes be helpful (bucket facets usually work better); and sorting on string fields to rarely be helpful. If a string field has a bounded set of values, a facet is almost always the best choice (though that of course requires a range index).

So my first recommendation is - consider which attributes of documents would really be valuable for sorting, and add range indexes to those, and you're done.

If that doesn't suffice, then you've probably already realized that sorting by a non-indexed field is not going to scale well, regardless of what database / search engine you use. The next option with MarkLogic would be to consider using the Optic API as a way of indexing values without the higher memory cost of range indexes (the Optic API will index rows via the triple index, which is not pinned into memory like range indexes are). You'd need to use something like a TDE to define all the attributes though so that ML can index rows. Given that you have hundreds of fields, that's going to be a large TDE that results in additional disk space being used for all those rows to be indexed.

The above approach is doable, but... I go back to my original point, which is that I've found it to be extremely rare for "Sort by any field" to be worth it in terms of cost/benefit. Figuring out the (likely small number of) fields that truly require sorting will likely save you a lot of time and money.

rjrudin commented 1 year ago

Closing, but please reply back if you'd like to continue the conversation about your customer's use case.