Add more functions to mlquery service

grtjn commented 9 years ago

It is not feature complete yet, should ideally cover full structured query syntax:

http://docs.marklogic.com/guide/search-dev/structured-query#id_59265

grtjn commented 9 years ago

I have made some improvements to the original mlquery.js in the past projects. Maybe a good start to get those changes into this project.

grtjn commented 9 years ago

One major stand-out is the inability to specify an operator on range constraints.

joemfb commented 9 years ago

The more I think about this, the more I want to browserify and include https://github.com/marklogic/node-client-api/blob/master/lib/query-builder.js. I tried that awhile ago, and had some problems. I was also put off by some of the API differences, but I think it's worth sorting through. The overhead of re-implementing most of it in angular is pretty high...

grtjn commented 9 years ago

That is thorough work, but does it produce JSON structured query? Bit hard to read indeed. I would be in favor of this idea if it is relatively close to a drop-in replacement.

On the other hand, it shouldn't be much effort to extend the current builder to provide the extra features we need most, like range-operator, geo-spatial, custom-facet, and such.

joemfb commented 9 years ago

I've added a initial integration with the official query builder: aa3b4f2 (on the marklogic-integration branch).

This would definitely break backwards compatibility, which isn't necessarily an issue (with appropriate versioning). I'll need to experiment more.

grtjn commented 9 years ago

Looks promising, except that it seems to not support custom constraints. I use that regularly. Also pity it is hard to see to what extend compatibility will break. Is it possible to get some more detail on that?

grtjn commented 9 years ago

The approach to leverage the ML node client-api sounds interesting, but on second glance it looks much more different than the current query builder than I was hoping. I moved forward with the current one. How about exposing the ML node client-api based one as a second, and separate query builder, without replacing the older one. We can always decide to deprecate our own one..

joemfb commented 9 years ago

I think this needs a bigger conversation. My hope would be to use the node-client query-builder, and patch on anything it doesn't have that we need (maybe qb.ext.*). One important consideration: the node-client query-builder is pretty big, once it's browserified (shim is the current version):

$ ls -lh dist/
total 744
-rw-r--r--  1 jbryan  1258547899    14K Jul 30 16:11 ml-common-ng-shim.js
-rw-r--r--  1 jbryan  1258547899   3.7K Jul 30 16:11 ml-common-ng-shim.min.js
-rw-r--r--  1 jbryan  1258547899   260K Jul 30 16:11 ml-common-ng.js
-rw-r--r--  1 jbryan  1258547899    91K Jul 30 16:11 ml-common-ng.min.js

Note: the marklogic-integration branch has code to make the query-builder hot-swappable.

grtjn commented 9 years ago

Wow, 20k unminified, versus 260k unminified! Sounds to me as a good argument to make the node-client query-builder a separate project! ;-)

grtjn commented 9 years ago

I think a light-weight querybuilder serves us best. By the way, PR #11 includes changes to make https://github.com/joemfb/ml-search-ng/issues/26 easier. Can I do something to make that PR more compelling? Match up with the interface of node-client-api, and call this the light version of that one? (light also in terms of: limited coverage)

joemfb commented 9 years ago

I'm still kinda conflicted about this, but we have to move forward; we need additional features. I guess working for rough API compatibility with the node-client-api makes the most sense.

I'd like to separate our methods that don't exist in the official version, something like qb.ext.*. That should help future-proof if we want to switch at some point.

Do you want to give this a shot? If not, I can mockit up, based on your PR and the API from the marklogic-integration branch.

grtjn commented 9 years ago

I like the idea of having an escape with qb.ext.*. I am a little worried about backwards-compat though. Wondering if there would be a smart way of doing a back to back test against them. If it were just possible to iterate over all functions and compare results between the two implementation, that would help. It would expose any diff explicitly, and also serve as migration documentation..

joemfb commented 9 years ago

You mean backwards compatibility with the current API (and therefore older versions of ml-search-ng and existing demos)? I think we can keep backwards compatibility through deprecated methods for a release cycle (although, they're may be API conflicts in some cases).

I'm really more concerned with having forward-compatibility with the node-client-api query-builder. I don't get the impression that demo's are frequently updated, and I don't think that the query-builder is being directly used in very many demos. With that combination, as long as the query-builder is kept in sync with ml-search-ng releases, I think we can break backwards compatibility without too many problems.

Finally, I wonder if this is too complex to track in a single issue/PR. Maybe we could map out the desired API in an issue, refactor in one PR, and add methods in others?

grtjn commented 9 years ago

Yes and no. It helps get a clear picture in differences. That serves both for compatibility, as well as simply get a picture of what kind of changes are needed to make our existing QB match with that from client-api. If we come up with a smart method, we might leverage that for back-to-back regression testing as well..

We can use this issue to talk it over. And see what to do with the existing PR later. I need to get a more clear picture what is involved first anyhow.

joemfb commented 9 years ago

Here's some context to aid the discussion.

Current API:

{
  and: and(),
  boost: boost(matching, boosting),
  collection: collection(name, values),
  constraint: constraint(type),
  custom: custom(name, values),
  document: document(),
  not: properties(query),
  operator: operator(name, stateName),
  or: or(),
  properties: properties(query),
  query: query(),
  range: range(name, values),
  text: text(qtext)
}

Official API:

{
  // (un)wrapped for direct use with REST API
  where: queryWrapper,
  // imported as is
  anchor: qb.anchor,
  and: qb.and,
  andNot: qb.andNot,
  attribute: qb.attribute,
  bind: qb.bind,
  bindDefault: qb.bindDefault,
  bindEmptyAs: qb.bindEmptyAs,
  boost: qb.boost,
  box: qb.box,
  bucket: qb.bucket,
  calculateFunction: qb.calculateFunction,
  circle: qb.circle,
  collection: qb.collection,
  copyFrom: qb.copyFrom,
  lsqtQuery: qb.lsqtQuery,
  datatype: qb.datatype,
  directory: qb.directory,
  document: qb.document,
  documentFragment: qb.documentFragment,
  element: qb.element,
  extract: qb.extract,
  facet: qb.facet,
  facetOptions: qb.facetOptions,
  field: qb.field,
  fragmentScope: qb.fragmentScope,
  geoAttributePair: qb.geoAttributePair,
  geoElement: qb.geoElement,
  geoElementPair: qb.geoElementPair,
  geoOptions: qb.geoOptions,
  geoPath: qb.geoPath,
  geoProperty: qb.geoProperty,
  geoPropertyPair: qb.geoPropertyPair,
  geospatial: qb.geospatial,
  heatmap: qb.heatmap,
  jsontype: qb.jsontype,
  latlon: qb.latlon,
  locksFragment: qb.locksFragment,
  near: qb.near,
  not: qb.not,
  notIn: qb.notIn,
  pathIndex: qb.pathIndex,
  point: qb.point,
  polygon: qb.polygon,
  propertiesFragment: qb.propertiesFragment,
  property: qb.property,
  byExample: qb.byExample,
  qname: qb.qname,
  or: qb.or,
  ordered: qb.ordered,
  parseBindings: qb.parseBindings,
  parsedFrom: qb.parsedFrom,
  parseFunction: qb.parseFunction,
  period: qb.period,
  periodCompare: qb.periodCompare,
  periodRange: qb.periodRange,
  range: qb.range,
  rangeOptions: qb.rangeOptions,
  score: qb.score,
  scope: qb.scope,
  snippet: qb.snippet,
  sort: qb.sort,
  southWestNorthEast: qb.southWestNorthEast,
  suggestBindings: qb.suggestBindings,
  suggestOptions: qb.suggestOptions,
  temporalOptions: qb.temporalOptions,
  term: qb.term,
  termOptions: qb.termOptions,
  transform: qb.transform,
  value: qb.value,
  weight: qb.weight,
  word: qb.word
}

Proposed API (from your PR #11):

{
  and: and(),
  boost: boost(matching, boosting),
  collection: collection(),
  collectionConstraint: collectionConstraint(name, uris),
  constraint: constraint(type),
  custom: customConstraint(name, terms),
  customConstraint: customConstraint(name, terms),
  customGeospatialConstraint: customGeospatialConstraint(name, annotation, shapes),
  directory: directory(),
  document: document(),
  documentFragment: documentFragment(query),
  geospatialConstraint: geospatialConstraint(name, shapes),
  locksFragment: locksFragment(query),
  not: not(query),
  operator: operatorState(name, state),
  operatorState: operatorState(name, state),
  or: or(),
  properties: propertiesFragment(query),
  propertiesFragment: propertiesFragment(query),
  qtext: qtext(text),
  query: query(),
  range: rangeConstraint(name, operator, values, options),
  rangeConstraint: rangeConstraint(name, operator, values, options),
  term: term(text, weight),
  text: qtext(text),
  valueConstraint: valueConstraint(name, text, weight),
  wordConstraint: wordConstraint(name, text, weight)
}

grtjn commented 9 years ago

Don't forget the TODO's! ;-)

I took the structured query syntax ref as guideline for the implementation of the API in PR #11. The official API is much bigger actually. I think the client-api also includes the search options stuff that you can pass through. It would be interesting to cover those as well. If you look at it in that way, you would get a lot for free if we would wrap the client-api like you were proposing in your branch.

What I meant with back-to-back goes a little deeper though. A bit like expect(liteqb.and()).toEqual(mlqb.and()). I was a bit worried we would have a lot of overlap in existing functions, and have to do some thorough testing to make sure things work as expected. But if I compare existing API with official, then the existing has just 13 methods, and some don't even exist in the official (custom, operator, text). I am less worried now.

We definitely need to run some detailed tests. Doing a few back to back tests for those methods from the existing api will help, but that is just a handful. Not so difficult to cover..

The only thing I see as drawback from your approach is the size of the resulting JS. Could it be that you have wrapped much more than necessary? Can we somehow make that smaller? Or is that really as small as it can be?

joemfb commented 9 years ago

I have detailed the plan for query-builder improvements here: https://github.com/joemfb/ml-common-ng/pull/13#issuecomment-139840486, and opened #14 and #15 to support that plan. Please add comments, questions, method suggestions, etc. to those tickets as appropriate.

joemfb / ml-common-ng

Add more functions to mlquery service #1