loopbackio / loopback-connector-elastic-search

Strongloop Loopback connector for Elasticsearch
MIT License

support for a dynamic index #47

Open pulkitsinghal opened 8 years ago

pulkitsinghal commented 8 years ago

Steven @onstrike07 Aug 14 17:34 I have a question about the index setting in datasource.json for this connector. Is it possible to make the index name dynamic? I mean, I want to pass the index name as a parameter from the frontend for each query. Our situation is that our log data, which is very large, is split into different indices in ES for each hour; for example, the log_2016_08_10_02 index holds log data generated between 2:00 am and 3:00 am on August 10, 2016. The frontend code determines which index to query against based on the time window users specify.

cc @onstrike07

pulkitsinghal commented 8 years ago

Your use case is interesting and feasible: the source code doesn't tie the underlying driver down to a specific index, so it should be possible to do this.

Made an exploratory video where I try to brainstorm what someone might have to do to make this happen: https://www.dropbox.com/sh/jygk4pscosqyfel/AAAiSRhCEJeTFd-tBpETJnNOa?dl=0

pulkitsinghal commented 8 years ago

In chat I had mentioned:

Pulkit Singhal @pulkitsinghal Aug 14 17:45 searching on a new index every time doesn't have much to do with searching on a different model ... to compare, it's like searching on a different database in mongodb, right? ... the connection url hasn't changed much, but the db has changed

So it's interesting that today I found a use case for mongodb where the user wanted to switch out the collections accessed by the connector: https://github.com/strongloop/loopback/issues/2603

Just FYI.

onstrike07 commented 8 years ago

Based on Pulkit's video instructions, I made a workaround, and it works for queries for now. What I did was:

  1. In a query, pass a parameter at the same level as 'where', as shown below: {"where":{ "user_id":999},"timescale" :1}
  2. Add logic to the ESConnector.prototype.buildFilter function to handle this parameter, as shown below.

filter.index is overwritten if a valid timescale parameter is present; otherwise, it still uses the one from the datasource.json file.


    if (criteria.timescale) {
        console.log('timescale: ' + criteria.timescale);

        // Builds an index name based on a date. Format: YYYY_MM_DD_HH
        var getIndexName = function (date) {
            var year = date.getFullYear();
            var rawMonth = date.getMonth() + 1;
            var month = rawMonth < 10 ? '0' + rawMonth : '' + rawMonth;
            var day = date.getDate() < 10 ? '0' + date.getDate() : '' + date.getDate();
            var hour = date.getHours() < 10 ? '0' + date.getHours() : '' + date.getHours();
            return year + '_' + month + '_' + day + '_' + hour;
        };

        // Builds a comma-separated list of index names covering the query period
        var getIndexNames = function () {
            var indexNames = '';
            var queryPeriod = 0; // in hours
            var endDate = new Date(2016, 7, 10, 2, 10); // TODO: replace with the current date in production
            switch (criteria.timescale) {
                case 1: queryPeriod = 5 / 60; break;
                case 2: queryPeriod = 15 / 60; break;
                case 3: queryPeriod = 1; break;
                case 4: queryPeriod = 6; break;
                case 5: queryPeriod = 12; break;
                case 6: queryPeriod = 24; break;
            }
            var startDate = new Date();
            startDate.setTime(endDate.getTime() - queryPeriod * 60 * 60 * 1000); // milliseconds
            // A sub-hour window that crosses an hour boundary still spans two indices
            if (queryPeriod < 1 && endDate.getHours() - startDate.getHours() > 0) queryPeriod++;
            console.log('EndDate: ' + endDate);
            console.log('Backward hours: ' + queryPeriod);
            for (var i = 0; i <= queryPeriod; i++) {
                //TODO: check availability of the index - if it's not valid in ES, ignore it.
                indexNames += getIndexName(endDate) + ',';
                endDate.setTime(endDate.getTime() - 60 * 60 * 1000);
            }

            // Remove the trailing comma
            indexNames = indexNames.substring(0, indexNames.length - 1);

            return indexNames;
        };

        var indexNames = getIndexNames();
        filter.index = indexNames; // Overwrite the index name with the composed one
    }

onstrike07 commented 8 years ago

It works for queries, but not for count. I'm thinking I may need to implement it in the buildWhere function as well.

Another task is to figure out how to validate an index in ES using the ES API.
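The elasticsearch JS client exposes an index-existence check (`indices.exists`) that could back this validation. Here is a minimal sketch of how the TODO in the workaround above could filter out missing indices; the helper name `filterExistingIndexes` and the stubbed check are illustrative, not connector code, and in production `existsFn` would wrap the real client call:

```javascript
// Keep only the index names that actually exist. `existsFn` maps an index
// name to a Promise resolving to a boolean (e.g. backed by
// client.indices.exists({ index: name }) in a real setup).
function filterExistingIndexes(indexNames, existsFn) {
  var names = indexNames.split(',');
  return Promise.all(names.map(existsFn)).then(function (flags) {
    return names.filter(function (name, i) { return flags[i]; }).join(',');
  });
}

// Example with a stubbed existence check:
var known = { 'log_2016_08_10_01': true, 'log_2016_08_10_02': true };
filterExistingIndexes(
  'log_2016_08_10_01,log_2016_08_10_02,log_2016_08_10_03',
  function (name) { return Promise.resolve(!!known[name]); }
).then(function (result) {
  console.log(result); // → 'log_2016_08_10_01,log_2016_08_10_02'
});
```

Passing the surviving comma-separated list as `filter.index` would then keep ES from rejecting the whole query over one missing hourly index.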

juanpujol commented 8 years ago

@onstrike07 did you change anything in the constructor or defaults? Thanks.

onstrike07 commented 8 years ago

I only added code to the following function in esConnector.js:

ESConnector.prototype.buildFilter

pulkitsinghal commented 7 years ago

@0candy, @jannyHou, @superkhau and @bajtos - I wanted to enable users of this elasticsearch connector to build "dynamic indexes" by providing a function in the datasources.<env>.js file:

module.exports = {
  'db': {
    'name': 'elasticsearch-plain',
    'connector': 'es',
    'mappings': [
      {
        "name": "Log",
        "index": function(){ // JUST LIKE MAGIC
          return 'index_a,index_b,index_c';
        },
        "properties": {...}
      }
    ]
  }
};

But regarding environment-specific configuration, it says here that:

The additional files can override the top-level data-source options with string and number values only. You cannot use objects or array values.

That's a bummer ... I feel like it should be possible to use a function despite that warning, no?

pulkitsinghal commented 7 years ago

cc @onstrike07 and @juanpujol - the comment above highlights my latest plan for enabling dynamic indices, along with a minor change to:

ESConnector.prototype.addDefaults = function (modelName) {
    ...
    // fetch index and type from `self.settings.mappings` too
    ...
    if (mappingFromDatasource) {
      indexFromDatasource = mappingFromDatasource.index;
      typeFromDatasource = mappingFromDatasource.type;
    }

    // after fetching index and type from `self.settings.mappings`
    if (_.isFunction(indexFromDatasource)) {
        filter.index = indexFromDatasource(); // generate index or indices dynamically
    }
    ...
};
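The lookup elided above can be sketched as a standalone helper. This is a minimal sketch of the idea, not the connector's actual API; `resolveIndex` and its arguments are illustrative names:

```javascript
// If the datasource mapping supplies `index` as a function, call it to
// generate the index (or comma-separated indices) dynamically; otherwise
// use the value as-is, falling back to the connector's default index.
function resolveIndex(mappings, modelName, defaultIndex) {
  var mapping = mappings.filter(function (m) { return m.name === modelName; })[0];
  if (!mapping || mapping.index == null) return defaultIndex;
  return typeof mapping.index === 'function' ? mapping.index() : mapping.index;
}

var mappings = [
  { name: 'Log', index: function () { return 'index_a,index_b,index_c'; } },
  { name: 'User', index: 'users' }
];

resolveIndex(mappings, 'Log', 'fallback');   // → 'index_a,index_b,index_c'
resolveIndex(mappings, 'User', 'fallback');  // → 'users'
resolveIndex(mappings, 'Other', 'fallback'); // → 'fallback'
```

Because the function is called on every resolution, users can return a different index list per request (e.g. based on the current hour).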

It doesn't have room for the logtype variable that @onstrike07 uses in his custom implementation ... but I'll try to think of how to sneak in other implementation-specific stuff in a generic fashion too. Ideas are welcome.

superkhau commented 7 years ago

@pulkitsinghal I'm not sure about the ramifications of supplying a function; you can give it a shot and let us know of any issues you run into. The only negative I can see is that it won't work with the datasources.*.json versions.

bajtos commented 7 years ago

The additional files can override the top-level data-source options with string and number values only. You cannot use objects or array values.

That's a bummer ... i feel like it should be possible to use a function despite that warning, no?

I think using functions should be fine. Here is the source code of the type-compatibility check used by loopback-boot: https://github.com/strongloop/loopback-boot/blob/0d985bae0ba918395660850711acad32cbf964cb/lib/config-loader.js#L300-L313

@pulkitsinghal Could you perhaps send a pull request to correct that sentence you quoted?

pulkitsinghal commented 7 years ago

@bajtos and @superkhau - thank you! And I have one follow-up question about testing.

  1. Usually in connector test files, my setup is:

    var settings = require('./resource/datasource-test.json');

  2. In this case I could do:

    var settings = require('./resource/datasource-test.local.js');

  3. But I would feel a lot better if I could somehow use a more realistic mechanism for loading datasource-test.json and then datasource-test.local.js via loopback-boot ... I don't like hardcoding the test because I might miss a real problem during TDD. Is there some code for reference in another loopback project that I can lean back on?
superkhau commented 7 years ago

1+2) I think you should inject it via env vars or the way we do with rc at https://github.com/strongloop/loopback-connector-mysql/blob/master/test/init.js#L14-L24 -- this keeps things at the connector level (i.e. it doesn't care about datasource-test.json -- it takes any acceptable config value and assumes the caller has already sanitized the object properly).

3) That is outside the concerns of the connector (as loopback-boot will be the one loading the .json files). If you're leaning towards an end-to-end type test, you could require loopback and do something like https://github.com/strongloop/loopback-connector-kv-redis/blob/master/test/helpers/data-source-factory.js#L3 but instead of a data factory, require loopback and inject/test accordingly.
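The env-var approach in 1+2) can be sketched as below, modeled loosely on loopback-connector-mysql's test/init.js; the variable names (`ES_HOST`, `ES_PORT`, `ES_INDEX`) and defaults are assumptions, not an existing file in this repo:

```javascript
// Build connector test settings from environment variables, with local
// defaults so the suite still runs on a developer machine.
function buildTestConfig(env) {
  return {
    host: env.ES_HOST || 'localhost',
    port: Number(env.ES_PORT) || 9200,
    index: env.ES_INDEX || 'test-index'
  };
}

// In a test init file you would call it with the real environment:
var settings = buildTestConfig(process.env);
```

This keeps the tests at the connector level: CI sets the env vars, and no datasource-test.json parsing is involved.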

pulkitsinghal commented 7 years ago

I want to take a moment to point out that @onstrike07's code effectively caches results, because global variables are reused. Caching is something I should give some thought to as well if the index/mapping checks happen repeatedly.
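For hourly indices, one possible shape for that caching is to recompute the index list only when the clock hour changes. A hypothetical sketch (not connector code, and the hour-granularity key is an assumption about the use case):

```javascript
// Wrap an expensive index-list computation so it runs at most once per
// clock hour; later calls within the same hour reuse the cached value.
function makeCachedIndexNames(computeFn) {
  var cachedHour = null;
  var cachedValue = null;
  return function (now) {
    var hourKey = now.toISOString().slice(0, 13); // e.g. '2016-08-10T02'
    if (hourKey !== cachedHour) {
      cachedHour = hourKey;
      cachedValue = computeFn(now);
    }
    return cachedValue;
  };
}

var calls = 0;
var cached = makeCachedIndexNames(function () { calls++; return 'log_2016_08_10_02'; });
cached(new Date(Date.UTC(2016, 7, 10, 2, 5)));
cached(new Date(Date.UTC(2016, 7, 10, 2, 55))); // same hour: cache hit
// calls === 1
```

Unlike the globals in the workaround, the cache state here lives in a closure, so two datasources wouldn't trample each other's index lists.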

pulkitsinghal commented 7 years ago

thank you for your earlier comments @superkhau

pulkitsinghal commented 7 years ago

@onstrike07 - it is a bit trickier than I thought:

  1. your situation is such that only reading from the indices is required, but when I try to be a bit more thorough ... creating and mapping them gets tricky, as I end up with something much more comprehensive, which I'm not sure anyone wants:
        {
            "name": "Log",
            "index": function(){
                return {
                    write: function(){ return ['index_a','index_b','index_c']; }
                    ,read: function(){ return 'index_a,index_b,index_c'; }
                };
            },
            "properties": {
                "message": {"type": "string", "index" : "not_analyzed"},
                "timestamp": {"type": "date"}
            }
        }
  2. ESConnector.prototype.addDefaults needs to know which CRUD operation is calling it in order to use either index.read() or index.write(), but of course it is not even that simple
  3. I feel that our connector should probably give up responsibility for creating indices and mappings entirely, or provide a much clearer place to let customizers override that logic ... letting the code create them based on what addDefaults might set as an index value (as we've done in the past) is just not pretty or even feasible.
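The read/write dispatch in point 2 could be sketched like this; `resolveIndexFor` and the `operation` argument are hypothetical names, not the connector's API:

```javascript
// Resolve the index for a given CRUD operation. A plain string applies to
// both reads and writes; a function may return either a string or the
// { read, write } shape from the mapping example above.
function resolveIndexFor(indexValue, operation) {
  if (typeof indexValue !== 'function') return indexValue;
  var resolved = indexValue();
  if (resolved && typeof resolved === 'object') {
    return operation === 'write' ? resolved.write() : resolved.read();
  }
  return resolved; // plain string: same index for reads and writes
}

var mappingIndex = function () {
  return {
    write: function () { return ['index_a', 'index_b', 'index_c']; },
    read: function () { return 'index_a,index_b,index_c'; }
  };
};

resolveIndexFor(mappingIndex, 'read');  // → 'index_a,index_b,index_c'
resolveIndexFor(mappingIndex, 'write'); // → ['index_a', 'index_b', 'index_c']
```

Even with this helper, the harder question from point 2 remains: addDefaults would still need the calling operation threaded through to it.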
onstrike07 commented 7 years ago

I agree it's complicated. Fortunately, my project only requires querying, not the whole set of CRUD operations. I doubt it makes sense to insert a record into multiple indices in a single statement.