Closed measwel closed 6 years ago
@measwel For me the index is actually created and I didn't face any errors while searching, though the location type went into the index as a nested object instead of a geo_point type. Maybe you are accessing the model instances before they finish initializing. I'm sharing my code here. Although I'm using the Async-suffixed promise-based functions, the callback-based methods also work. The geo_point type defined in the properties has no effect on the elassandra index, though. That looks like an elassandra issue, since the index properties are transferred to elassandra as is from express-cassandra.
index.js
var models = require('express-cassandra');
//Tell express-cassandra to use the models-directory, and
//use bind() to load the models using cassandra configurations.
models.setDirectory( __dirname + '/models').bind(
{
clientOptions: {
contactPoints: ['127.0.0.1'],
protocolOptions: { port: 9042 },
keyspace: 'mykeyspace',
queryOptions: {consistency: models.consistencies.one},
elasticsearch: {
host: 'http://localhost:9200',
apiVersion: '5.5',
requestTimeout: 1000 * 60 * 60,
sniffOnStart: true,
}
},
ormOptions: {
defaultReplicationStrategy : {
class: 'NetworkTopologyStrategy',
dc1: 1
},
migration: 'alter',
manageESIndex: true,
udts: {
geo_point: {
'lat': 'float',
'lon': 'float'
},
}
}
},
function(err) {
if(err) throw err;
console.log('model sync done!');
var Person = models.instance['Person'];
var person = new Person({
useruuid: models.uuid(),
name: 'Masum',
age: 32,
location: {
lat: 23.233441,
lon: -90.244535
},
created: { $db_function: 'toTimestamp(now())' }
});
person.saveAsync()
.then(() => {
return Person.searchAsync({q: 'name: Masum'});
})
.then((response) => {
console.log(response.hits.hits);
models.close();
})
.catch((err) => {
console.log(err);
models.close();
});
}
);
models/PersonModel.js
module.exports = {
fields:{
useruuid: 'uuid',
name : 'text',
age : 'int',
location: {
type: 'frozen',
typeDef: '<geo_point>'
},
created : 'timestamp'
},
key:['name', 'created'],
clustering_order: { created: 'desc' },
es_index_mapping: {
discover: '.*',
properties: {
name: {
"type": "text",
"index": true
},
location: {
"type": "geo_point"
}
}
}
}
The problem is:
When you specify a field in the cassandra table, that field gets mapped to the index and the ES properties are ignored! That is why our location field gets mapped as a nested object.
When you omit the field from the cassandra table, the field gets mapped from the properties and a column is ALSO created in the cassandra table. Try omitting the location field in the table definition. Then you will get a proper geo_point field in the index, and in the table you will get: location list<frozen<geo_point>>
I am trying to figure out how to deal with this. For the description field, if I omit it in the table definition, I get description list<text>
Another problem I am facing is that Elasticsearch keeps dropping connections: HEAD http://localhost:9200/myindex => socket hang up
If anybody knows how to remedy this problem, I would like to hear it.
I currently have a table in the db with the following fields:
CREATE TABLE arbotti.post (
...
description text,
descriptionkeywords list<text>,
location list<frozen<geo_point>>,
...
PRIMARY KEY (useruuid, created)
) ...
Which is the result of this mapping:
properties: {
"descriptionkeywords": {
"type": "text",
"index": true
},
"location": {
"type": "geo_point"
}
}
The question is: how do I insert into these cassandra fields:
descriptionkeywords list<text>
location list<frozen<geo_point>>
This is the data I provide to be saved in the cassandra table:
{ ...
description: 'Some text',
descriptionkeywords: [ 'Some text' ], <-- array with 1 text value
location: [ { lat: 52.582014799999996, lon: 13.4913122 } ] <-- array with 1 geopoint
}
But the fields descriptionkeywords and location are not saved in the cassandra table. They are also not saved in the ES index.
I believe that the mapping is correct. The location field has to be a true geo_point in the ES index, otherwise geospatial search will not work. The same is true for the text description which has to be analyzed text. The fields are there, but I cannot insert into them with the cassandra-express driver. I would like to avoid writing to the index directly if possible.
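One fallback worth trying is to issue the insert as raw CQL through express-cassandra's raw query interface, sidestepping the ORM's field mapping. The helper below is only a sketch under assumptions from this thread: the model is registered as Post, the columns match the CREATE TABLE above, and the driver serializes the UDT list from plain {lat, lon} objects.

```javascript
// Sketch only: builds a raw INSERT for the arbotti.post table shown above.
// Column names and the Post model name are assumptions from this thread.
function buildPostInsert(post) {
  return {
    query: 'INSERT INTO arbotti.post ' +
           '(useruuid, created, description, descriptionkeywords, location) ' +
           'VALUES (?, ?, ?, ?, ?);',
    params: [
      post.useruuid,
      post.created,
      post.description,
      post.descriptionkeywords,  // list<text> -> plain JS array of strings
      post.location              // list<frozen<geo_point>> -> array of { lat, lon }
    ]
  };
}

/* Usage (hypothetical):
const { query, params } = buildPostInsert({
  useruuid: models.uuid(),
  created: new Date(),
  description: 'Some text',
  descriptionkeywords: ['Some text'],
  location: [{ lat: 52.5820148, lon: 13.4913122 }]
});
models.instance.Post.execute_query(query, params, function (err) {
  if (err) console.error(err);
});
*/
```

Whether elassandra then picks the row up into the index depends on the mapping issue discussed above; this only takes the ORM's save path out of the equation.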
I have thought about all this and I am seriously considering not using elassandra. I need to be able to perform full text search and geospatial search, and that is the only reason for wanting the ES index in the first place.
I am considering implementing full text search with a SASI index in cassandra, and geospatial search using geohashes, also in cassandra. If I can do this, then I will have no need for ES and can use only the cassandra DB with the excellent cassandra-express driver (for which I thank you from my heart!).
UPDATE 1:
const query = `CREATE CUSTOM INDEX post_description_idx ON arbotti.post(description)
USING 'org.apache.cassandra.index.sasi.SASIIndex'
WITH OPTIONS = {
'mode': 'CONTAINS',
'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
'analyzed': 'true',
'tokenization_enable_stemming': 'true',
'tokenization_locale': 'en',
'tokenization_skip_stop_words': 'true',
'tokenization_normalize_lowercase': 'true'
};`;
Works fine on the latest bitnami/cassandra image. :) Looks like full text search will be possible.
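With that index in place, queries can use CQL LIKE in CONTAINS mode. A minimal sketch, assuming the same arbotti.post table and a model registered as Post (both names taken from this thread):

```javascript
// Sketch: build a LIKE query against the SASI-indexed description column.
// CONTAINS mode supports substring/token matching via '%term%'.
function buildDescriptionSearch(term) {
  return {
    query: 'SELECT * FROM arbotti.post WHERE description LIKE ?;',
    params: ['%' + term + '%']
  };
}

/* Usage (hypothetical):
const { query, params } = buildDescriptionSearch('text');
models.instance.Post.execute_query(query, params, function (err, result) {
  if (err) return console.error(err);
  console.log(result.rows);
});
*/
```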
Now for the geospatial search...
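The geohash part can live entirely in application code. Below is a sketch of the standard base32 geohash encoding (no library assumed); nearby points share a common prefix, so storing the geohash in a clustering column enables prefix/range queries for proximity search.

```javascript
// Standard geohash encoding: interleave longitude/latitude bisection bits,
// emit one base32 character per 5 bits.
const BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz';

function geohashEncode(lat, lon, precision) {
  precision = precision || 7;
  let idx = 0, bit = 0, evenBit = true, hash = '';
  let latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
  while (hash.length < precision) {
    if (evenBit) {
      // bisect the longitude range
      const mid = (lonMin + lonMax) / 2;
      if (lon >= mid) { idx = idx * 2 + 1; lonMin = mid; }
      else { idx = idx * 2; lonMax = mid; }
    } else {
      // bisect the latitude range
      const mid = (latMin + latMax) / 2;
      if (lat >= mid) { idx = idx * 2 + 1; latMin = mid; }
      else { idx = idx * 2; latMax = mid; }
    }
    evenBit = !evenBit;
    if (++bit === 5) {          // 5 bits -> one base32 character
      hash += BASE32.charAt(idx);
      bit = 0;
      idx = 0;
    }
  }
  return hash;
}

// e.g. geohashEncode(52.5820148, 13.4913122, 7) starts with 'u33' (Berlin area)
```

A longer precision means a smaller cell; truncating the stored hash to fewer characters widens the search area.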
Good day,
I am facing a problem when using the ORM cassandra-express. Each time I restart the server, the driver runs queries to create tables and indexes, even if the data model has not changed. These queries do not work unless I drop the keyspace beforehand. Should the driver really execute create queries if the model has not changed? I have tried different values for migration, but they seem to have no effect. How can I make sure the queries do not run when not needed?
When you restart your app, the way for express-cassandra to know whether anything has changed is to query cassandra, read the current db state, and match that state against your model schema. If nothing has changed, it doesn't do any migration. So it shouldn't run create queries if nothing has changed; it should only run some select queries against the system tables to read the current db state.
Could you share the case where it's happening? I mean the model schema and how you initialized it?
The problem could be in my code. I see now that my code is not executing, because I emit an event only when the database is updated. When the db is not updated, the event is not emitted and the rest of my code does not run. Is there a way to test whether the update has taken place or not?
The function I use looks like this:
module.exports.updateDatabase = function (messageBus) {
logger.log('info', 'Starting arbotti database update');
models.setDirectory(path.join(__dirname, 'models')).bind(
{ ... and so on ...
So you want to programmatically test whether any db schema was updated in the model sync operation? Right now express-cassandra does not pass this information into the schema sync callback, but it should be fairly straightforward to implement in express-cassandra. I'll try to make the necessary changes and include the updates in the next release.
Come to think of it: it would be useful to know whether a db schema was updated, but the foremost requirement for me is to know when the routine has finished executing (whether it updated the db or not). When I run this:
models.setDirectory(path.join(__dirname, 'models')).bind(
{
clientOptions: {
contactPoints: [config.contactPoints],
protocolOptions: {port: config.cassandraPort},
keyspace: config.keyspace,
queryOptions: {consistency: models.consistencies.one},
authProvider: new models.driver.auth.PlainTextAuthProvider(config.cassandra_user, config.cassandra_password)
},
ormOptions: {
// If your keyspace doesn't exist it will be created automatically
// using the default replication strategy provided here.
createKeyspace: true,
defaultReplicationStrategy: {
'class': 'NetworkTopologyStrategy',
'DC1': 1
},
migration: 'alter'
}
},
function (err) {
if (err) {
logger.log('error', 'Error updating database!', err);
} else {
messageBus.emit('database_updated'); // send message that db is ready
}
}
);
The event signalling the end of the db update operation is never emitted, because the whole migration routine is not run if there are no db changes. I wonder if there is another way to find out when the routine above has finished executing.
Dear Masum,
Oftentimes when the schema has changed and the database update routine is executed I get the following error:
stty: when specifying an output style, modes may not be set
/node_modules/readline-sync/lib/read.sh: 49: /node_modules/readline-sync/lib/read.sh: cannot create /dev/tty: No such device or address
stty: 'standard input': Inappropriate ioctl for device
Error: The current environment doesn't support interactive reading from TTY.
stty: when specifying an output style, modes may not be set
When faced with this error, I need to drop the schema and rerun the update routine. Is it known what causes this problem and is there a remedy?
Any schema migration that requires deletion of data will first ask the user for confirmation in the terminal. This is an extra safety feature to avoid accidental deletion of data.
Note that some environments might not support a tty console, so asking the user for confirmation in the terminal may throw errors. You can set the property disableTTYConfirmation: true in the ormOptions. This will run the migrations without asking you for confirmation.
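For example, the relevant ormOptions fragment, as it would appear in the bind() call (replication settings copied from the code earlier in this thread):

```javascript
// Fragment: ormOptions for bind(), with TTY confirmation disabled.
const ormOptions = {
  defaultReplicationStrategy: { class: 'NetworkTopologyStrategy', DC1: 1 },
  migration: 'alter',
  // Skip the interactive terminal prompt in environments without a TTY.
  disableTTYConfirmation: true
};
```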
Hello Masum,
What would be the best way to implement transparent db field encryption in the free version of cassandra? Ideally, I would like to encrypt a field automatically when writing it to the db or querying for it and decrypt it automatically when reading it.
Should I do this on application level, or is there a better way?
To handle read/write you could use a virtual field to alias your original db field to do the encryption things:
// PersonModel.js
module.exports = {
fields: {
name_encrypted: "varchar",
name : {
"type": "varchar",
"virtual" : {
get: function() {
return your_decryption_logic(this.name_encrypted);
},
set: function(name) {
this.name_encrypted = your_encryption_logic(name);
}
}
}
}
}
This setup will enable your codebase to access name as a virtual field. The virtual field gets/sets from/to the original db field name_encrypted, keeping your name field encrypted in the db, while your code can access name as an alias that is not saved to the db.
To query against the name field you could do something like this:
models.instance.Person.find({ name_encrypted: your_encryption_logic(value) }, function(err, person){
});
Thank you very much Masum !
You're welcome. Closing this thread as it already started to contain discussions not directly related to the title. Let's create a new issue / stack overflow question to discuss another topic.
I love this approach, thanks!
In the docs for elassandra, the following is mentioned for mapping a geo_point:
Elasticsearch Types | CQL Types             | Comment
geo_point           | UDT geo_point or text | Built-In User Defined Type (1)
The references (1) and (2) in the documentation seem to be reversed. See: http://elassandra.readthedocs.io/en/latest/mapping.html?highlight=geo_point
For the time being, I have implemented the following:
And in my model I have:
The index is created, but the mapping is not right. The description field in the index is mapped to a keyword instead of analyzed text. The location field is also not mapped to a geo_point as specified.
UPDATE:
It seems that the mapping is created correctly if I omit the fields in the cassandra table and specify them only in the ES index properties object. These fields then also get created in the cassandra table. This is confusing, as the documentation states:
This suggests that you can have fields specified for the cassandra table AND overwrite them in the properties specification. But in my case, the mapping only works when I specify the fields solely in the properties for elastic search.