Closed jdmintz closed 5 years ago
cc @raymondfeng @jannyHou
See https://github.com/cloudant/nodejs-cloudant/commit/d4003759b006fecb4f361527a5495ce347285ed1
@jannyHou I think we need to switch to `@cloudant/cloudant` as the driver now.
Gathering information from @jannyHou and @raymondfeng, we'll need further investigation to see what work is involved. It might involve one or more of the following:
- Switching to the `@cloudant/cloudant` driver (mentioned above)

I'd like to make this task a spike first.
Discussion from the estimation meeting:
Did some quick research; we can do the following to support partitioned databases:
- Update the driver to `cloudant/nodejs-cloudant`
- Let users access `cloudant/nodejs-cloudant` directly, so that they can leverage native APIs to create dbs, query records in a particular partition, etc.
- Define indexes with `partitioned` as true/false, defaulting to false to support global search
- `_id` follows the pattern `<partition_name>:<id>`
For a non-partitioned database, `db.insert()` gives the document a random string as `_id` if it's missing from the payload. For a partitioned database, however, `_id` is a required field, and the random string needs to be generated using `uuid/v4` (see some examples in the driver repo). Let's figure out a better UX for users to provide the id part:
- If `_id` is provided, we honor the entire string.
- If `_id` is missing but the partition is provided, we generate a UUID and append it after the partition.
- If neither of them is provided, the request will be rejected by the Cloudant service.

Will update with more findings and a PoC ASAP.
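The three rules above can be sketched as a small helper. This is a minimal sketch, assuming Node.js 14.17+ so it can use the built-in `crypto.randomUUID()` in place of the `uuid/v4` package mentioned here; `composeDocId` is a hypothetical name, not part of the connector or the driver. The `<partition>:<id>` format follows Cloudant's partitioned document IDs.

```javascript
// Hypothetical helper, not part of loopback-connector-cloudant or @cloudant/cloudant.
// Uses Node's built-in crypto.randomUUID() (Node >= 14.17) instead of the `uuid` package.
const { randomUUID } = require('crypto');

/**
 * Decide the `_id` for a document inserted into a partitioned database:
 * - `_id` provided: honor the entire string.
 * - only `partitionKey` provided: append a generated UUID v4 after the partition.
 * - neither: return undefined and let the Cloudant service reject the request.
 */
function composeDocId(doc, partitionKey) {
  if (doc._id) return doc._id;
  if (partitionKey) return `${partitionKey}:${randomUUID()}`;
  return undefined;
}

console.log(composeDocId({ _id: 'US:123' }, 'CA')); // prints US:123 (provided _id wins)
console.log(composeDocId({}, 'user123')); // user123:<random uuid v4>
```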
Some notes:
- I considered using `modelName` as the default partition key, but after reading the topic "a good partition key", I realized it wouldn't be good practice. For example, new orders are better off having the value of `userId` as their partition key instead of `order`. Therefore we should only honor the partition key from the request itself.

| Index defined | Query | Result |
|---|---|---|
| partitioned index | `partitionedFind()` | ✔️ correct result is returned ✔️ index is used |
| partitioned index | `partitionedFind()`, sort with a field in the index | ✔️ correct result is returned ✔️ index is used |
| partitioned index | `partitionedFind()`, sort with a field NOT in the index | ✖️ error: index not found |
| partitioned index | `partitionedFind()`, sort with a field NOT in the index | ✖️ error: index not found |
| partitioned index | `partitionedFind()` | ✔️ correct result is returned ✔️ index is used |
| global index | `partitionedFind()` | ✔️ correct result is returned ✖️ no matched index to optimize the performance |
| partitioned index | `find()` | ✔️ correct result is returned ✖️ no matched index to optimize the performance |
| global index | `find()` | ✔️ correct result is returned ✔️ index is used |
| partitioned index | `partitionedFind()`, advanced query | ✔️ `$regex` ✔️ nested property (e.g. `address.city`) ✔️ array search (e.g. `$elemMatch`) |
For design thoughts, see https://github.com/strongloop/loopback-connector-cloudant/issues/214#issuecomment-536586133 and https://github.com/strongloop/loopback-connector-cloudant/issues/214#issuecomment-537576508
Example: an `Order` model attached to a Cloudant datasource.
- Update the driver to https://github.com/cloudant/nodejs-cloudant
- Find a new database (with partition support) for testing
- The global search and index stay the same: `Order.find()` still invokes `db.find()` as it is, and indexes defined with `{partitioned: false}` are treated as global.
as global. (UX)Optimize query with partition
Order.find({<query_object>}, {partitionKey: '<name_of_key>'})
Cloudant.prototype.find()
, if options.partitionKey
is provided, then invoke db.partitionedFind('<name_of_key>', query)
options
, you can only call function with options from code.Order.find({partitionKey: 'akey', somefield: 'somevalue'})
Cloudant.prototype.find()
, if partitionKey
is detected in the query, then invoke db.partitionedFind('<name_of_key>', queryWithPartitionKeyExcluded)
partitionKey
, then the query will be broken...Maybe we can name it as lb_partition_key
as a preserved field name as a solution?db.partitionedFind()
viewDocs
and support the view search.(UX)Insert when _id is missing
For a non-partitioned db, `db.insert()` gives the document a random string as `_id` if it's missing from the payload. For a partitioned database, however, `_id` is a required field, and the random string needs to be generated using `uuid/v4` (see some examples in the driver repo). Let's figure out a better UX for users to provide the id part:
- If `_id` is provided, then we honor the entire string.
- If the id part is missing but `partitionKey` is provided, we generate a UUID and append it after the partition.
- If neither of them is provided, the request will be rejected by the Cloudant service.

Document how to call other partition APIs from the connector, such as `db.partitionInfo()`, `db.partitionedList()`, and `db.partitionedSearch()`. People can just get the driver instance from the connector instance and execute these native driver APIs.

Awesome progress here. CouchDB 2.x doesn't support partition querying yet. It will be available in 3.0 (being wrapped up soon-ish according to the email group).
Can confirm that the Cloudant Developer edition was retired in favor of CouchDB containers.
Re proposal 2:
Con: it's mixed with other properties; if the document just has a field called `partitionKey`, then the query will be broken... Maybe we can name it `lb_partition_key` as a reserved field name as a solution?
Please note the `filter` argument of the `find` method has properties `where`, `include`, `skip`, `limit`, etc. Model properties are nested under the `where` field. There is no need to worry about `partitionKey` clashing with model properties.
I don't know anything about partition queries in Cloudant/CouchDB. Purely from the LoopBack server & client perspective, I like proposal 2 the most.
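To make proposal 2 concrete, here is a minimal sketch, assuming `partitionKey` sits at the top level of the filter next to `where`, and that `db` exposes promise-returning `find(query)` and `partitionedFind(partitionKey, query)` as discussed in this thread. `buildQuery` and `findWithPartition` are hypothetical names, and the `where` clause is passed through as a Mango selector, which only works for plain equality conditions.

```javascript
// Sketch only: `buildQuery` and `findWithPartition` are hypothetical names,
// not functions of loopback-connector-cloudant or @cloudant/cloudant.

// Turn a LoopBack filter into a Mango query. Only plain equality `where`
// clauses map 1:1 to a selector; operators like `gt` or `inq` would need translation.
function buildQuery(filter) {
  const query = { selector: filter.where || {} };
  if (filter.limit !== undefined) query.limit = filter.limit;
  if (filter.skip !== undefined) query.skip = filter.skip;
  return query; // `partitionKey` is deliberately left out of the query
}

// `db` is assumed to expose promise-returning `find(query)` and
// `partitionedFind(partitionKey, query)`.
async function findWithPartition(db, filter = {}) {
  const query = buildQuery(filter);
  if (filter.partitionKey) {
    return db.partitionedFind(filter.partitionKey, query);
  }
  return db.find(query); // no partition key: global query, as before
}

// Usage with a stub standing in for the real database handle.
const stubDb = {
  find: async (query) => ({ via: 'find', query }),
  partitionedFind: async (key, query) => ({ via: 'partitionedFind', key, query }),
};

// Calls the stub's partitionedFind with 'akey' and
// { selector: { somefield: 'somevalue' }, limit: 10 }.
findWithPartition(stubDb, {
  partitionKey: 'akey',
  where: { somefield: 'somevalue' },
  limit: 10,
}).then((result) => console.log(result));
```

Keeping `partitionKey` outside `where` is what lets it coexist with a model property of the same name.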
Are there any security implications to be aware of? Can the `partitionKey` property be exploited by a malicious client?
For a non-partitioned db, `db.insert()` gives the document a random string as `_id` if it's missing from the payload. For a partitioned database, however, `_id` is a required field, and the random string needs to be generated using `uuid/v4` (see some examples in the driver repo). Let's figure out a better UX for users to provide the id part.
In LB3 days, we had an offline-sync feature, where data is created on the client first (including an autogenerated `uuid/v4` value) and then synced with the server. Can we use this feature for Cloudant/CouchDB too?
- If `_id` is provided, then we honor the entire string.
- If the id part is missing but `partitionKey` is provided, we generate a UUID and append it after the partition.
- If neither of them is provided, the request will be rejected by the Cloudant service.
Sounds good to me. Personally, I'd ask developers to configure the `id` property as follows:

```js
{
  type: 'string',
  id: true,
  defaultFn: 'uuidv4'
}
```
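In my limited understanding of the problem domain, this may be all that's needed to make things work with Cloudant:
- If the client provided the `id`, then we honor the value
- If the client did not provide it, then we generate a unique one (irrespective of the partitioning setup)

@bajtos Thank you for the detailed review and feedback!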
Please note the filter argument of find method has properties where, include, skip, limit, etc. Model properties are nested under the where field. There is no need to worry about partitionKey clashing with model properties.
Good point! If unknown filter properties are not removed from the request (IIRC they aren't), this will definitely be a decisive reason to choose proposal 2.
Are there any security implications to be aware of? Can the partitionKey property be exploited by a malicious client?
My understanding is that `partitionKey` is similar to a path parameter in a URL, so it's OK to make it public.
E.g. "`GET /users/{id}/orders`" vs. "get all the orders with `partitionKey` equal to `someUserId`".
I can double-check this.
I'd ask developers to configure the id property as follows:
{ type: 'string', id: true, defaultFn: 'uuidv4' }
Sounds good 👍
If the client did not provide it, then we generate a unique one (irrespectively of partitioning setup)
The pattern for `_id` in a partitioned db is `<partitionKey>:<id>`, and `uuidv4` can only generate the `id` part.
A partitioned database does NOT allow inserting a document without the partition key as a prefix; that's why users will have to provide at least the partition key part, or the full `_id`.
Follow up stories created:
Design thought see #214 (comment) and #214 (comment)
Same link both times.
In `Cloudant.prototype.find()`, if `partitionKey` is detected in the query, then invoke `db.partitionedFind('<name_of_key>', queryWithPartitionKeyExcluded)`
If the partition key is provided, shouldn't it be `queryWithPartitionKeyIncluded`?
While for partitioned database, _id is a must provide field, the random string need to be generated using 'uuid/v4', see some examples in the driver repo
For a partitioned database, isn't the partition key used as the id value?
If so, I am confused about this statement:
If the id part is missing but partitionKey is provided, we generate a uuid and append it after the partition.
@emonddr Thank you for the review!
Design thought see #214 (comment) and #214 (comment) same link both times.
If you click on the links, you'll see they have different anchors. I typed the full addresses; the link names are auto-generated (converted) by GitHub.
if partition key is provided, shouldn't it be queryWithPartitionKeyIncluded ?
Ah, let me explain: `db.partitionedFind()` takes two arguments. The first is the `partitionKey` as a string; the second is the rest of the query (that's why I described it as the query with the partition key EXCLUDED).
For a partitioned database, isn't the partition key used as the id value?
Not really. The pattern of an `_id` in a partitioned db is `<partition_name>:<id>`; it consists of two parts: the partition name and the id.
Does the id-related proposal make more sense now?
Had a chat with @raymondfeng; here is a summary:
Ideally, a partition key would map to a model property, e.g. model `User` has `countryCode` as its partition key. This would be consistent with the behavior in `loopback-connector-cassandra`. E.g.
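```js
// `db` is a juggler DataSource; `isPartitionKey` is the proposed partition key flag
const customers = db.define('customers', {
  userId: {type: Number, id: true},
  countryCode: {type: String, isPartitionKey: true},
  name: String,
  zipCode: Number,
});
```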
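For a model defined this way, composing `_id` from its properties might look like the sketch below. It assumes the proposed `isPartitionKey` flag (not an existing juggler or connector setting) and the `<partitionKey>:<id>` format; `findFlaggedProp` and `composeIdFromModel` are hypothetical helpers.

```javascript
// Hypothetical helpers: `isPartitionKey` is the flag proposed above, not an
// existing loopback-datasource-juggler or connector setting.

// Return the name of the first property whose definition has the given flag set.
function findFlaggedProp(properties, flag) {
  for (const [name, def] of Object.entries(properties)) {
    // Shorthand definitions such as `name: String` are functions, not objects.
    if (def && typeof def === 'object' && def[flag]) return name;
  }
  return undefined;
}

// Compose `_id` as `<partitionKey>:<id>` from the model data.
function composeIdFromModel(properties, data) {
  const partitionProp = findFlaggedProp(properties, 'isPartitionKey');
  const idProp = findFlaggedProp(properties, 'id');
  if (!partitionProp || !idProp || data[partitionProp] == null || data[idProp] == null) {
    throw new Error('both the partition key and the id value are required');
  }
  return `${data[partitionProp]}:${data[idProp]}`;
}

const customerProps = {
  userId: { type: Number, id: true },
  countryCode: { type: String, isPartitionKey: true },
  name: String,
  zipCode: Number,
};

console.log(composeIdFromModel(customerProps, { userId: 42, countryCode: 'US', name: 'Ann' }));
// prints US:42
```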
- Create: compose `_id` as `<countryCode>:<userId>`, then create the document by calling `db.insert()`.
- Find: take the partition key from `options` like `{partitionKey: 'US'}`, or from the partition key property in the query, e.g. `{where: {name: 'somename', countryCode: 'US'}}` becomes `db.partitionedFind('US', {selector: {name: 'somename'}})`.
- findById: the connector parses the provided id; if it's in the pattern `<partitionKey>:<id>`, then it invokes `db.partitionedFind()`.
  - Use case 1: `findById('fdb2ff86-78c1-47bb-bc63-f239db06c578')` invokes `db.find()`
  - Use case 2: `findById('USA:fdb2ff86-78c1-47bb-bc63-f239db06c578')` invokes `partitionedFind()`
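The `where` translation and the `findById` routing in this summary could be sketched as two pure functions that only decide which driver call to make. Both names are hypothetical, only plain equality on the partition key is handled, and the routing assumes a colon in the id always marks a partition prefix.

```javascript
// Hypothetical helpers; names are illustrative only.

// Split the partition-key property out of a `where` clause, e.g.
// {name: 'somename', countryCode: 'US'} -> ['US', {selector: {name: 'somename'}}].
// Returns null when there is no string partition key, so the caller can fall
// back to a global db.find(). Operators on the partition key are not handled.
function toPartitionedQuery(where, partitionKeyProp) {
  const { [partitionKeyProp]: partitionKey, ...rest } = where;
  if (typeof partitionKey !== 'string') return null;
  return [partitionKey, { selector: rest }];
}

// Route findById by the id pattern `<partitionKey>:<id>`.
function routeFindById(id) {
  const sep = id.indexOf(':');
  const query = { selector: { _id: id } }; // the stored _id includes the prefix
  if (sep > 0) {
    return { method: 'partitionedFind', args: [id.slice(0, sep), query] };
  }
  return { method: 'find', args: [query] };
}

console.log(routeFindById('fdb2ff86-78c1-47bb-bc63-f239db06c578').method); // find
console.log(routeFindById('USA:fdb2ff86-78c1-47bb-bc63-f239db06c578').method); // partitionedFind
```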
- Stay the same: developers still need to provide a full `_id` with the partition key and UUID when creating a model instance.
- We support reading the partition key from query `options`.
- We optimize `findById`.
See epic https://github.com/strongloop/loopback-connector-cloudant/issues/219. Closing this spike.
- MVP: epic #219
- Post-MVP (enhancement): epic #222
Both Cloudant and CouchDB support partitioned databases, which make querying less expensive: computationally in CouchDB, monetarily on Cloudant.
https://cloud.ibm.com/docs/services/Cloudant/guides?topic=cloudant-database-partitioning
Please consider adding this support for LoopBack users.
(Updated by JannyHou):
For design thoughts, see https://github.com/strongloop/loopback-connector-cloudant/issues/214#issuecomment-536586133 and the reference in https://github.com/strongloop/loopback-connector-cloudant/issues/214#issuecomment-537576508
For the proposal and follow-up stories, see https://github.com/strongloop/loopback-connector-cloudant/issues/214#issuecomment-537603423; we can break down the implementation into 6 stories accordingly.
Acceptance Criteria
Reference