Open vicb opened 3 years ago
Thanks for opening the issue! What would be an ideal solution on our side? Running Math.floor()
on the float? Throw an error, specifying only an integer is expected?
The ideal solution is that .filter('integer_column', '>', value)
works whatever the (numerical) value.
I mean 10 > 5.123 == true
is something everybody expects to work, whatever the language/API.
Edit: The same way, I also expect .filter('float_column', '>', 10)
to work and I don't have to type .filter('float_column', '>', 10.0)
The issue I'm seeing is that the API expects an integer value to compare against integers. It cannot compare floats against integers. So, if we're given a float of 123.456
, we would have to lose precision to reduce it to 123
. Of course, the results could never include a result where integer_column
contains 123.456
, so a result with 124
will come through in both cases, as that's the next possible value. What I'd be concerned about, is that we'd be adding code that could be misleading-- having the user assume it's possible to have float precision where they cannot. It may be best to continue sanitizing non-integer input in the application layer, to prepare it for the integer operation.
1) Again everybody expects 10 > 5.123 == true
. The opposite is very misleading.
2) Thoughts:
So, if we're given a float of
123.456
, we would have to lose precision to reduce it to123
You have to be careful about the rounding here:
value > 123.001
is equivalent to value > 123
but value < 123.001
is equivalent to value < 124
While it makes sense to round for >
, >=
, <=
and <
it is reasonable to say that 123 = 123.456 == false
(i.e. do not apply rounding for equality).
An other question I would have is at what level the type is stored ?
From the cloud console UI it seems like the type is store at entity level. i.e. 2 entity might have different types for the same column.
type(entityA.some_col) == integer
while type(entityB.some_col) == float
Is this true ?
If it is the API should offer a way to pass a float value otherwise you would have to do 2 queries (one with int and one with float) to do something like some_col > 123.001
.
I honestly never asked myself the question as I never set the type of the column explicitly. I just saved entity using the API. I only noticed this strange behavior when doing some tests and creating entity via the console.
edit: fixed rounding for >
and <
Note: Other language clients have similar behavior.
from google.cloud import datastore
client = datastore.Client()
kind = "test"
id = "test_id"
key = client.key(kind, id)
entity = datastore.entity.Entity(key)
entity['field'] = 'field_value'
entity['integer_column'] = 200
client.put(entity)
query = client.query(kind=kind).add_filter('integer_column', '>', 123.3)
results = list(query.fetch())
print(f"Query comparing an integer and float (none): {results}")
query = client.query(kind=kind).add_filter('integer_column', '>', 123)
results = list(query.fetch())
print(f"Query comparing an integer and integer (one result): {results}")
Query comparing an integer and float (none): []
Query comparing an integer and integer (one result): [<Entity('test', 'test_id') {'field': 'field_value', 'integer_column': 200}>]
Similar behavior in Java:
@Test
void numberSaving() {
final Datastore ds = datastoreOptions.getService();
final String kind = "valueTypeTesting";
final String long_property = "long_property";
final String double_property = "double_property";
final String number_property = "number_property";
KeyFactory txTesting = ds.newKeyFactory().setKind(kind);
final Entity a = Entity.newBuilder(txTesting.newKey("A"))
.set(long_property, 20)
.set(double_property, 20.0)
.set(number_property, 30.0)
.build();
final Entity b = Entity.newBuilder(txTesting.newKey("B"))
.set(long_property, 10)
.set(double_property, 15.0)
.set(number_property, 18)
.build();
ds.put(a);
ds.put(b);
final EntityQuery qInt = Query.newEntityQueryBuilder()
.setKind(kind)
.setFilter(PropertyFilter.gt(number_property, 5))
.build();
final EntityQuery qDouble = Query.newEntityQueryBuilder()
.setKind(kind)
.setFilter(PropertyFilter.gt(number_property, 5.0))
.build();
final QueryResults<Entity> qIntRun = ds.run(qInt);
LOGGER.info("qIntProto = {}", protoToString(getQueryProto(qInt)));
while (qIntRun.hasNext()) {
Entity next = qIntRun.next();
LOGGER.debug("entityProto = {}", protoToString(entityProto(next)));
if ("A".equals(next.getKey().getName())) {
assertEquals(30.0, next.getDouble(number_property));
} else if ("B".equals(next.getKey().getName())) {
assertEquals(18, next.getLong(number_property));
}
}
final QueryResults<Entity> qDoubleRun = ds.run(qDouble);
LOGGER.info("qDoubleProto = {}", protoToString(getQueryProto(qDouble)));
while (qDoubleRun.hasNext()) {
Entity next = qDoubleRun.next();
LOGGER.debug("entityProto = {}", protoToString(entityProto(next)));
if ("A".equals(next.getKey().getName())) {
assertEquals(30.0, next.getDouble(number_property));
} else if ("B".equals(next.getKey().getName())) {
assertEquals(18, next.getLong(number_property));
}
}
}
qIntProto = { query { kind { name: "valueTypeTesting" } filter { property_filter { property { name: "number_property" } op: GREATER_THAN value { integer_value: 5 } } } } }
entityProto = { key { partition_id { project_id: "default-223119" } path { kind: "valueTypeTesting" name: "B" } } properties { key: "double_property" value { double_value: 15.0 } } properties { key: "long_property" value { integer_value: 10 } } properties { key: "number_property" value { integer_value: 18 } } }
entityProto = { key { partition_id { project_id: "default-223119" } path { kind: "valueTypeTesting" name: "A" } } properties { key: "double_property" value { double_value: 20.0 } } properties { key: "long_property" value { integer_value: 20 } } properties { key: "number_property" value { double_value: 30.0 } } }
qDoubleProto = { query { kind { name: "valueTypeTesting" } filter { property_filter { property { name: "number_property" } op: GREATER_THAN value { double_value: 5.0 } } } } }
entityProto = { key { partition_id { project_id: "default-223119" } path { kind: "valueTypeTesting" name: "A" } } properties { key: "double_property" value { double_value: 20.0 } } properties { key: "long_property" value { integer_value: 20 } } properties { key: "number_property" value { double_value: 30.0 } } }
I think the root of the problem is, at query filter creation time, the client has no knowledge of the type of the field being filtered. the type needs to be figured out from the type provided. So, as a user you can round. I don't think the client can do that without side effects.
Thanks for checking other clients.
I don't think the client can do that without side effects.
May be it should be done server side then ?
In the meantime it would probably be worth documenting this behavior. I think we can agree to say that this behavior is really unexpected ?
The concern is expressed in my former message is real.
Let's init a table with [0...9] / 2:
for (let value = 0; value < 10; value++) {
const key = datastore.key('TEST');
await datastore.save({
key,
data: {
value: value / 2,
}
})
}
You can go to the console and you will see that 0,1,2, 3, and 4 are integer 0.5, 1.5, 2.5, 3.5, and 4.5 are floats.
Sanity check:
const [allEnt] = await datastore.createQuery('TEST').order('value').run();
console.log(`allEnt -> ${allEnt.map(e => e.value)}`);
// allEnt -> 0,1,2,3,4,0.5,1.5,2.5,3.5,4.5
Query with an integer value
const [intEnt] = await datastore.createQuery('TEST').filter('value', '>', 3).order('value').run();
console.log(`intEnt > 3 -> ${intEnt.map(e => e.value)}`);
// intEnt > 3 -> 4,0.5,1.5,2.5,3.5,4.5
Ouch !
0.5,1.5,2.5,
should not be here,Query with float value:
const [floatEnt] = await datastore.createQuery('TEST').filter('value', '>', 2.5).order('value').run();
console.log(`floatEnt > 2.5 -> ${floatEnt.map(e => e.value)}`);
// floatEnt > 2.5 -> 3.5,4.5
Ouch again !
Maybe the server API is fine for typed language (I haven't checked the client) but it is definitely not for JS (and probably Python).
@stephenplusplus @crwilcox @BenWhitehead - I think we can agree to say that is behavior is bug prone and should be fixed ?
I am not sure I would consider this a bug, but more a result of us trying to make things ergonomic in libraries. Filters must specify, via proto, the value type that the filter is evaluating.
So, when you set a filter, it is converted to a proto, and the type is picked at that point. You can see this in @BenWhitehead response if you look at the qProto.
{
query {
kind {
name: "valueTypeTesting"
}
filter {
property_filter {
property { name: "number_property" }
op: GREATER_THAN
value { double_value: 5.0 }
}
}
}
}
With that said, I understand how as a user it seems that, perhaps, the backend could recognize that an int_value: 10
is greater than double_value: 5.0
@crwilcox how is this not a bug.
JS has number
s not int
or float
.
If you look at my example I save some number
s and because some happen to be ints then there is no way to get back a correct result without having to make 2 queries (1 for the integer values and 1 for the floating points).
So basically if you have a product entities with a price, some prices will be integer ($10) and some will be float ($9.99). You would need two queries + some post-processing (for orders and limits) to get a correct result if you want a list of products with price < $15.
So, when you set a filter, it is converted to a proto, and the type is picked at that point. You can see this in @BenWhitehead response if you look at the qProto.
This seems to indicate that the bug should be fixed server side. Again if you look at my examples the same code could result in entities having either an integer or a floating point field. So you do not know the type at query time - or more exactly you known that there could be 2 different types.
@vicb I think what might be missing is that we have to account for folks using other languages, accessing the database not just from node. For instance, since JS has no separationg of int, float, we can do one of a couple things.
As best I can figure, we have three options on the client side (see later for backend discussion):
1) The Node.js Client always converts number to int in datastore
2) The Node.js Client always converts number to float in datastore
3) Current Behavior: Through inference, decide if the given number is best reflected by int, or float value, within the database.
As for backend, I agree it seems this may be addressable. A discussion could be started around allowing the backend to filter across disparate types, but I am not certain this would be straightforward, but there is no reason not to have that conversation. Though as that is no longer about the Node.js client, I would ask we continue that discussion on the appropriate issue tracker with the backend folks if possible.
If you believe I have missed an option for the client I would be happy to discuss it further, it is quite possible I am not thinking outside the box enough 😄
Hey @crwilcox,
I would add the following the client side options (1-3 below refer to the options in your message).
1) I am not exactly what you mean here. If you mean remove any support for floating point from the Datastore, I don't think this is a viable option. If you mean converting number to int for queries (as suggested by @stephenplusplus above) then it would still not solve the floating point issue.
2) Treating all number as floating point. It could work with JS because ints are encoding on 53bits only. Not sure about other dynamic langage. But there would still be a few problems on top of interoperability:
3) Current Behavior As you noted ("sticking to int or double for a value. I understand that this isn't a node concept.") and as far as I know there is no way to force a number to be a float in JS (only integer can be forced as far as I know).
4) Maybe one solution would be to change the API to have the user specifying the type. But again it will break legacy DB that have mixed integer/float for the same field.
In the end I don't really see a way to fix that on the client side.
As for backend, I agree it seems this may be addressable. A discussion could be started around allowing the backend to filter across disparate types, but I am not certain this would be straightforward, but there is no reason not to have that conversation. Though as that is no longer about the Node.js client, I would ask we continue that discussion on the appropriate issue tracker with the backend folks if possible.
It sounds like a great idea please include me in the discussion.
Thanks !
Reading the docs, it looks like this behavior is actually documented
TL;DR Integers are sorted before FPs.
I think this makes working with FP values in JS (and probably other languages) 1) hard - you would have to issue at least 2 queries to get all numerical values > or < to a limit (one for int, one for fp), 2) inefficient - querying for fp_colum < value will dump all the integer values. Similarly for int_colum > value dumping all the fps.
Adding here after chatting a bit further. Because of the inference of the client, it is hard to only insert doubles. If you want to enter values 1,2,3.99, there isn't a way, through this method, to make sure these are all doubles.
At the same time, I think users can send the value as a custom type: https://github.com/googleapis/nodejs-datastore/blob/master/src/entity.ts#L62
This would ensure the typing of the column. I wonder if there shouldn't be a warning on Number type autoconversion? In the case that you only ever use ints, it is probably fine, but it seems difficult to avoid doubles with .00
fractional.
@vicb would it be fair to characterize the status of this as the following:
As a user you would like to see the untyped entries to not be the preferred way via node as it is difficult (not feasible) to keep type consistency, for example the field values (1.99, 3.99, 4.0 (which will be seen as an int)).
By not preferred I think there are a few solutions
The final option is the most extreme to be sure and would mean a major version revision.
What do you think is necessary here as a user? Also, seeing as you seem to have done a good deal of homework on this, are there areas you can point us to in the docs where you think improvements should be made?
Yes so the whole problem is that in JS Number.IsInteger(4.0) === true
.
This basically makes the main API of datastore unusable for float numbers as you can not rule out some will be considered ints.
// The behavior of this is counter intuitive as soon as you query the value field
datastore.save(
key,
data: {
value: 1.23
}
};
// The behavior of this is counter intuitive when you deal with a float field
// For the same reason limit and order have counter intuitive behaviors.
datastore.createQuery('Product').filter('price', '<', 1.23);
To answer your points:
1) IMO updating the docs is not enough, because:
That being said, if this should be documented, there should be some examples to make the counter intuitive behavior clear.
2) The only warning I can imagine would be emitted every time a user saves a numerical value. It might be useful when there is a clear path to remove the current limitation but I am not sure if it would be helpful before that.
3) If you decide to fix this issue on the client side (also applicable to other dynamically typed languages) then I think deprecating and removing the auto-conversion is a required step. It means that every single client application must be updated (as long as they are dealing with numbers). Note that it would still not solve the problem with values created before the major version - all DB would need to be migrated if users are impacted by the limitation.
Updating the query code would be quite straightforward:
// current unsafe API
const query = datastore.createQuery('Product').filter('price', '<', 4.0);
// next safe API
const query = datastore.createQuery('Product').filter('price', '<', double(4.0));
However updating your entities would be much more of a pain:
{
price: 4.0
qty: 10,
id: 10,
sale: 0.50,
}
You basically have to wrap every single numeric field in int()
or double()
.
A few comments/ideas:
1) Instead of having to manually convert all entities in order to wrap numeric field, it might be good to specify a schema:
{
price: double
qty: int,
id: int,
sale: double,
}
The schema could be easily passed in datastore.save()
, same way the current API supports excludeFromIndexes
.
It might feel a little awkward to pass the schema to datastore.createQuery(...)
.
So maybe a good solution would be to create a Schema registry that you would init once ? Maybe as an option to new Datastore()
?
2) I still think a server side fix would be much easier on the users.
I am sorry not to have a good turnkey solution to propose right now. It would probably be good to investigate more deeply different solutions, their impact on the client code and their cost/feasibility on the API or server side.
I think the way forward is to start with exposing, in docs, the double() and int() representations, maybe not highlighting inference for numbers. I think this would be needed on save and query docs.
// current unsafe API
const query = datastore.createQuery('Product').filter('price', '<', 4.0);
// next safe API
const query = datastore.createQuery('Product').filter('price', '<', double(4.0));
Regarding
I still think a server side fix would be much easier on the users.
From a server standpoint the sort order is defined already. There is potential to add an 'OR' query that would essentially be a merge query. That said, even with that, the way inference is happening that wouldn't solve this interface problem on it's own, since that is really a double() or int()
We're considering whether an "OR" query can be added so I'm converting this to a feature request as one of the suggested outcomes of this.
1) Is this a client library issue or a product issue?
I guess a bug in the client library (
"@google-cloud/datastore": "^6.3.0"
)Environment details
@google-cloud/datastore
version: 6.3.0Steps to reproduce
1) Create an entity in the cloud console with an integer column, named
integer_column
and set its value to 200.2) Executes a query with a filter on that column:
Expected behavior:
The correct results are returned if
value
is a valid numerical value, i.e. 123 or 123.456Actual behavior
The correct results are only returned if value is 123. A float value (123.456) would return an empty result.
Workaround
Update the query to add
Math.round()
.Thanks!