Closed jgeewax closed 9 years ago
@jgeewax We spent a lot of work to get rid of the high-level concept of dataset. The philosophy was that we should make the objects and their constructors easier to use and that "learning" another class was overkill when the dataset was a) just a string and b) something that was a single-valued constant for most users.
That's going to be a really big problem when Datastore exposes multiple datasets per project...
dataset
module, which is api
. I think it's better to construct a dataset
from datastore
and leave datastore
for meta-methods like set_defaults()
now. Entity
. It inevitably ends up moving towards ORM which inevitably makes the library more and more heavyweight, and duplicates use-cases from NDB and DB. If anything I think the easiest way forward is to simplify entity. The most lightweight syntax I can think of is:SETUP
datastore.set_defaults() //does auth and connection stuff
d = datastore.Dataset('dataset-id', **kwargs) //more advanced options like read_options should be put here since it is a one time operation
Put a single entity
key = d.put(Key(*path), Entity(**props))
Put a bunch of entities
d.put({key1: entity1, key2: entity2 ...})
Get an entity
d.get(Key(*path))
A couple comments about "what I'd like to write":
entity.put()
is confusing. Where is the entity going? Is put really functionality of an entity, or functionality of a dataset on an entity and key. Why is ent = dataset.Entity(key)
ent.update(props)
ent.put()
preferable to
ent = Entity(props)
dataset.put(key, ent)
I am completely boggled by this.
datastore.Kind('kind').getById(123)
is really really scary. We shouldn't be encouraging people to think of this as relational database. It isn't. We also shouldn't be encouraging them to make 'kind' queries when they know the id, not only is it confusing, it's inefficient. Finally, as an aside, I submitted a PR to remove the list wrapping requirement for single arguments #623 , but if you want to support python 27 and 26 then you need to use **kwargs
and a parser rather than explicit named arguments. Dunno if it's worth it. Python 3 supports def func(arg, *args, kw=value, **kwargs)
syntax.
@jgeewax just wanted to add that I agree with the premise of this issue. I'm writing the samples for the landing page as we speak, and it's awkward to say the least.
.put()
"It's similar to the ext.db
API. This is a matter of preference. See https://cloud.google.com/appengine/docs/python/datastore/modelclass#Model_put
profile = datastore.Entity('Profile').get_by_id(123)
profile['likes_fruit'] = False
profile.put()
.get_by_id()
"It's a helper-method for ultimately constructing a key which reads more clearly (to me). It is also similar to ext.db
which provides a get_by_id
method. Again, this is a matter of preference.
profile = datastore.Entity('Profile').get_by_id(1234)
# is the same as
profile = datastore.get(Key('Profile', 1234))
We shouldn't be encouraging people to think of this as relational database. It isn't.
Nowhere here are we talking about relational data. get_by_id
is a shortcut for get(Key(<kind already known>, id))
. See https://cloud.google.com/appengine/docs/python/datastore/modelclass#Model_get_by_id
We also shouldn't be encouraging them to make 'kind' queries when they know the id, not only is it confusing, it's inefficient.
I don't see a query here, nor do I see anything related to efficiency. The code I'm talking about would likely look like...
class Entity(object):
...
def get_by_id(self, id):
return datastore.get(Key(self.kind, id))
I thought the goal was to move away from the DB implementations, not take queues from their implementation. If the goal is to cover different use cases the syntax is going to be different.
RE: kind(foo).get_by_id(bar)
I mistakenly assumed from your comment
Equivalent of "drilling down" into a "table" (kind), and then a "row" (id)
that this was how you meant it, and then it would be implemented by pulling the entities of a certain kind and searching for the right id.
EDIT: If you really wanted to add extra sugar on top of key, you could do something like
dataset.key(parent=parent_key, kind='kind', id='id')
I don't recall ever saying or hearing "we're going to avoid everything done in ext.db
"...?
I think the .put()
style on an Entity is a nice thing. I don't have to know anything about the connection, dataset, or authentication -- and .put()
will just "save the entity the way it was retrieved". This might seem obvious now, but when multiple datasets are a thing, and different "connections" are floating around, it might be important. It means an Entity on it's own can be enough to persist data.
I could be wrong here. Either way, I don't think we should take a hard-line stance of "no stuff that was in ext.db
"...
I think that is going to be more of a trouble than a convenience TBH. To quote Zen of Python explicit is better than implicit
.
Before the major restructuring Entity
had all the necessary information (dataset, key, etc) to have all this functionality on it's own, and usage was much more cumbersome IMO. It was the point where I would rather just have used NDB, since it felt like I was being forced to do all the ORM stuff anyway.
@jgeewax Can we close this out?
The core issues were
Both of those have been implemented.
Anything remaining can be broken off into a smaller more digestible bug for discussion?
SGTM
Hi guys,
I was reviewing the "state of the world" of the datastore API, and found a few things that seem (to me at least) a bit more complicated than they should be.
A bit of this comes from the authentication stuff (which we have another open bug for) but I wanted to bring up a few things for discussion... If this issue gets out of control, we can move the discussion elsewhere, but I wanted to kick things off:
Create a new Person entity in the Datastore
What I have to write
What I'd like to write
Key differences
set_defaults()
(there's another issue open for this I believe)datastore.Kind('Person').get_entity()
?).put
not requiring a list for a single entityGet an existing Person (123) from the Datastore
What I have to write
What I'd like to write
Key differences
set_defaults
(Not hugely concerned about this use case, but wanted to toss it out)
Create a new Person using a specific set of credentials (
/home/creds.json
) to talk to a specific dataset ('my-dataset'
) in the Datastore(Being looked into in PR https://github.com/GoogleCloudPlatform/gcloud-python/pull/641)
What I have to write
_(This is a best guess... I don't know if this code is even right... And even so, this code won't do the same thing if you ran it on App Engine because of the order of evaluation here: http://googlecloudplatform.github.io/gcloud-python/latest/gcloud-api.html#gcloud.credentials.get_credentials )_
What I'd like to write
Key differences
datastore
would/cc @tseaver @dhermes @pcostell