keredson / peewee

a small, expressive orm -- supports postgresql, mysql and sqlite
http://docs.peewee-orm.com/
MIT License
13 stars 4 forks source link

Herman

According to Charles, the Peewee ORM is currently going through a major 3.0 rewrite and isn't accepting many (any?) new features for the 2.x line. Because of this, I made Herman as a light fork of Peewee (2.8.1) implementing some syntactic sugar and minor tweaks I've liked from other ORMs.

This is mostly for my company's internal use, but anyone else is free to use it. I'm maintaining API compatibility as a drop in replacement for peewee 2.x.

.. image:: https://api.travis-ci.org/keredson/peewee.png?branch=master :target: https://travis-ci.org/keredson/peewee

How To Install

.. code-block:: bash

sudo pip uninstall peewee sudo pip install herman

Or to upgrade:

.. code-block:: bash

sudo pip install herman --upgrade

There is no herman package. Herman represents itself as peewee (with the same peewee __version__) as a drop-in replacement. To check if you're running Peewee or Herman, check for peewee.__herman_version__.

Differences (Herman vs. Peewee)

SQL Generation

The SQL generated by an ORM isn't just an artifact to be consumed by the database. Developers need to be able to read it as well. The more readable it is and the closer it is to what one would naturally write, the less time it takes to manually evaluate / debug it.

For example, this query:

.. code-block:: python

Article.select()
.join(Person, join_type=pw.JOIN.LEFT_OUTER, on=(Article.author==Person.id))
.where(Article.author==derek)

In Peewee generates this:

.. code-block:: sql

SELECT "t1"."id", "t1"."title", "t1"."author_id", "t1"."editor_id" 
FROM "article" AS t1 
LEFT OUTER JOIN "person" AS t2 ON ("t1"."author_id" = "t2"."id") 
WHERE ("t1"."author_id" = ?

Whereas Herman generates:

.. code-block:: sql

SELECT a1.id, a1.title, a1.author_id, a1.editor_id 
FROM article AS a1 
LEFT OUTER JOIN person AS p1 ON (a1.author_id = p1.id) 
WHERE (a1.author_id = ?) 

You'll notice that in Herman:

The first() Method Uses a Limit

.. code-block:: python

Article.select().first()

In Peewee generates this:

.. code-block:: sql

SELECT "t1"."id", "t1"."title", "t1"."author_id", "t1"."editor_id" FROM "article" AS t1

Where in Herman this is generated:

.. code-block:: sql

SELECT a1.id, a1.title, a1.author_id, a1.editor_id FROM article AS a1 LIMIT 1

Notice the additional "LIMIT 1" in Herman. Peewee 2.8.1 rather loads entire result set and returns this first object. (It's my understanding future versions of Peewee will also add a "LIMIT 1".)

Named Table Aliases

In Peewee you can create table aliases as a variable outside your query. For example:

.. code-block:: python

author_table = Person.alias()
Article.select()
.join(author_table)
.where(author_table.name=='Derek')

This gets you a very respectable:

.. code-block:: sql

SELECT "t1"."id", "t1"."title", "t1"."author_id", "t1"."editor_id" 
FROM "article" AS t1
INNER JOIN "person" AS t2 ON ("t1"."author_id" = "t2"."id") 
WHERE ("t2"."name" = ?)

But in some queries it's very useful to specify what the aliases are. In Herman you can do this:

.. code-block:: python

author_table = Person.alias('author')
Article.select()
.join(author_table)
.where(author_table.name=='Derek')

Which generates this:

.. code-block:: sql

SELECT a1.id, a1.title, a1.author_id, a1.editor_id 
FROM article AS a1 
INNER JOIN person AS author ON (a1.author_id = author.id) 
WHERE (author."name" = ?)

These two statements are equivalent in Herman:

.. code-block:: python

author_table = Person.alias('author')
author_table = Person.as_('author')

And because aliases are named, you're no longer required to use the external variable:

.. code-block:: python

Article.select()
.join(Person.as_('author'))
.where(Person.as_('author').name=='Derek')

Whether in-lining the aliases makes it easier or harder to read is entirely dependent on the query and the code block it's in, but it's good to have the option. Both are supported in Herman.

Alias References are Inferred When Unambiguous

If a query is otherwise invalid and there is only one possible interpretation of the query author's intent, automatically tie the un-aliased column to the appropriate alias. For example:

.. code-block:: python

Person.alias('xyz').select().where(Person.name == 'Derek')

Generates invalid SQL in Peewee (which is run on the database, which throws an exception):

.. code-block:: sql

SELECT "t1"."id", "t1"."name", "t1"."parent_id"
FROM "person" AS t1 WHERE ("t2"."name" = ?)
peewee.OperationalError: no such column: t2.name

But in Herman it's valid:

.. code-block:: sql

SELECT xyz.id, xyz."name", xyz.parent_id 
FROM person AS xyz WHERE (xyz."name" = ?)

This would NOT work if more than one Person alias were included in the query.

Herman Raises Exceptions When Invalid Columns Are Referenced

Peewee will generate and run on the database SQL it knows is invalid. For example:

.. code-block:: python

Person.select().where(Article.title == 'xyz').first()

In Peewee will throw a database error:

.. code-block:: sql

SELECT "t1"."id", "t1"."name", "t1"."parent_id" 
FROM "person" AS t1 WHERE ("t2"."title" = ?)
peewee.OperationalError: no such column: t2.title

This isn't good for two reasons. First, I don't like relying on the database to catch easily detectable errors for us. Second, the error is opaque and specific to the internal implementation details of peewee (the "t2").

Herman on the other hand will not generate the SQL at all, instead raising:

.. code-block:: python

peewee.ProgrammingError: <class '__main__.Article'> is not a part of this query

The get() Method Confirms Uniqueness

The get() method in Peewee adds a "LIMIT 1" and returns the first object. I feel this is incorrect behavior. The difference between first() and get() is get() should assert that only one matching record exists. (This is something Django got right IMO.) If my query criteria hasn't correctly isolated a unique row the ORM should throw an exception.

This is why Herman added a "LIMIT 2" to the SQL genned from get(), and does a check on the number of object returned. For example:

.. code-block:: python

derek = Person.create(name='Derek')
callie = Person.create(name='Callie')
Person.select().get()

Will throw:

.. code-block:: sql

peewee.DataError: Too many instances matching query exist:
SQL: SELECT p1.id, p1."name", p1.parent_id FROM person AS p1 LIMIT 2

Rather than returning a random Person object selected by the database.

The Shortcut ALL

In Herman, this:

.. code-block:: python

Person.ALL

Is the same as this:

.. code-block:: python

Person.select()

It's just a nomenclature I preferred from Django. I made it uppercase to prevent conflicts with any columns named "all", and to highlight that it's effectively a constant.

A New (Additional) Join Syntax

If I have a set of models:

.. code-block:: sql

class Person(pw.Model):
  name = pw.CharField()

class Article(pw.Model):
  title = pw.CharField()
  author = pw.ForeignKeyField(db_column='author_id', rel_model=Person, to_field='id')
  editor = pw.ForeignKeyField(db_column='editor_id', rel_model=Person, to_field='id', related_name='edited_articles', null=True)

class Reply(pw.Model):
  text = pw.CharField()
  article = pw.ForeignKeyField(db_column='article_id', rel_model=Article, to_field='id')

And I want to do something fancy like get all replies with their articles and authors and editors, in Peewee I have to do something like this:

.. code-block:: python

author_table = Person.alias() editor_table = Person.alias() replies = Reply.select(Reply, author_table, editor_table) .join(Article) .join(author_table, join_type=pw.JOIN.LEFT_OUTER, on=(author_table==Article.author)) .switch(Article) .join(editor_table, join_type=pw.JOIN.LEFT_OUTER, on=(editor_table==Article.editor)) .where(author_table.name=="Derek")

Which is all sorts of complicated. For instance:

Herman offers a simpler syntax:

.. code-block:: python

Reply.ALL .plus(Reply.article, Article.author.as('author')) .plus(Reply.article, Article.editor) .where(Person.as('author').name=="Derek")

Which generates the same SQL:

.. code-block:: sql

SELECT r1.id, r1."text", r1.article_id, a1.id, a1.title, a1.author_id, a1.editor_id, author.id, author."name", author.parent_id, p1.id, p1."name", p1.parent_id FROM reply AS r1 LEFT OUTER JOIN article AS a1 ON (r1.article_id = a1.id) LEFT OUTER JOIN person AS author ON (a1.author_id = author.id) LEFT OUTER JOIN person AS p1 ON (a1.editor_id = p1.id) WHERE (author."name" = ?)

The plus() method takes a variable number of ForeignKeyField objects which represent a path away from the primary query object (Reply in this case). For example:

.. code-block:: python

Reply.ALL.plus(Reply.article)

Gets all the replies with their associated articles.

.. code-block:: python

Reply.ALL.plus(Reply.article, Article.editor)

Gets all the replies with their associated articles and all the articles editors. Note that "Reply.article" is a foreign key from Reply to Article, and Article.editor is a foreign key from Article to Person. The list of foreign keys must create a logical path where the "to" type of one FK matches the "from" type of the next FK.

Following the same path twice is harmless. For instance:

.. code-block:: python

Reply.ALL.plus(Reply.article).plus(Reply.article)

Is no different than calling plus() once. This is why we can specify multiple paths that have some overlap, for example:

.. code-block:: python

Reply.ALL .plus(Reply.article, Article.author) .plus(Reply.article, Article.editor)

Which will join on the article table only once.

You can also alias your joined tables with:

.. code-block:: python

Article.author.as_('author')

Which allows you to reference it later in your conditional:

.. code-block:: python

.where(Person.as_('author').name=="Derek")

Herman's plus() also supports following foreign keys from one-to-many. Like:

.. code-block:: python

Article.ALL.plus(Reply.article)

Which internally does a prefetch to populate the article with all of its replies. There will be O(k) SQL statements executed, where k is the number of two-many relationships. All of these queries will be grouped into one transaction to guarantee correctness.

IMPORTANT:

Remember that foreign keys represent edges in your object graph, and a call to `plus(*edges)` tells Herman to include that path from the object graph in your query.

This semantics for plus() have been co-opted from the DKO <https://github.com/keredson/DKO>_ project, which I authored for my former employer. DKO's version of this syntax has been in broad production use since 2010 by hundreds of developers, accessing some of the largest (billions of rows) conventional relational databases that exist.

Calling len() Does a Database Count

If you call:

.. code-block:: python

len(Article.select())

Peewee this will load a list of all objects, permanently cache said list, and then call len() on that cache.

In Herman this will call count() on the database and return the resulting integer. It does not build the list of objects in python nor cache anything. However, for backwards compatability, if something else has already populated the cached results of the query, it will call len() on that.

A New DeferredRelation Syntax

The semantics behind Peewee's circular foreign key dependencies <http://docs.peewee-orm.com/en/latest/peewee/model.html#circular-foreign-key-dependencies>_ get kind of unwieldy when you have more than a few models (and they're spread over multiple files). This is because the DeferredRelation object has to be defined, used, then the other model defined in another file, then set_model has to be called on the original, and then you're left with the object reference dangling around that has no purpose. IE the example in the docs:

.. code-block:: python

Create a reference object to stand in for our as-yet-undefined Tweet model.

DeferredTweet = DeferredRelation()

class User(Model): username = CharField()

Tweet has not been defined yet so use the deferred reference.

favorite_tweet = ForeignKeyField(DeferredTweet, null=True)

class Tweet(Model): message = TextField() user = ForeignKeyField(User, related_name='tweets')

Now that Tweet is defined, we can initialize the reference.

DeferredTweet.set_model(Tweet)

Ours happens all in the model definition with an optional parameter given to DeferredRelation. Like:

.. code-block:: python

class User(Model): username = CharField()

Tweet has not been defined yet so use the deferred reference.

favorite_tweet = ForeignKeyField(DeferredRelation('Tweet'), null=True)

class Tweet(Model): message = TextField() user = ForeignKeyField(User, related_name='tweets')

This removes the need for the extra variable in the global namespace and the coordination of it over multiple files. And since the parameter is optional, it is fully backwards-compatible with the old syntax.

Our patch for this has been incorporated upstream, so this is forwards-compatible too, following Peewee's next release.

Passing an Empty List/Set/Tuple into IN Doesn't Gen Invalid SQL

If you try to do a IN operation on an empty list:

.. code-block:: python

User.select().where(User.id << [])

Peewee will generate the following SQL:

.. code-block:: sql

SELECT "t1"."id", "t1"."username" FROM "user" AS t1 WHERE ("t1"."id" IN ())

Which the database will reject as invalid, throwing an exception. We instead generate a "false" statement:

.. code-block:: sql

SELECT u1.id, u1.username FROM "user" AS u1 WHERE (0 = 1)

So you don't have to manually test for empty lists every time you use a SQL IN.

peewee

Peewee is a simple and small ORM. It has few (but expressive) concepts, making it easy to learn and intuitive to use.

New to peewee? Here is a list of documents you might find most helpful when getting started:

For flask helpers, check out the flask_utils extension module <http://docs.peewee-orm.com/en/latest/peewee/playhouse.html#flask-utils>. You can also use peewee with the popular extension flask-admin <https://flask-admin.readthedocs.io/en/latest/> to provide a Django-like admin interface for managing peewee models.

Examples

Defining models is similar to Django or SQLAlchemy:

.. code-block:: python

from peewee import *
from playhouse.sqlite_ext import SqliteExtDatabase
import datetime

db = SqliteExtDatabase('my_database.db')

class BaseModel(Model):
    class Meta:
        database = db

class User(BaseModel):
    username = CharField(unique=True)

class Tweet(BaseModel):
    user = ForeignKeyField(User, related_name='tweets')
    message = TextField()
    created_date = DateTimeField(default=datetime.datetime.now)
    is_published = BooleanField(default=True)

Connect to the database and create tables:

.. code-block:: python

db.connect()
db.create_tables([User, Tweet])

Create a few rows:

.. code-block:: python

charlie = User.create(username='charlie')
huey = User(username='huey')
huey.save()

# No need to set `is_published` or `created_date` since they
# will just use the default values we specified.
Tweet.create(user=charlie, message='My first tweet')

Queries are expressive and composable:

.. code-block:: python

# A simple query selecting a user.
User.get(User.username == 'charles')

# Get tweets created by one of several users. The "<<" operator
# corresponds to the SQL "IN" operator.
usernames = ['charlie', 'huey', 'mickey']
users = User.select().where(User.username << usernames)
tweets = Tweet.select().where(Tweet.user << users)

# We could accomplish the same using a JOIN:
tweets = (Tweet
          .select()
          .join(User)
          .where(User.username << usernames))

# How many tweets were published today?
tweets_today = (Tweet
                .select()
                .where(
                    (Tweet.created_date >= datetime.date.today()) &
                    (Tweet.is_published == True))
                .count())

# Paginate the user table and show me page 3 (users 41-60).
User.select().order_by(User.username).paginate(3, 20)

# Order users by the number of tweets they've created:
tweet_ct = fn.Count(Tweet.id)
users = (User
         .select(User, tweet_ct.alias('ct'))
         .join(Tweet, JOIN.LEFT_OUTER)
         .group_by(User)
         .order_by(tweet_ct.desc()))

# Do an atomic update
Counter.update(count=Counter.count + 1).where(
    Counter.url == request.url)

Check out the example app <http://docs.peewee-orm.com/en/latest/peewee/example.html>_ for a working Twitter-clone website written with Flask.

Learning more

Check the documentation <http://docs.peewee-orm.com/>_ for more examples.

Specific question? Come hang out in the #peewee channel on irc.freenode.net, or post to the mailing list, http://groups.google.com/group/peewee-orm . If you would like to report a bug, create a new issue <https://github.com/coleifer/peewee/issues/new>_ on GitHub.

Still want more info?

.. image:: http://media.charlesleifer.com/blog/photos/wat.jpg

I've written a number of blog posts about building applications and web-services with peewee (and usually Flask). If you'd like to see some real-life applications that use peewee, the following resources may be useful: