upper / db

Data Access Layer (DAL) for PostgreSQL, CockroachDB, MySQL, SQLite and MongoDB with ORM-like features.
https://upper.io/
MIT License

Performance: collections should be shared between transactions #445

Open dardoguidobono opened 6 years ago

dardoguidobono commented 6 years ago

Hello: I create several transactions per second, and each time I call the `Collection` method on the tx it triggers `PrimaryKey()` and its SQL query against the catalog.

Can we just cache the collection structs?

Code snippet:

    db.SetLogging(true)

    db.Collection("twitter_analysis")
    db.Collection("twitter_analysis")
    db.Collection("twitter_analysis")

    tx, _ := db.NewTx(context.Background())
    tx.Collection("twitter_analysis")

    tx2, _ := db.NewTx(context.Background())
    tx2.Collection("twitter_analysis")

That outputs:

2018/03/23 15:02:07 Session ID: 00001 Query: SELECT "pg_attribute"."attname" AS "pkey" FROM "pg_index", "pg_class", "pg_attribute" WHERE (pg_class.oid = '"twitter_analysis"'::regclass AND indrelid = pg_class.oid AND pg_attribute.attrelid = pg_class.oid AND pg_attribute.attnum = ANY(pg_index.indkey) AND indisprimary) ORDER BY "pkey" ASC Time taken: 0.00359s Context: context.Background

2018/03/23 15:02:07 Session ID: 00002 Transaction ID: 00001 Query: SELECT "pg_attribute"."attname" AS "pkey" FROM "pg_index", "pg_class", "pg_attribute" WHERE (pg_class.oid = '"twitter_analysis"'::regclass AND indrelid = pg_class.oid AND pg_attribute.attrelid = pg_class.oid AND pg_attribute.attnum = ANY(pg_index.indkey) AND indisprimary) ORDER BY "pkey" ASC Time taken: 0.00162s Context: context.Background

2018/03/23 15:02:07 Session ID: 00003 Transaction ID: 00002 Query: SELECT "pg_attribute"."attname" AS "pkey" FROM "pg_index", "pg_class", "pg_attribute" WHERE (pg_class.oid = '"twitter_analysis"'::regclass AND indrelid = pg_class.oid AND pg_attribute.attrelid = pg_class.oid AND pg_attribute.attnum = ANY(pg_index.indkey) AND indisprimary) ORDER BY "pkey" ASC Time taken: 0.00635s Context: context.Background

Maybe by adding `nd.cachedCollections = d.cachedCollections` inside the `NewClone` function of `database`?

// NewClone binds a clone that is linked to the current
// session. This is commonly done before creating a transaction
// session.
func (d *database) NewClone(p PartialDatabase, checkConn bool) (BaseDatabase, error) {
    nd := NewBaseDatabase(p).(*database)

    nd.name = d.name
    nd.sess = d.sess
    nd.cachedCollections = d.cachedCollections // proposed: share the cache with the clone

    // ...

Also, for the caching to be effective, the `Close` method needs a matching change so that it only clears the collection cache when the session is not inside a transaction:

// Close terminates the current database session
func (d *database) Close() error {
    defer func() {
        d.sessMu.Lock()
        d.sess = nil
        d.baseTx = nil
        d.sessMu.Unlock()
    }()
    if d.sess != nil {
        if cleaner, ok := d.PartialDatabase.(hasCleanUp); ok {
            cleaner.CleanUp()
        }

        d.cachedStatements.Clear() // Closes prepared statements as well.

        tx := d.Transaction()
        if tx == nil {
            // Not within a transaction.
            d.cachedCollections.Clear()
            return d.sess.Close()
        }

        if !tx.Committed() {
            tx.Rollback()
            return nil
        }
    }
    return nil
}
xiam commented 6 years ago

We used to cache collection columns, but that got us in trouble and we decided to remove that feature.

Imagine this scenario:

  1. upper does some queries and caches table keys.
  2. You run an ALTER TABLE query directly against your database, modifying the primary key of some table. upper doesn't know about this.
  3. upper tries to use the cached keys (which are now invalid because you altered the table) and produces invalid SQL.

I think this is important, but we don't have a solution for the above case yet. What do you think?

By the way, this is not a cache, but if you keep a reference to a collection, like this:

col := sess.Collection("table")

Then you'll be able to use it like:

col.Find()...

col.Insert()...

without having to find the table primary keys again.

Let's keep this discussion open; it would be great to see different options (and we'll hopefully find a solution).

davars commented 6 years ago

In other environments (thinking of Rails and SQLAlchemy here), migrations are run out-of-band from the client application, and the assumption is that the client will either be down during migrations or the migrations will be compatible with running apps, which will be rolling-restarted soon after the migrations run. Sacrificing everyday performance for safety during deployments doesn't seem like a great tradeoff.

That said, I'm also not a fan of the hidden caching and do like making collection reuse up to the application rather than managed by upper/db. In other words, I like the col.Find() / col.InsertReturning() approach. Unfortunately there's currently no way I can see to perform multiple collection operations inside a transaction.

A couple of rough first thoughts:

  1. Add a Tx method to Collection, similar to the one in sqlbuilder
  2. Provide the ability to set collection metadata without querying it from the database (this could be from a stored collection created outside of the transaction, or, with the addition of a serialization format for the metadata, even loaded from disk / memcache so that only one process needs to hit the DB for the metadata).
dardoguidobono commented 6 years ago

Well, with my proposal the collections are cached in the session, not in the tx, and are only queried once. If I apply the patch I don't see any further queries using the same session.

D.

tietang commented 5 years ago

When cloning a database:

  1. A new connection needs to be established, so connection pooling cannot be used? Or perhaps a default connection pool could be set up.
  2. The `cachedCollections` cache cannot be reused.

github.com/upper/db/mysql/database.go, line 219:

func (d *database) NewDatabaseTx(ctx context.Context) (sqladapter.DatabaseTx, error) {
    clone, err := d.clone(ctx, true)
    if err != nil {
        return nil, err
    }
    clone.mu.Lock()
    defer clone.mu.Unlock()

    connFn := func() error {
        sqlTx, err := compat.BeginTx(clone.BaseDatabase.Session(), ctx, clone.TxOptions())
        if err == nil {
            return clone.BindTx(ctx, sqlTx)
        }
        return err
    }

    if err := d.BaseDatabase.WaitForConnection(connFn); err != nil {
        return nil, err
    }

    return sqladapter.NewDatabaseTx(clone), nil
}