prooph / pdo-event-store

PDO implementation of ProophEventStore http://getprooph.org
BSD 3-Clause "New" or "Revised" License
111 stars 56 forks source link

Postgres with metadatamatcher #199

Closed basz closed 5 years ago

basz commented 5 years ago

With around 250000 events in a Postgres Simple Strategy table the snapshotter becomes slow.

I've investigated this a bit and it seems the snapshotter loads from the store with a metadatamatcher to create queries like;

SELECT * FROM "_7794a224f85453058cf637d52e7c4487568a8271"
    WHERE metadata->>'_aggregate_type' = 'HF\Api\Domain\Dossier\Aggregate\Dossier' 
    AND metadata->>'_aggregate_id' = 'dd890c7e-f9d6-5262-9e89-a16457a624a7'  AND CAST(metadata->>'_aggregate_version' AS INT) > 1  AND no >= 0 
    ORDER BY no ASC
    LIMIT 1000;

EXPLAIN

  ->  Sort  (cost=22027.82..22027.83 rows=2 width=470)
        Sort Key: no
        ->  Seq Scan on _7794a224f85453058cf637d52e7c4487568a8271  (cost=0.00..22027.81 rows=2 width=470)
              Filter: ((no >= 0) AND ((metadata ->> '_aggregate_type'::text) = 'HF\Api\Domain\Dossier\Aggregate\Dossier'::text) AND ((metadata ->> '_aggregate_id'::text) = 'dd890c7e-f9d6-5262-9e89-a16457a624a7'::text) AND (((metadata ->> '_aggregate_version'::text))::integer > 1))

These queries take around ~500milliseconds to complete, which is slowish to me.

So, I read about the GIN index which is indented for jsonb columns.

Adding an index

CREATE INDEX idxgin ON _7794a224f85453058cf637d52e7c4487568a8271 USING gin (metadata);

And modifiyng the query to use this index (only possible with some operators, hence only on metadata._aggregate_id) changes the select time to about ~175 milliseconds.

SELECT * FROM "_7794a224f85453058cf637d52e7c4487568a8271"
    WHERE metadata->>'_aggregate_type' = 'HF\Api\Domain\Dossier\Aggregate\Dossier' 
    AND metadata @> '{"_aggregate_id" : "dd890c7e-f9d6-5262-9e89-a16457a624a7"}'  AND CAST(metadata->>'_aggregate_version' AS INT) > 1  AND no >= 0 
    ORDER BY no ASC
    LIMIT 1000;

EXPLAIN

Limit  (cost=21448.76..21448.76 rows=1 width=470)
  ->  Sort  (cost=21448.76..21448.76 rows=1 width=470)
        Sort Key: no
        ->  Seq Scan on _7794a224f85453058cf637d52e7c4487568a8271  (cost=0.00..21448.75 rows=1 width=470)
              Filter: ((metadata @> '{"_aggregate_id": "dd890c7e-f9d6-5262-9e89-a16457a624a7"}'::jsonb) AND (no >= 0) AND ((metadata ->> '_aggregate_type'::text) = 'HF\Api\Domain\Dossier\Aggregate\Dossier'::text) AND (((metadata ->> '_aggregate_version'::text))::integer > 1))

As you may noticed from the explain filtering on aggregate_id appears before the no.

Worth changing?

basz commented 5 years ago

reference: https://bitnine.net/blog-postgresql/postgresql-internals-jsonb-type-and-its-indexes/

prolic commented 5 years ago

You are using the wrong persistence strategy. You should use SingleStreamStrategy instead.

basz commented 5 years ago

Sure. Did not realize this. Any advice on how to merge multiple streams into one?

prolic commented 5 years ago

If you have one stream per aggregate type, you don't need to change anything. It can very well be a single stream for "user" and another single stream for "blogpost". You should be able to simply change the configuration of your application and migrate the database to include those missing indexes.

basz commented 5 years ago

I have switched. Needed to fix the _aggregate_version which were not incremental in a few cases (oops).

Queries seem a bit faster, but still slowish.

Still think a gin index for selecting by metadata @> '{"_aggregate_id" : "dd890c7e-f9d6-5262-9e89-a16457a624a7"}' would increase performance.

SELECT * FROM "_7794a224f85453058cf637d52e7c4487568a8271"
WHERE metadata->>'_aggregate_type' = 'HF\Api\Domain\Dossier\Aggregate\Dossier' 
  AND metadata->>'_aggregate_id' = 'dd890c7e-f9d6-5262-9e89-a16457a624a7'
  AND CAST(metadata->>'_aggregate_version' AS INT) > 1
  AND no >= 0 
ORDER BY no ASC
LIMIT 1000;

Time: 561.407 ms 🤒

Using the GIN index

SELECT * FROM "_7794a224f85453058cf637d52e7c4487568a8271"
WHERE metadata->>'_aggregate_type' = 'HF\Api\Domain\Dossier\Aggregate\Dossier' 
  AND metadata @> '{"_aggregate_id" : "dd890c7e-f9d6-5262-9e89-a16457a624a7"}' 
  AND CAST(metadata->>'_aggregate_version' AS INT) > 1  
  AND no >= 0 
ORDER BY no ASC
LIMIT 1000;

Time: 1.463 ms 😮

I realize only the @> can be used with the GIN index. The MetadataMatcher query should only work with @> for the Operator::EQUALS operator.

prolic commented 5 years ago

@basz anything we need to do here?

basz commented 5 years ago

Forgot about this one. I have resolved it. a matter of not having the correct indexes and being a panicky :)