Open jdposthuma opened 3 years ago
Yes I thought about this and it is certainly possible. Although large IN clause's have limitations of their own. I will assess how to go about this.
The IN
clause doesn't affect performance either way in this case, just syntax preference.
I took a look at this today and was completely lost trying get sqlalchemy to form a LEFT JOIN LATERAL
, which is a neat feature of my proposed query. Hopefully, this won't be a big deal for you.
I assume maintaining this project isn't your full-time job, so no pressure, but do you have an estimate on when this is achievable by?
Also, I almost forgot - here's the keys
query I'm using. Hopefully sqlalchemy can do window functions easily:
select jsonb_build_object(
'message',
jsonb_build_object(
'id',
JSON_AGG(message_1.id) filter (
where message_1.id IS NOT NULL
)
),
'user',
jsonb_build_object(
'id',
JSON_AGG(user_1.id) filter (
where user_1.id IS NOT NULL
)
)
) as _meta,
row_to_json(conversation_1_out)::jsonb || jsonb_build_object(
'messages',
JSON_AGG(
row_to_json(message_1_out)::jsonb || jsonb_build_object('user', row_to_json(user_1_out))
)
) as obj,
conversation_1.id as id
from staging.conversation conversation_1
...
That query might not be generic enough. Amongst many reasons, pgsync supports compound foreign and primary keys. So this would break that functionality. I have added an optimization strategy to the master branch for your use case. Can you try and let me know the verdict?
To clarify - the use of the IN
query did not significantly affect performance in my scenario.
The biggest performance boost came from the use of joins in the main query (not subqueries). Therefore, this commit doesn't significantly affect performance.
So this did not help in your case but it does provide some level of optimization in certain cases.. I'm trying to find a window to explore your query a bit more.
@toluaina Have you had a chance to look into this? I have a schema very similar to that of @jdposthuma and returning a mere 5 results can take up to a 60 seconds.
In another instance, the query took 53 minutes to return a single record.
Will take another look at this
Can either of you provide a more concrete/complete working example, please? I have so far failed to reproduce this myself Additionally, jdposthuma's idea is good but non-trivial to generalize. Although, I believe it can be done. I noticed that jdposthuma is using explicit foreign keys. This usually means there is no foreign key relationship. Are there appropriate indexes defined that could be contributing to the inefficiency?
@voyc-jean Do you have a minimal working example to examine in isolation? i.e both pgsync schema.json and database schema
@toluaina
Things only really start becoming problematic when reaching tens of millions of records in the joined tables. The schema looks like this:
- book [500k rows]
- paragraph [3.7 mil rows]
- sentence [75 mil rows]
The initial sync runs in a reasonable amount of time. The continuous sync with daemon mode, however, is where things get really slow.
If I make a change to a row (id 1) in the sentence
table, the query that PGSync generates looks like this:
SELECT JSON_BUILD_ARRAY(anon_1._keys) AS "JSON_BUILD_ARRAY_1",
JSON_BUILD_OBJECT('id', book_1.id, 'title', book_1.title, 'description', book_1.description, 'paragraph', anon_1.paragraph) AS "JSON_BUILD_OBJECT_1",
book_1.id
FROM public.book AS book_1
LEFT OUTER JOIN
(SELECT CAST(JSON_BUILD_OBJECT('paragraph', JSON_AGG(JSON_BUILD_OBJECT('id', JSON_BUILD_ARRAY(paragraph_1.id)))) AS JSONB) || CAST(JSON_AGG(anon_2._keys) AS JSONB) AS _keys,
JSON_AGG(JSON_BUILD_OBJECT('id', paragraph_1.id, 'sentence', anon_2.sentence)) AS paragraph,
paragraph_1.book_id AS book_id
FROM public.paragraph AS paragraph_1
JOIN
(SELECT CAST(JSON_BUILD_OBJECT('sentence', JSON_AGG(JSON_BUILD_OBJECT('id', sentence_1.id))) AS JSONB) AS _keys,
JSON_AGG(JSON_BUILD_OBJECT('id', sentence_1.id, 'text', sentence_1.text)) AS sentence,
sentence_1.paragraph_id AS paragraph_id
FROM public.sentence AS sentence_1
GROUP BY sentence_1.paragraph_id) AS anon_2 ON anon_2.paragraph_id = paragraph_1.id
GROUP BY paragraph_1.book_id) AS anon_1 ON anon_1.book_id = book_1.id
WHERE book_1.id = '1'
This query takes 101 seconds to execute and returns one result. The explain analyze looks like this.
To replicate this I've had to write a script which populates the database with millions of records. Please see the example in my fork here. The script can be run similarly to the other examples: ./examples/bootstrap.sh book_test
It might obviously take some time for the records to generate as it should generate 500k book
rows and upwards of 70 million sentence
rows.
On a side note, I believe that this might also be the cause of #211, #130 and #88
LATERAL
is a very good idea and I did some tech analysis but it requires a but of re-engineering to query builder - which would require a lot of effort.--analyze
to the master branch which shows you the missing indices/how to create them,
pgsync -c schema.json --analyze
Hi @toluaina
Thanks for the detailed reply and for adding --analyze
. You are right about the index, I'm used to the Django ORM which automatically creates foreign key indices.
However, even with the indices added, the query still takes upward of 100s to complete. Please see the new explain.
As per the ouput of --analyze
, the indices exist:
Found index "paragraph_book_id_idx" for table "paragraph" for columns: ['book_id']: OK
Found index "sentence_paragraph_id_idx" for table "sentence" for columns: ['paragraph_id']: OK
Unless I am still missing something, it doesn't seem like the indices are the main problem?
To add to the above, if I make use of the query from @jdposthuma, the result returns in 16ms.
Hello @voyc-jean . Can you please check this again? Once I created, the index the updates are near-instantaneous. Can you run this query and time how long it takes?
SELECT JSON_BUILD_ARRAY(anon_1._keys) AS "JSON_BUILD_ARRAY_1",
CAST(JSON_BUILD_OBJECT('id', book_1.id, 'title', book_1.title, 'description', book_1.description, 'paragraph', anon_1.paragraph) AS JSONB) AS "JSON_BUILD_OBJECT_1",
book_1.id
FROM public.book AS book_1
LEFT OUTER JOIN
(
SELECT CAST(JSON_BUILD_OBJECT('paragraph', JSON_AGG(CAST(JSON_BUILD_OBJECT('id', JSON_BUILD_ARRAY(paragraph_1.id)) AS JSONB))) AS JSONB) || CAST(JSON_AGG(anon_2._keys) AS JSONB) AS _keys,
JSON_AGG(CAST(JSON_BUILD_OBJECT('id', paragraph_1.id, 'sentence', anon_2.sentence) AS JSONB)) AS paragraph,
paragraph_1.book_id AS book_id
FROM public.paragraph AS paragraph_1
JOIN
(
SELECT CAST(JSON_BUILD_OBJECT('sentence', JSON_AGG(CAST(JSON_BUILD_OBJECT('id', sentence_1.id) AS JSONB))) AS JSONB) AS _keys,
JSON_AGG(CAST(JSON_BUILD_OBJECT('id', sentence_1.id, 'text', sentence_1.text) AS JSONB)) AS sentence,
sentence_1.paragraph_id AS paragraph_id
FROM public.sentence AS sentence_1
GROUP BY sentence_1.paragraph_id
) AS anon_2 ON anon_2.paragraph_id = paragraph_1.id
where paragraph_1.book_id = 1
GROUP BY paragraph_1.book_id
) AS anon_1 ON anon_1.book_id = book_1.id
WHERE book_1.id = '1'
@toluaina Your query executed in 375 seconds. Here is the explain. Note that Index scans area being used.
Are you certain that you're testing with roughly the same amount of records I mentioned previously? With smaller tables and fewer nested queries, this issue does not seem to occur.
@voyc-jean I made some changes to the master branch that should hopefully improve proformance. Can you please try that and let me know.
Hi there @toluaina,
Apologies for the long delay, I've had to find some time to look further into this.
The good news is that the change you made seems to have had a huge positive effect on performance. My former query that took upwards of 100s to return now returns in ~700ms, a great improvement!
The bad news is that this has lead to another issue; After the initial sync is done (after about 8 hours), the following query runs:
SELECT xid,
DATA
FROM PG_LOGICAL_SLOT_PEEK_CHANGES('database_index', NULL, NULL)
WHERE CAST(CAST(xid AS TEXT) AS BIGINT) >= 152020106
AND CAST(CAST(xid AS TEXT) AS BIGINT) < 152306901
Once complete after about 8 minutes, the data waiting in the replication slot is synced and and the checkpoint is updated. A new query then runs:
SELECT JSON_BUILD_ARRAY(anon_1._keys, anon_2._keys, anon_3._keys, anon_4._keys, anon_5._keys, anon_6._keys, anon_7._keys, anon_8._keys, anon_9._keys) AS "JSON_BUILD_ARRAY_1",
CAST(JSON_BUILD_OBJECT('id', voycapi_conversations_1.id, 'name', voycapi_conversations_1.name, 'project_id', voycapi_conversations_1.project_id, 'conversation_date', voycapi_conversations_1.conversation_date, 'conversation_time', voycapi_conversations_1.conversation_time, 'createdate', voycapi_conversations_1.createdate, 'status', voycapi_conversations_1.status, 'note', voycapi_conversations_1.note, 'audiolength', voycapi_conversations_1.audiolength, 'agents', anon_1.agents, 'attribute', anon_2.attribute, 'grading', anon_3.grading, 'hold_time', anon_4.hold_time, 'review', anon_5.review, 'score', anon_6.score, 'tags', anon_7.tags, 'transcript', anon_8.transcript, 'workflow_session', anon_9.workflow_session) AS JSONB) AS "JSON_BUILD_OBJECT_1",
voycapi_conversations_1.id
FROM public.voycapi_conversations AS voycapi_conversations_1
LEFT OUTER JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('voyc_agent_agentconversation', JSON_AGG(CAST(JSON_BUILD_OBJECT('id', JSON_BUILD_ARRAY(voyc_agent_agentconversation_1.id)) AS JSONB))) AS JSONB) || CAST(JSON_AGG(anon_10._keys) AS JSONB) AS _keys,
JSON_AGG(CAST(JSON_BUILD_OBJECT('id', voyc_agent_agentconversation_1.id, 'agent', anon_10.agent) AS JSONB)) AS agents,
voyc_agent_agentconversation_1.conversation_id AS conversation_id
FROM public.voyc_agent_agentconversation AS voyc_agent_agentconversation_1
JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('voyc_agent_agent', CAST(JSON_BUILD_OBJECT('id', voyc_agent_agent_1.id) AS JSONB)) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('id', voyc_agent_agent_1.id, 'uid', voyc_agent_agent_1.uid) AS JSONB) AS agent,
voyc_agent_agent_1.id AS id
FROM public.voyc_agent_agent AS voyc_agent_agent_1
WHERE voyc_agent_agent_1.id = voyc_agent_agentconversation_1.agent_id) AS anon_10 ON anon_10.id = voyc_agent_agentconversation_1.agent_id
WHERE voyc_agent_agentconversation_1.conversation_id = voycapi_conversations_1.id
GROUP BY voyc_agent_agentconversation_1.conversation_id) AS anon_1 ON anon_1.conversation_id = voycapi_conversations_1.id
LEFT OUTER JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('voyc_attribute_attributevalue', JSON_AGG(CAST(JSON_BUILD_OBJECT('id', JSON_BUILD_ARRAY(voyc_attribute_attributevalue_1.id)) AS JSONB))) AS JSONB) || CAST(JSON_AGG(anon_11._keys) AS JSONB) AS _keys,
JSON_AGG(CAST(JSON_BUILD_OBJECT('value', voyc_attribute_attributevalue_1.value, 'attribute', anon_11.attribute) AS JSONB)) AS attribute,
voyc_attribute_attributevalue_1.conversation_id AS conversation_id
FROM public.voyc_attribute_attributevalue AS voyc_attribute_attributevalue_1
JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('voyc_attribute_attribute', CAST(JSON_BUILD_OBJECT('id', voyc_attribute_attribute_1.id) AS JSONB)) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('id', voyc_attribute_attribute_1.id, 'name', voyc_attribute_attribute_1.name) AS JSONB) AS attribute,
voyc_attribute_attribute_1.id AS id
FROM public.voyc_attribute_attribute AS voyc_attribute_attribute_1
WHERE voyc_attribute_attribute_1.id = voyc_attribute_attributevalue_1.attribute_id) AS anon_11 ON anon_11.id = voyc_attribute_attributevalue_1.attribute_id
WHERE voyc_attribute_attributevalue_1.conversation_id = voycapi_conversations_1.id
GROUP BY voyc_attribute_attributevalue_1.conversation_id) AS anon_2 ON anon_2.conversation_id = voycapi_conversations_1.id
LEFT OUTER JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('voyc_grading_conversationscore', CAST(JSON_BUILD_OBJECT('id', JSON_BUILD_ARRAY(voyc_grading_conversationscore_1.id)) AS JSONB)) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('auto_compliance_resolution', voyc_grading_conversationscore_1.auto_compliance_resolution, 'manual_compliance_resolution', voyc_grading_conversationscore_1.manual_compliance_resolution) AS JSONB) AS grading,
voyc_grading_conversationscore_1.conversation_id AS conversation_id
FROM public.voyc_grading_conversationscore AS voyc_grading_conversationscore_1
WHERE voyc_grading_conversationscore_1.conversation_id = voycapi_conversations_1.id) AS anon_3 ON anon_3.conversation_id = voycapi_conversations_1.id
LEFT OUTER JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('voycapi_conversationsilence', CAST(JSON_BUILD_OBJECT('id', JSON_BUILD_ARRAY(voycapi_conversationsilence_1.id)) AS JSONB)) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('duration', voycapi_conversationsilence_1.duration) AS JSONB) AS hold_time,
voycapi_conversationsilence_1.conversation_id AS conversation_id
FROM public.voycapi_conversationsilence AS voycapi_conversationsilence_1
WHERE voycapi_conversationsilence_1.conversation_id = voycapi_conversations_1.id) AS anon_4 ON anon_4.conversation_id = voycapi_conversations_1.id
LEFT OUTER JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('voyc_review_conversationreview', CAST(JSON_BUILD_OBJECT('id', JSON_BUILD_ARRAY(voyc_review_conversationreview_1.id)) AS JSONB)) AS JSONB) || CAST(JSON_BUILD_ARRAY(anon_12._keys) AS JSONB) || CAST(JSON_BUILD_ARRAY(anon_13._keys) AS JSONB) || CAST(JSON_BUILD_ARRAY(anon_14._keys) AS JSONB) || CAST(JSON_BUILD_ARRAY(anon_15._keys) AS JSONB) || CAST(JSON_BUILD_ARRAY(anon_16._keys) AS JSONB) || CAST(JSON_BUILD_ARRAY(anon_17._keys) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('reviewed_status', voyc_review_conversationreview_1.reviewed_status, 'resolved_status', voyc_review_conversationreview_1.resolved_status, 'coached_status', voyc_review_conversationreview_1.coached_status, 'graded_status', voyc_review_conversationreview_1.graded_status, 'date_reviewed', voyc_review_conversationreview_1.date_reviewed, 'date_resolved', voyc_review_conversationreview_1.date_resolved, 'date_coached', voyc_review_conversationreview_1.date_coached, 'date_graded', voyc_review_conversationreview_1.date_graded, 'date_assigned', voyc_review_conversationreview_1.date_assigned, 'assigned_by', anon_12.assigned_by, 'assigned_to', anon_13.assigned_to, 'coached_by', anon_14.coached_by, 'graded_by', anon_15.graded_by, 'resolved_by', anon_16.resolved_by, 'reviewed_by', anon_17.reviewed_by) AS JSONB) AS review,
voyc_review_conversationreview_1.conversation_id AS conversation_id
FROM public.voyc_review_conversationreview AS voyc_review_conversationreview_1
LEFT OUTER JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('users_user', CAST(JSON_BUILD_OBJECT('id', users_user_1.id) AS JSONB)) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('id', users_user_1.id, 'username', users_user_1.username) AS JSONB) AS assigned_by,
users_user_1.id AS id
FROM public.users_user AS users_user_1
WHERE users_user_1.id = voyc_review_conversationreview_1.assigned_by_id) AS anon_12 ON anon_12.id = voyc_review_conversationreview_1.assigned_by_id
LEFT OUTER JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('users_user', CAST(JSON_BUILD_OBJECT('id', users_user_1.id) AS JSONB)) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('id', users_user_1.id, 'username', users_user_1.username) AS JSONB) AS assigned_to,
users_user_1.id AS id
FROM public.users_user AS users_user_1
WHERE users_user_1.id = voyc_review_conversationreview_1.assigned_to_id) AS anon_13 ON anon_13.id = voyc_review_conversationreview_1.assigned_to_id
LEFT OUTER JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('users_user', CAST(JSON_BUILD_OBJECT('id', users_user_1.id) AS JSONB)) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('id', users_user_1.id, 'username', users_user_1.username) AS JSONB) AS coached_by,
users_user_1.id AS id
FROM public.users_user AS users_user_1
WHERE users_user_1.id = voyc_review_conversationreview_1.coached_by_id) AS anon_14 ON anon_14.id = voyc_review_conversationreview_1.coached_by_id
LEFT OUTER JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('users_user', CAST(JSON_BUILD_OBJECT('id', users_user_1.id) AS JSONB)) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('id', users_user_1.id, 'username', users_user_1.username) AS JSONB) AS graded_by,
users_user_1.id AS id
FROM public.users_user AS users_user_1
WHERE users_user_1.id = voyc_review_conversationreview_1.graded_by_id) AS anon_15 ON anon_15.id = voyc_review_conversationreview_1.graded_by_id
LEFT OUTER JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('users_user', CAST(JSON_BUILD_OBJECT('id', users_user_1.id) AS JSONB)) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('id', users_user_1.id, 'username', users_user_1.username) AS JSONB) AS resolved_by,
users_user_1.id AS id
FROM public.users_user AS users_user_1
WHERE users_user_1.id = voyc_review_conversationreview_1.resolved_by_id) AS anon_16 ON anon_16.id = voyc_review_conversationreview_1.resolved_by_id
LEFT OUTER JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('users_user', CAST(JSON_BUILD_OBJECT('id', users_user_1.id) AS JSONB)) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('id', users_user_1.id, 'username', users_user_1.username) AS JSONB) AS reviewed_by,
users_user_1.id AS id
FROM public.users_user AS users_user_1
WHERE users_user_1.id = voyc_review_conversationreview_1.reviewed_by_id) AS anon_17 ON anon_17.id = voyc_review_conversationreview_1.reviewed_by_id
WHERE voyc_review_conversationreview_1.conversation_id = voycapi_conversations_1.id) AS anon_5 ON anon_5.conversation_id = voycapi_conversations_1.id
LEFT OUTER JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('voyc_grading_conversationscore', CAST(JSON_BUILD_OBJECT('id', JSON_BUILD_ARRAY(voyc_grading_conversationscore_1.id)) AS JSONB)) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('manual_score', voyc_grading_conversationscore_1.manual_score, 'auto_score', voyc_grading_conversationscore_1.auto_score, 'date_graded', voyc_grading_conversationscore_1.date_graded, 'note_grader', voyc_grading_conversationscore_1.note_grader) AS JSONB) AS score,
voyc_grading_conversationscore_1.conversation_id AS conversation_id
FROM public.voyc_grading_conversationscore AS voyc_grading_conversationscore_1
WHERE voyc_grading_conversationscore_1.conversation_id = voycapi_conversations_1.id) AS anon_6 ON anon_6.conversation_id = voycapi_conversations_1.id
LEFT OUTER JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('voyc_tag_taggedconversation', JSON_AGG(CAST(JSON_BUILD_OBJECT('id', JSON_BUILD_ARRAY(voyc_tag_taggedconversation_1.id)) AS JSONB))) AS JSONB) || CAST(JSON_AGG(anon_18._keys) AS JSONB) AS _keys,
JSON_AGG(CAST(JSON_BUILD_OBJECT('id', voyc_tag_taggedconversation_1.id, 'tag', anon_18.tag) AS JSONB)) AS tags,
voyc_tag_taggedconversation_1.conversation_id AS conversation_id
FROM public.voyc_tag_taggedconversation AS voyc_tag_taggedconversation_1
JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('voyc_tag_tag', CAST(JSON_BUILD_OBJECT('id', voyc_tag_tag_1.id) AS JSONB)) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('id', voyc_tag_tag_1.id, 'name', voyc_tag_tag_1.name) AS JSONB) AS tag,
voyc_tag_tag_1.id AS id
FROM public.voyc_tag_tag AS voyc_tag_tag_1
WHERE voyc_tag_tag_1.id = voyc_tag_taggedconversation_1.tag_id) AS anon_18 ON anon_18.id = voyc_tag_taggedconversation_1.tag_id
WHERE voyc_tag_taggedconversation_1.conversation_id = voycapi_conversations_1.id
GROUP BY voyc_tag_taggedconversation_1.conversation_id) AS anon_7 ON anon_7.conversation_id = voycapi_conversations_1.id
JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('voyc_transcript_transcript', CAST(JSON_BUILD_OBJECT('id', JSON_BUILD_ARRAY(voyc_transcript_transcript_1.id)) AS JSONB)) AS JSONB) || CAST(JSON_BUILD_ARRAY(anon_19._keys) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('id', voyc_transcript_transcript_1.id, 'confidence', voyc_transcript_transcript_1.confidence, 'utterance', anon_19.utterance) AS JSONB) AS transcript,
voyc_transcript_transcript_1.conversation_id AS conversation_id
FROM public.voyc_transcript_transcript AS voyc_transcript_transcript_1
JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('voyc_transcript_utterance', JSON_AGG(CAST(JSON_BUILD_OBJECT('id', voyc_transcript_utterance_1.id) AS JSONB))) AS JSONB) || CAST(JSON_AGG(anon_20._keys) AS JSONB) AS _keys,
JSON_AGG(CAST(JSON_BUILD_OBJECT('id', voyc_transcript_utterance_1.id, 'emotion', voyc_transcript_utterance_1.emotion, 'sentence', anon_20.sentence) AS JSONB)) AS utterance,
voyc_transcript_utterance_1.transcript_id AS transcript_id
FROM public.voyc_transcript_utterance AS voyc_transcript_utterance_1
JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('voyc_transcript_sentence', JSON_AGG(CAST(JSON_BUILD_OBJECT('id', voyc_transcript_sentence_1.id) AS JSONB))) AS JSONB) AS _keys,
JSON_AGG(CAST(JSON_BUILD_OBJECT('id', voyc_transcript_sentence_1.id, 'text', voyc_transcript_sentence_1.text, 'words_{0,start}', voyc_transcript_sentence_1.words #> '{0,start}') AS JSONB)) AS sentence,
voyc_transcript_sentence_1.utterance_id AS utterance_id
FROM public.voyc_transcript_sentence AS voyc_transcript_sentence_1
WHERE voyc_transcript_sentence_1.utterance_id = voyc_transcript_utterance_1.id
GROUP BY voyc_transcript_sentence_1.utterance_id) AS anon_20 ON anon_20.utterance_id = voyc_transcript_utterance_1.id
WHERE voyc_transcript_utterance_1.transcript_id = voyc_transcript_transcript_1.id
GROUP BY voyc_transcript_utterance_1.transcript_id) AS anon_19 ON anon_19.transcript_id = voyc_transcript_transcript_1.id
WHERE voyc_transcript_transcript_1.conversation_id = voycapi_conversations_1.id
AND (voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419
OR voyc_transcript_transcript_1.id = 4273419)) AS anon_8 ON anon_8.conversation_id = voycapi_conversations_1.id
JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('voyc_workflow_workflowsession', CAST(JSON_BUILD_OBJECT('id', JSON_BUILD_ARRAY(voyc_workflow_workflowsession_1.id)) AS JSONB)) AS JSONB) || CAST(JSON_BUILD_ARRAY(anon_21._keys) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('id', voyc_workflow_workflowsession_1.id, 'workflow_id', voyc_workflow_workflowsession_1.workflow_id, 'workflowsessiontask_session', anon_21.workflowsessiontask_session) AS JSONB) AS workflow_session,
voyc_workflow_workflowsession_1.conversation_id AS conversation_id
FROM public.voyc_workflow_workflowsession AS voyc_workflow_workflowsession_1
JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('voyc_workflow_workflowsessiontask', JSON_AGG(CAST(JSON_BUILD_OBJECT('id', voyc_workflow_workflowsessiontask_1.id) AS JSONB))) AS JSONB) || CAST(JSON_AGG(anon_22._keys) AS JSONB) || CAST(JSON_AGG(anon_23._keys) AS JSONB) || CAST(JSON_AGG(anon_24._keys) AS JSONB) || CAST(JSON_AGG(anon_25._keys) AS JSONB) AS _keys,
JSON_AGG(CAST(JSON_BUILD_OBJECT('id', voyc_workflow_workflowsessiontask_1.id, 'task_id', voyc_workflow_workflowsessiontask_1.task_id, 'status', voyc_workflow_workflowsessiontask_1.status, 'date_assigned', voyc_workflow_workflowsessiontask_1.date_assigned, 'date_completed', voyc_workflow_workflowsessiontask_1.date_completed, 'assigned_by', anon_22.assigned_by, 'assigned_to', anon_23.assigned_to, 'completed_by', anon_24.completed_by, 'task', anon_25.task) AS JSONB)) AS workflowsessiontask_session,
voyc_workflow_workflowsessiontask_1.workflow_session_id AS workflow_session_id
FROM public.voyc_workflow_workflowsessiontask AS voyc_workflow_workflowsessiontask_1
LEFT OUTER JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('users_user', CAST(JSON_BUILD_OBJECT('id', users_user_1.id) AS JSONB)) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('id', users_user_1.id, 'username', users_user_1.username) AS JSONB) AS assigned_by,
users_user_1.id AS id
FROM public.users_user AS users_user_1
WHERE users_user_1.id = voyc_workflow_workflowsessiontask_1.assigned_by_id) AS anon_22 ON anon_22.id = voyc_workflow_workflowsessiontask_1.assigned_by_id
LEFT OUTER JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('users_user', CAST(JSON_BUILD_OBJECT('id', users_user_1.id) AS JSONB)) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('id', users_user_1.id, 'username', users_user_1.username) AS JSONB) AS assigned_to,
users_user_1.id AS id
FROM public.users_user AS users_user_1
WHERE users_user_1.id = voyc_workflow_workflowsessiontask_1.assigned_to_id) AS anon_23 ON anon_23.id = voyc_workflow_workflowsessiontask_1.assigned_to_id
LEFT OUTER JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('users_user', CAST(JSON_BUILD_OBJECT('id', users_user_1.id) AS JSONB)) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('id', users_user_1.id, 'username', users_user_1.username) AS JSONB) AS completed_by,
users_user_1.id AS id
FROM public.users_user AS users_user_1
WHERE users_user_1.id = voyc_workflow_workflowsessiontask_1.completed_by_id) AS anon_24 ON anon_24.id = voyc_workflow_workflowsessiontask_1.completed_by_id
LEFT OUTER JOIN LATERAL
(SELECT CAST(JSON_BUILD_OBJECT('voyc_workflow_task', CAST(JSON_BUILD_OBJECT('id', voyc_workflow_task_1.id) AS JSONB)) AS JSONB) AS _keys,
CAST(JSON_BUILD_OBJECT('name', voyc_workflow_task_1.name, 'is_grading', voyc_workflow_task_1.is_grading) AS JSONB) AS task,
voyc_workflow_task_1.id AS id
FROM public.voyc_workflow_task AS voyc_workflow_task_1
WHERE voyc_workflow_task_1.id = voyc_workflow_workflowsessiontask_1.task_id) AS anon_25 ON anon_25.id = voyc_workflow_workflowsessiontask_1.task_id
WHERE voyc_workflow_workflowsessiontask_1.workflow_session_id = voyc_workflow_workflowsession_1.id
GROUP BY voyc_workflow_workflowsessiontask_1.workflow_session_id) AS anon_21 ON anon_21.workflow_session_id = voyc_workflow_workflowsession_1.id
WHERE voyc_workflow_workflowsession_1.conversation_id = voycapi_conversations_1.id) AS anon_9 ON anon_9.conversation_id = voycapi_conversations_1.id
This query takes about 23 minutes to complete and appears to return no data. Due to this long query time, the backlog of items to sync also quickly grows.
The stdout looks like this:
[--------------------------------------------------] 0/0 0%
Another side effect of this problem is that the replication slot isn't advancing, nor is the checkpoint file updating. This causes the DB disk usage to consistently grow and also slows down subsequent syncs due to the PG_LOGICAL_SLOT_PEEK_CHANGES
query having to search through a larger amount of data.
Thanks for this again @voyc-jean. I spent some time trying to narrow down the cause. It would really help to have access to your db schema and also psgync schema as well. The duplicated values in the query don't seem right and I would like to fix that first.
Hi @toluaina
After some more investigating, I discovered that there are multiple problems here.
I've decided to rather create separate issues for each of these:
In addition, is there any way that I can directly communicate with you? Perhaps an email address?
Thanks.
Hi @toluaina
After some more investigating, I discovered that there are multiple problems here.
I've decided to rather create separate issues for each of these:
268 #267 #269
In addition, is there any way that I can directly communicate with you? Perhaps an email address?
Thanks.
Any updates?
I have a simple, 3 tier table like this:
Here's my schema:
On an update to the
user
object, the SQL generated by sqlalchemy produces the following inefficient query (see explain analyze). This runs in about 57 seconds on my dataset:I can optimize this query to produce the same result. It's much more efficient with memory and runs on about 300ms (see the explain analyze):
This design also resolves issue #149.
@toluaina - Can you think of any problems with this approach? Can you provide input on the design?