Indexable views for labels + partner inboxes

rowanseymour commented 4 years ago

There are several ways messages are accessed in CasePro

Administrator top-level folders (Inbox, Archived, ..)
Partner specific top-level folders (Inbox, Archived, ..)
Single labels (can be archived or not)
Contact timelines

1 is all the messages in an org with given attributes (e.g. is_archived=False). These views hit perfect indexes built on the messages table

2 is all the messages in an org with given attributes (e.g. is_archived=False), and one of the labels that partner org has access to. There's currently no good way to index this and it creates some awful queries

3 was generally ok but now times out because Postgres is loads all messages with that label into memory and does a sort by created_on there.

4 is fine as contacts don't typically have that many messages

So thinking about ways to improve 2 and 3.. what if we generalized those as "views". A model something like:

ID	partner_id	label_id	archived	description
1	23	NULL	T	Inbox for org 23
2	23	NULL	F	Archived folder for org 23
3	23	45	F	Inbox messages with label 45
4	23	45	T	Archived messages with label 45

Then have m2m with that and the messages table which has the message created_on

view_id	message_id	created_on
1	1001	2019-12-09T10:04:34
1	1002	2019-12-09T10:04:35
2	1003	2019-12-09T10:04:36

Index that table by created_on DESC and then searching becomes a matter of grabbing a page of message ids from that table and then fetching them from messages table.

The tricky part would be keeping that table up to date. There are 3 ways a message's assignment to a view could change:

its labels change (when received, or when labels changed in the UI)
its attributes change (e.g. it's archived or unarchived)
a partner org's labels change

1 and 2 could probably be handled by DB triggers. 3 is trickier since it can mean a large number of messages need to be added/removed from that table.. so 3. would have to be background task.

If we left the current message search code, we could use that as a fallback, i.e. a view has a flag on it to show if it's ready for use, if it isn't we use the old search code. That also gives us a nice path to introducing views piecemeal and not in one big mega migration.

nicpottier commented 4 years ago

You really don't think having created on on the labels M2M would be enough? That seems the most obvious and doesn't require any pre-calculation of the views. Since pagination is only so deep it seems like it would just be the union of many perfect index queries which feels like it should be fast enough.

rowanseymour commented 4 years ago

Truth is I honestly don't know if that would be enough.. but it's definitely less work so I'm happy to give it a try

rowanseymour commented 4 years ago

Well I can think of scenarios where what you describe would struggle - let's say you're trying to get the archived messages for a label where the majority of messages are unarchived or visa versa - you still end up reading the majority of the messages into memory to check attributes like is_archived

rowanseymour commented 4 years ago

Maybe adding is_archived as well to the labels m2m isn't too terrible - messages don't typically have that many labels so a DB trigger could keep that in sync with msg.is_archived

rowanseymour commented 4 years ago

So what I pushed yesterday which does seem to improve things for the big orgs is..

Copy is_archived, is_flagged and created_on onto the labels m2m
Use DB trigger to keep those in sync with the message itself
Have custom indexes for how those are used in the Inbox/Archived/Flagged views
Search by each label separately, merge ids, grab messages with those ids, re-sort and limit

Limitations..

Only takes that path when not filtering by anything else on the messages table - i.e. if you're searching by message text then you're using the regular search code
Couldn't figure out how to paging correctly across all those subqueries into the label m2m.. so for now it's grabbing the top 1000 messages for every label - you can only scroll to 1000 messages in the Inbox view

rapidpro / casepro

Indexable views for labels + partner inboxes #286