rapidpro / casepro

Case management dashboard for RapidPro
BSD 3-Clause "New" or "Revised" License
21 stars 31 forks source link

Indexable views for labels + partner inboxes #286

Closed rowanseymour closed 4 years ago

rowanseymour commented 4 years ago

There are several ways messages are accessed in CasePro

  1. Administrator top-level folders (Inbox, Archived, ..)
  2. Partner specific top-level folders (Inbox, Archived, ..)
  3. Single labels (can be archived or not)
  4. Contact timelines

1 is all the messages in an org with given attributes (e.g. is_archived=False). These views hit perfect indexes built on the messages table

2 is all the messages in an org with given attributes (e.g. is_archived=False), and one of the labels that partner org has access to. There's currently no good way to index this and it creates some awful queries

3 was generally ok but now times out because Postgres is loads all messages with that label into memory and does a sort by created_on there.

4 is fine as contacts don't typically have that many messages

So thinking about ways to improve 2 and 3.. what if we generalized those as "views". A model something like:

ID partner_id label_id archived description
1 23 NULL T Inbox for org 23
2 23 NULL F Archived folder for org 23
3 23 45 F Inbox messages with label 45
4 23 45 T Archived messages with label 45

Then have m2m with that and the messages table which has the message created_on

view_id message_id created_on
1 1001 2019-12-09T10:04:34
1 1002 2019-12-09T10:04:35
2 1003 2019-12-09T10:04:36

Index that table by created_on DESC and then searching becomes a matter of grabbing a page of message ids from that table and then fetching them from messages table.

The tricky part would be keeping that table up to date. There are 3 ways a message's assignment to a view could change:

  1. its labels change (when received, or when labels changed in the UI)
  2. its attributes change (e.g. it's archived or unarchived)
  3. a partner org's labels change

1 and 2 could probably be handled by DB triggers. 3 is trickier since it can mean a large number of messages need to be added/removed from that table.. so 3. would have to be background task.

If we left the current message search code, we could use that as a fallback, i.e. a view has a flag on it to show if it's ready for use, if it isn't we use the old search code. That also gives us a nice path to introducing views piecemeal and not in one big mega migration.

nicpottier commented 4 years ago

You really don't think having created on on the labels M2M would be enough? That seems the most obvious and doesn't require any pre-calculation of the views. Since pagination is only so deep it seems like it would just be the union of many perfect index queries which feels like it should be fast enough.

rowanseymour commented 4 years ago

Truth is I honestly don't know if that would be enough.. but it's definitely less work so I'm happy to give it a try

rowanseymour commented 4 years ago

Well I can think of scenarios where what you describe would struggle - let's say you're trying to get the archived messages for a label where the majority of messages are unarchived or visa versa - you still end up reading the majority of the messages into memory to check attributes like is_archived

rowanseymour commented 4 years ago

Maybe adding is_archived as well to the labels m2m isn't too terrible - messages don't typically have that many labels so a DB trigger could keep that in sync with msg.is_archived

rowanseymour commented 4 years ago

So what I pushed yesterday which does seem to improve things for the big orgs is..

Limitations..