maragudk / goqite

Go queue library built on SQLite and inspired by AWS SQS.
https://maragudk.github.io/goqite/
MIT License
435 stars 13 forks source link

Possible performance issue, with cause and solution included. #34

Closed iredmail closed 8 months ago

iredmail commented 8 months ago

According to schema.sql, it currently stores message body and metadata in same table goqite.

create table goqite (
  id text primary key default ('m_' || lower(hex(randomblob(16)))),
  created text not null default (strftime('%Y-%m-%dT%H:%M:%fZ')),
  updated text not null default (strftime('%Y-%m-%dT%H:%M:%fZ')),
  queue text not null,
  body blob not null,
  timeout text not null default (strftime('%Y-%m-%dT%H:%M:%fZ')),
  received integer not null default 0
) strict;

We experienced (poor) performance issue with large BLOG with similar sql table, so i'd like to share our story briefly for your reference.

Our customer imported mailbox into the software which generated about 1000 records and 8GB in total (in SQLite, of course), each record is an email message about 8MB large. Our program had a bug and we had to update the SQL records manually for troubleshooting, while performing SQL command like UPDATE <table> SET received=1; which affects all rows, SQLite command line tool toke a long time (~ 8min) to finish. This is not right and we started checking SQLite doc to figure it out.

We found the root cause (improper sql table design) and solution (separate large BLOG and metadata), hope it helps.

markuswustenberg commented 8 months ago

Hi @iredmail, thank you for your issue!

Yes, I would expect tables with large binary data to take much longer to be queried.

I don't think I want to provide a code solution for that, since I don't think this is the common case for a queue like this. In that case, I would only store e.g. a path to a blob in the message, not the blob itself.

But I would like to add it to the documentation, since it may be unexpected.

Thanks again. 😊

iredmail commented 8 months ago

Yes, I would expect tables with large binary data to take much longer to be queried.

It's not about querying, but updating. Querying is fast enough with large binary data.

I don't think I want to provide a code solution for that, since I don't think this is the common case for a queue like this. In that case, I would only store e.g. a path to a blob in the message, not the blob itself.

Since it's a queue, and usually queued data will be removed from queue after consumed, storing large blob in SQLite is quite normal and a lot easier to handle. :)

Anyway, issue closed.