ncruces / go-sqlite3

Go bindings to SQLite using wazero
https://pkg.go.dev/github.com/ncruces/go-sqlite3
MIT License

Question: Lifetime for blob object in `openblob` #148

Closed mtlynch closed 12 hours ago

mtlynch commented 1 week ago

Thanks for your work on this library! It's exciting to see a library offer blob I/O support alongside the database/sql interface.

I'm working on porting PicoShare to use this library, and I have some questions about the blobio extension.

The ideal semantics for me are if I can open an io.ReadSeeker to a blob in my database and then later use it how I'd like (similar to how I'd open a file handle on my local filesystem).

Attempt 1: Saving a copy of blob outside the callback

I tried doing this:

func (d *DB) ReadEntry(id string) (io.ReadSeeker, error) {
    var blob *sqlite3.Blob
    _, err := d.db.Exec(`SELECT openblob('main', 'entries', 'contents', (SELECT rowid FROM entries WHERE id = :id), :writeMode, :callback)`,
        sql.Named("id", id),
        sql.Named("writeMode", false),
        sql.Named("callback",
            sqlite3.Pointer[blobio.OpenCallback](func(b *sqlite3.Blob, _ ...sqlite3.Value) error {
                log.Printf("start callback")
                blob = b
                log.Printf("end callback")
                return nil
            })))
    if err != nil {
        return nil, fmt.Errorf("error opening blob for id %s: %w", id, err)
    }

    if blob == nil {
        return nil, fmt.Errorf("no entry found for id %s", id)
    }

    if _, err := io.Copy(os.Stdout, blob); err != nil {
        panic(err)
    }

    return blob, nil
}

Which results in this error:

$ go run main.go 
2024/09/06 08:43:05 start callback
2024/09/06 08:43:05 end callback
panic: sqlite3: bad parameter or other API misuse

goroutine 1 [running]:
main.(*DB).ReadEntry(0xc000596070, {0xc001d5fe40, 0x1c})
        /home/mike/code/nc-simple/main.go:84 +0x3b8
main.main()
        /home/mike/code/nc-simple/main.go:103 +0xc5
exit status 2

I assume it's because I'm not allowed to use the blob outside of the callback.

Attempt 2: Only read the blob within the callback

If I perform the copy within the callback, things work correctly:

func (d *DB) ReadEntry(id string) error {
    _, err := d.db.Exec(`SELECT openblob('main', 'entries', 'contents', (SELECT rowid FROM entries WHERE id = :id), :writeMode, :callback)`,
        sql.Named("id", id),
        sql.Named("writeMode", false),
        sql.Named("callback",
            sqlite3.Pointer[blobio.OpenCallback](func(b *sqlite3.Blob, _ ...sqlite3.Value) error {
                log.Printf("start callback")
                if _, err := io.Copy(os.Stdout, b); err != nil {
                    panic(err)
                }
                fmt.Fprint(os.Stdout, "\n")
                log.Printf("end callback")
                return nil
            })))
    if err != nil {
        return fmt.Errorf("error opening blob for id %s: %w", id, err)
    }

    return nil
}
$ go run main.go 
2024/09/06 09:03:18 start callback
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
2024/09/06 09:03:18 end callback

Full gist: https://gist.github.com/mtlynch/638e10675377515f15d9b086b929bb71

That said, only being able to interact with the blob from within the callback makes the semantics difficult for callers.

Context for what I'm doing

Ultimately, what I'm trying to do here is let HTTP clients write to and read files from the SQLite database by sending HTTP requests.

I currently do this with the mattn library and custom wrappers that expose an io.WriteCloser for writes and an io.ReadSeeker for reads.

The mattn library doesn't support blob I/O, so instead, my wrapper breaks up files into chunks of fixed size and then stores the chunks as distinct rows in a SQLite table.

Ultimately, what I'm hoping to do with your library is take the io.Reader interface from an HTTP multipart form POST and pass that to SELECT writeblob in the blobio extension of this library. Then I'd like to take the io.ReadSeeker that I get from SELECT openblob and pass it to http.ServeContent for serving the data over HTTP.

The writing part seems like it should work fine, but I'm having trouble with the reading part.

Potential path

My best idea is to write my own wrapper around the SELECT openblob call that exposes the io.ReadSeeker interface but only calls openblob when a Read call occurs, kind of like this:

type Wrapper struct {
    db *sql.DB
    id string
}

func (w Wrapper) Read(p []byte) (int, error) {
    var bytesRead int
    _, err := w.db.Exec(`SELECT openblob('main', 'entries', 'contents', (SELECT rowid FROM entries WHERE id = :id), :writeMode, :callback)`,
        sql.Named("id", w.id),
        sql.Named("writeMode", false),
        sql.Named("callback",
            sqlite3.Pointer[blobio.OpenCallback](func(b *sqlite3.Blob, _ ...sqlite3.Value) error {
                n, err := b.Read(p)
                bytesRead = n
                return err
            })))
    return bytesRead, err
}

func (w Wrapper) Seek(offset int64, whence int) (int64, error) {
    // TODO: Get this to affect the next Read call above
    return 0, nil
}

Does that make the most sense? Or is there a better way to achieve what I'm trying to do?

ncruces commented 1 week ago

I'll have to read your entire issue with more care to give better suggestions, but the semantics of the openblob function are that you can only use the *sqlite3.Blob inside the callback.

In fact, for a single query, the callback may be called more than once.

Consider this query:

SELECT openblob('main', 'entries', 'contents', (SELECT rowid FROM entries WHERE id = :id), :writeMode, :callback)

This could be written:

SELECT openblob('main', 'entries', 'contents', rowid, :writeMode, :callback)
  FROM entries
  WHERE id = :id

Now consider this modification:

SELECT openblob('main', 'entries', 'contents', rowid, :writeMode, :callback, id)
  FROM entries
  WHERE id GLOB 'id:2024-09-06T*'

If I understood your schema and id generation correctly, this would invoke your callback for all BLOBs created on 2024-09-06, and if you note the additional id argument to openblob, your callback gets not just the *sqlite3.Blob, but the id as an argument.

This is something we can't really do if the *sqlite3.Blob outlives the call. Also, to use this with database/sql you really can't have the *sqlite3.Blob outlive the call either. A *sqlite3.Blob is bound to a specific connection, and database/sql is built around a pool. When your query ends, you release the connection you got from the pool, and by the time you want to play with the *sqlite3.Blob, your connection may already be in use by another goroutine.
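
As a sketch of that multi-row form (assuming the entries schema above and sqlite3.Value's Text accessor; the logging is purely illustrative), the extra id column passed to openblob arrives as an argument to the callback:

_, err := db.Exec(`SELECT openblob('main', 'entries', 'contents', rowid, :writeMode, :callback, id)
      FROM entries
     WHERE id GLOB 'id:2024-09-06T*'`,
    sql.Named("writeMode", false),
    sql.Named("callback",
        sqlite3.Pointer[blobio.OpenCallback](func(b *sqlite3.Blob, args ...sqlite3.Value) error {
            // args[0] carries the extra id column for the row whose BLOB is open.
            log.Printf("streaming blob for entry %s", args[0].Text())
            _, err := io.Copy(os.Stdout, b)
            return err
        })))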

mtlynch commented 1 week ago

@ncruces - Thanks, that's very helpful!

I'll keep playing around with my io.ReadSeeker idea and see if I can get that working in my app.

ncruces commented 1 week ago

I'm not sure I understand why you can't start with your attempt 2, and use http.ServeContent instead of io.Copy?

Doesn't http.ServeContent synchronously read from the io.ReadSeeker and send the necessary bytes?

I definitely would expect/want that to work (obviously, only a single blob should match your query).
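
A hedged sketch of that, reusing attempt 2's query (the entry id doubles as the file name here, and the zero modtime is a placeholder the real handler would fill in):

func (d *DB) ServeEntry(w http.ResponseWriter, r *http.Request, id string) error {
    _, err := d.db.Exec(`SELECT openblob('main', 'entries', 'contents', (SELECT rowid FROM entries WHERE id = :id), :writeMode, :callback)`,
        sql.Named("id", id),
        sql.Named("writeMode", false),
        sql.Named("callback",
            sqlite3.Pointer[blobio.OpenCallback](func(b *sqlite3.Blob, _ ...sqlite3.Value) error {
                // ServeContent reads synchronously, so the blob never outlives the callback.
                http.ServeContent(w, r, id, time.Time{}, b)
                return nil
            })))
    return err
}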

ncruces commented 1 week ago

I notice that http.ServeContent uses io.CopyN which is not the most efficient implementation:

* can't use `WriteTo`

* uses a really small buffer

This is unfortunate, but it means that, probably, sqlite3.Blob should be reworked to optimize Read a bit more.

The idea would be to allocate the internal buffer only once. This will save a malloc/free for every read. It's still far from optimal (WriteTo/ReadFrom are much better), but it's probably the best we can do under the circumstances.

daenney commented 1 week ago

I notice that http.ServeContent uses io.CopyN which is not the most efficient implementation:

* can't use `WriteTo`

* uses a really small buffer

Looking at io.CopyN it calls through io.Copy which does use WriteTo if the source implements it. So I think that should still be OK?

mtlynch commented 1 week ago

I'm not sure I understand why you can't start with your attempt 2, and use http.ServeContent instead of io.Copy?

Yeah, that should work.

The issue in my case is that my storage component shouldn't care about HTTP, and my HTTP handling code shouldn't care about SQLite. It would be easy to keep the two components loosely coupled if I could grab an io.ReadSeeker out of the database and hand it to http.ServeContent, but it should be doable with a callback as well.

I'll either write a wrapper that abstracts the SQLite stuff behind an io.ReadSeeker or figure out a way for my storage code to accept a callback that calls http.ServeContent with the sqlite3.Blob.
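
One possible shape for that decoupling (a sketch; the names are made up): the store exposes the blob only for the lifetime of a caller-supplied function, and the HTTP layer supplies a closure that calls http.ServeContent.

// Storage side: knows about SQLite, never mentions HTTP.
type EntryReader interface {
    ReadEntry(id string, fn func(io.ReadSeeker) error) error
}

// HTTP side: supplies a closure and never imports the sqlite3 package.
err := store.ReadEntry(id, func(rs io.ReadSeeker) error {
    http.ServeContent(w, r, filename, time.Time{}, rs)
    return nil
})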

danp commented 1 week ago

Looking at io.CopyN it calls through io.Copy which does use WriteTo if the source implements it.

If I understand correctly since CopyN wraps the source in LimitedReader, which does not implement WriterTo, io.Copy won't use it even if the source passed to CopyN does implement it.

ncruces commented 1 week ago

If I understand correctly since CopyN wraps the source in LimitedReader, which does not implement WriterTo, io.Copy won't use it even if the source passed to CopyN does implement it.

Yes, this is the reason. Then there's TCPConn.ReadFrom but that eventually gives up (if it can't splice or sendFile) and you're back at io.copyBuffer called with no buffer, which uses a 32Kb buffer.

The reason I'm calling this out is that:

A large BLOB in SQLite is a linked list of pages, and even sequential sqlite3_blob_read calls don't remember where they were, so you'll be traversing the linked list every time. The 1Mb buffer I use in WriteTo/ReadFrom is to amortize this cost.[^1] It's still not the best idea to store very large BLOBs this way, but it's OK for smallish MB-sized BLOBs and not terrible for tens of MBs.

[^1]: one of the main reasons this database/sql driver can beat other drivers is that it makes a single call to read/scan all columns in a row at once; the main reason it loses vs. Cgo on inserts is that Wasm SQLite can't read Go memory without a copy.

ncruces commented 1 week ago

9d77322d5000eaab6fb827488f78319397cfcfb9 exposes sqlite3_realloc and fixes some memory management issues (some APIs expect memory from sqlite3_malloc and I was giving them malloc).

The only oddball now is aligned_alloc which, given how it's used (memory is never actually freed), could be done through sqlite3_malloc/sqlite3_realloc.

ncruces commented 6 days ago

For reusing the same buffer for all calls to Read/Write/WriteTo/ReadFrom, 6a2827f989b6d69e32e8e70defecc983fcdfd5c2 should do it (assuming they are consistent in the buffer size used).

ncruces commented 6 days ago

I'll leave this open while you work on https://github.com/mtlynch/picoshare/pull/567.

If/when you're closer to done, if you want me to tag these/other improvements, feel free to ping me.

ncruces commented 5 days ago

Been thinking a bit more about this. Really curious what the performance will be like.

If we need to amortize reads better with larger buffers (because of io.LimitedReader), my bet is that the sqlite3.Blob should simply be wrapped in a bufio.Reader, like so:

type BufferedReadSeeker struct {
    bufio.Reader
    s io.ReadSeeker
}

func NewBufferedReadSeeker(r io.ReadSeeker, size int) *BufferedReadSeeker {
    return &BufferedReadSeeker{Reader: *bufio.NewReaderSize(r, size), s: r}
}

func (b *BufferedReadSeeker) Reset(r io.Reader) {
    b.s = r.(io.ReadSeeker)
    b.Reader.Reset(b.s)
}

func (b *BufferedReadSeeker) Seek(offset int64, whence int) (int64, error) {
    b.Reader.Reset(b.s)
    return b.s.Seek(offset, whence)
}
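
In the scenario above, this wrapper would be applied to the *sqlite3.Blob inside the openblob callback, before handing it to http.ServeContent, so the large bufio buffer absorbs the small reads io.CopyN issues.
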
mtlynch commented 5 days ago

Thanks for all your work on this!

Also, just want to clarify that my donation was based on appreciation of past work, so I hope it didn't feel like there were strings attached or that I expected it to be enough to compensate you for further feature work here.

I've got a version of PicoShare working with your driver, and it works locally, but I'm seeing strange errors in my end-to-end tests and when I deploy the code to a real cloud server. I'm going to try another test, but my schedule is a bit wacky for the next few weeks/months.

I tried uploading a 1.5 GB file to a PicoShare instance on fly.io with the existing code and one with the nc-sqlite branch I'm working on. Your driver was outperforming the mattn driver by about 30% in terms of upload speed to the server, but then when the upload reached 100%, it failed with "SQL logic error." I ran this test on September 8th, so I think I was pulling in c1915feb2e5b88fb3f4bb8fe5a394261a0ff0eff of your driver.

For a bit of additional context, for PicoShare's needs, I/O speed is not as important as memory overhead. In my current implementation where I break the file into chunks with the mattn driver, on a small 256 MB RAM cloud server, things work fine until files get to be about 1 GB, and then I start seeing memory exhaustion for files larger than that (https://github.com/mtlynch/picoshare/issues/355). Throwing a ton of server resources at it solves the problem, but I get a lot of reports from users about crashes or sluggish performance on large files.

ncruces commented 5 days ago

OK so... First of all, congrats! Second, thanks for the donation! Third, I feel no pressure from the donation, and I also hope you don't feel any pressure from my eagerness: please, enjoy your family time!

The reason I've been this engaged is that I always envisioned the blob API being used for this exact use case... and you (accidentally?) volunteered as guinea pig.

Looking over your code, I see one potential issue: using last_insert_rowid() for writeblob (which I did in the example, but "fixed" in 356dd56e5f1d79c3dec914cb981e9cea362608f9). The issue there is that last_insert_rowid() is "per-connection": depending on what your Store.ctx is, the Execs might run on a connection pool with multiple connections, and last_insert_rowid() may then run on a different connection than the INSERT.

mtlynch commented 5 days ago

Thanks!

Okay, I tried deploying a fix to a 1x-shared-CPU-256MB-RAM fly.io server based on https://github.com/mtlynch/picoshare/pull/567/commits/dfbedf814e363ed6c1896511ed8cda50b4e9f701, which includes your last_insert_rowid fix.

The app uses about 83 MiB of 213 MiB of available RAM at idle, and I can upload small files fine. When I tried uploading a 237 MB file, PicoShare crashed with a failure to allocate memory:

2024-09-11T18:14:19.162 app[4d890d95c5e178] iad [info] 2024/09/11 18:14:19 /app/store/sqlite/entries.go:170: saving new entry qyj5zRX5ka
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] 2024/09/11 18:14:19 /usr/local/go/src/net/http/server.go:3413: http: panic serving 172.16.129.90:62650: cannot allocate memory (recovered by wazero)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] wasm stack trace:
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] sqlite3.wasm.sbrk(i32) i32
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] sqlite3.wasm.dlmalloc(i32) i32
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] sqlite3.wasm.sqlite3MemMalloc(i32) i32
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] sqlite3.wasm.sqlite3Malloc(i64) i32
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] sqlite3.wasm.dbMallocRawFinish(i32,i64) i32
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] sqlite3.wasm.sqlite3DbMallocRawNN(i32,i64) i32
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] sqlite3.wasm.sqlite3VdbeMemGrow(i32,i32,i32) i32
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] sqlite3.wasm.sqlite3VdbeExec(i32) i32
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] sqlite3.wasm.sqlite3_step(i32) i32
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] goroutine 104 [running]:
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] net/http.(*conn).serve.func1()
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /usr/local/go/src/net/http/server.go:1898 +0xbe
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] panic({0x93ce60?, 0xc00233e2a0?})
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /usr/local/go/src/runtime/panic.go:770 +0x132
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] github.com/ncruces/go-sqlite3.(*sqlite).call(0xc000738e08, {0x9ce260, 0xc?}, {0xc000338780?, 0x1?, 0xc0003386c0?})
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /go/pkg/mod/github.com/ncruces/go-sqlite3@v0.18.3-0.20240910104819-6a2827f989b6/sqlite.go:181 +0x1ad
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] github.com/ncruces/go-sqlite3.(*Stmt).Step(0xc0032ba900)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /go/pkg/mod/github.com/ncruces/go-sqlite3@v0.18.3-0.20240910104819-6a2827f989b6/stmt.go:109 +0x5f
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] github.com/ncruces/go-sqlite3.(*Stmt).Exec(0xc0032ba900)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /go/pkg/mod/github.com/ncruces/go-sqlite3@v0.18.3-0.20240910104819-6a2827f989b6/stmt.go:133 +0x1f
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] github.com/ncruces/go-sqlite3/driver.(*stmt).ExecContext(0xc0032ba930, {0xc694f0, 0x11b2c40}, {0xc00328e3c0?, 0x0?, 0xc000338838?})
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /go/pkg/mod/github.com/ncruces/go-sqlite3@v0.18.3-0.20240910104819-6a2827f989b6/driver/driver.go:468 +0xd8
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] database/sql.ctxDriverStmtExec({0xc694f0, 0x11b2c40}, {0xc698e0, 0xc0032ba930}, {0xc00328e3c0, 0x8, 0x8?})
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /usr/local/go/src/database/sql/ctxutil.go:65 +0xa3
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] database/sql.resultFromStatement({0xc694f0, 0x11b2c40}, {0xc68c28, 0xc000072280}, 0xc000338a20, {0xc000339048, 0x8, 0x8})
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /usr/local/go/src/database/sql/sql.go:2670 +0x13a
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] database/sql.(*DB).execDC(0x1?, {0xc694f0, 0x11b2c40}, 0xc0000fe000, 0x18?, {0x9f22e3, 0x100}, {0xc000339048, 0x8, 0x8})
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /usr/local/go/src/database/sql/sql.go:1722 +0x42c
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] database/sql.(*DB).exec(0xc00007cf70, {0xc694f0, 0x11b2c40}, {0x9f22e3, 0x100}, {0xc0000c7048, 0x8, 0x8}, 0x40?)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /usr/local/go/src/database/sql/sql.go:1683 +0xd5
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] database/sql.(*DB).ExecContext.func1(0x0?)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /usr/local/go/src/database/sql/sql.go:1662 +0x4f
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] database/sql.(*DB).retry(0x0?, 0xc000338cc0)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /usr/local/go/src/database/sql/sql.go:1566 +0x42
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] database/sql.(*DB).ExecContext(0x96ff80?, {0xc694f0?, 0x11b2c40?}, {0x9f22e3?, 0x0?}, {0xc0000c7048?, 0x92b360?, 0xc0000c6d80?})
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /usr/local/go/src/database/sql/sql.go:1661 +0xc8
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] database/sql.(*DB).Exec(...)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /usr/local/go/src/database/sql/sql.go:1675
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] github.com/mtlynch/picoshare/v2/store/sqlite.Store.InsertEntry({0x0?}, {0xc64c00, _}, {{0xc000306060, 0xa}, {0xc0015e1000, 0x23}, {0x0}, {0xc000306220, 0x9}, ...})
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /app/store/sqlite/entries.go:172 +0x55b
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] github.com/mtlynch/picoshare/v2/handlers.Server.insertFileFromRequest({0xc0007fa180, {0xc67ff8, 0xc003165e18}, {0xc6f060, 0xc00003a0a0}, {0xc64a80, 0xc00006fc00}, 0xc003165de8}, 0xc002449680, {0x0, ...}, ...)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /app/handlers/upload.go:262 +0x629
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] github.com/mtlynch/picoshare/v2/handlers.(*Server).routes.Server.entryPost.func3({0xc68ad8, 0xc00327e410}, 0xc002449680)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /app/handlers/upload.go:54 +0x16a
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] net/http.HandlerFunc.ServeHTTP(0xc69528?, {0xc68ad8?, 0xc00327e410?}, 0x411d25?)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /usr/local/go/src/net/http/server.go:2166 +0x29
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] github.com/mtlynch/picoshare/v2/handlers.Server.requireAuthentication-fm.Server.requireAuthentication.func1({0xc68ad8, 0xc00327e410}, 0xc002449680)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /app/handlers/auth.go:36 +0x6d
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] net/http.HandlerFunc.ServeHTTP(0xc002449560?, {0xc68ad8?, 0xc00327e410?}, 0x11392f0?)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /usr/local/go/src/net/http/server.go:2166 +0x29
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] github.com/mtlynch/picoshare/v2/handlers.Server.checkAuthentication-fm.Server.checkAuthentication.func1({0xc68ad8, 0xc00327e410}, 0xc002449560)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /app/handlers/auth.go:25 +0xcc
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] net/http.HandlerFunc.ServeHTTP(0xc002449440?, {0xc68ad8?, 0xc00327e410?}, 0x0?)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /usr/local/go/src/net/http/server.go:2166 +0x29
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] github.com/gorilla/mux.(*Router).ServeHTTP(0xc0007fa180, {0xc68ad8, 0xc00327e410}, 0xc002448d80)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /go/pkg/mod/github.com/gorilla/mux@v1.8.0/mux.go:210 +0x1e2
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] github.com/mtlynch/gorilla-handlers.loggingHandler.ServeHTTP({{0xc64aa0, 0xc00003a058}, {0xc64ac0, 0xc0007fa180}, 0xbd6678}, {0xc68178, 0xc000316000}, 0xc002448d80)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /go/pkg/mod/github.com/mtlynch/gorilla-handlers@v1.5.2/logging.go:47 +0xef
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] main.main.ProxyIPHeadersHandler.func3({0xc68178, 0xc000316000}, 0xc002448d80)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /go/pkg/mod/github.com/mtlynch/gorilla-handlers@v1.5.2/proxy_ip.go:15 +0x95
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] net/http.HandlerFunc.ServeHTTP(0x411d25?, {0xc68178?, 0xc000316000?}, 0xc000316001?)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /usr/local/go/src/net/http/server.go:2166 +0x29
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] net/http.serverHandler.ServeHTTP({0xc67008?}, {0xc68178?, 0xc000316000?}, 0x6?)
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /usr/local/go/src/net/http/server.go:3137 +0x8e
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] net/http.(*conn).serve(0xc0000fe5a0, {0xc69528, 0xc0032ba5a0})
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /usr/local/go/src/net/http/server.go:2039 +0x5e8
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] created by net/http.(*Server).Serve in goroutine 20
2024-09-11T18:14:19.164 app[4d890d95c5e178] iad [info] /usr/local/go/src/net/http/server.go:3285 +0x4b4
2024-09-11T18:14:19.164 proxy[4d890d95c5e178] iad [error] [PU02] could not complete HTTP request to instance: connection closed before message completed 

Does anything there stick out to you? Should I try rearchitecting it to break up the file into smaller blobs that span several rows rather than sticking files of arbitrary length into a single blob per file?

ncruces commented 5 days ago

I would not expect this to cause any memory issues. Let me look into it.

I'm not sure I can replicate fly specifically without creating an account with them, but I have a small hosted VM I can try. It'll take me time though.

ncruces commented 5 days ago

This is odd: app/store/sqlite/entries.go:172 is the INSERT that creates the blob, not the SELECT that writes it. It really shouldn't allocate that much memory.

Tangent: Other than that, it is failing in a way that is actually very interesting to me, so I'll look into this further. When the Go runtime can't allocate memory, that's usually a catastrophic event that can't be recovered from. So wazero was designed assuming this, because Wasm memory is just a []byte that we append to. I then extended wazero to allow custom memory allocators, and use mmap to back that []byte. Not being able to mmap can be handled, but I'm panicking, so sbrk crashes instead of returning ENOMEM, and malloc and SQLite don't get a chance to handle it more gracefully. This could work better.

Other than that, my assumption is that as your VM is memory strapped, and since each connection lives under the illusion that it can allocate 4GB of memory, it tries to, and eventually fails. You could try something like this, to force each connection to live with 32MB of memory (and maybe SetMaxOpenConns if not doing so already):

sqlite3.RuntimeConfig = wazero.NewRuntimeConfig().
    WithMemoryLimitPages(512)

But this is more configuration than should be needed, and I'm not sure it works. I guess for memory-constrained devices setting the soft/hard limits would be better, but those are disabled in the current build.
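
For the SetMaxOpenConns part, the standard database/sql call applies (a sketch; the pool size of 2 and the DSN are only examples):

db, err := sql.Open("sqlite3", "file:data.db?_pragma=journal_mode(WAL)")
if err != nil {
    log.Fatal(err)
}
// Each connection is its own Wasm sandbox, so capping the pool also caps total memory.
db.SetMaxOpenConns(2)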

This driver probably does consume more memory than mattn, and I wasted your time. 😔

mtlynch commented 5 days ago

I'm not sure I can replicate fly specifically without creating an account with them, but I have a small hosted VM I can try. It'll take me time though.

Were you able to repro? I realized I could repro a little easier using Docker's resource limits (docker run --memory 256m)

I made a branch that makes the repro easier:

https://github.com/mtlynch/picoshare/tree/deploy-ncruces

Script for spinning up the Docker container is ./dev-scripts/serve-docker. The problem is that I don't see the log messages about the crash like I do on fly.io. I just see the logs stop and the container dies when it tries to write the blob.

Other than that, my assumption is that as your VM is memory strapped, and since each connection lives under the illusion that it can allocate 4GB of memory, it tries to, and eventually fails.

Is 4 GB an assumption of SQLite itself or ncruces/go-sqlite3?

You could try something like this, to force each connection to live with 32MB of memory (and maybe SetMaxOpenConns if not doing so already):

I just tried this, but I continue seeing the crash under resource-capped Docker.

This driver probably does consume more memory than mattn, and I wasted your time. 😔

Aw, are you sure? I feel like there has to be some way of writing an arbitrarily large file to SQLite with only 256 MB of RAM.

256 MB isn't even a hard requirement. If I can tell users that arbitrarily large files will work as long as they have at least N GB of RAM, that would work too as long as N isn't something crazy.

ncruces commented 4 days ago

Were you able to repro? I realized I could repro a little easier using Docker's resource limits (docker run --memory 256m)

No, I didn't have the time. I'll try though, as I'm interested in the wazero side of this too.

Is 4 GB an assumption of SQLite itself or ncruces/go-sqlite3?

It's a Wasm assumption. Each Wasm module is like a sandbox/VM, and my SQLite module is default configured to max out at the maximum possible Wasm memory (4G). Each connection lives in its own sandbox. But memory is allocated as needed, and an Exec with a zeroblob shouldn't need much.

What this makes hard is to inform SQLite of the memory restriction so it can adapt. Your machine has a lot of memory, the container has much less memory (and Go may or may not know this), Wasm thinks there's a lot of memory, and so does SQLite. When a malloc fails there's probably lots of cached memory that could be freed, and lots of GC that could free up memory, but it doesn't know, so it panics.

Beyond improving what happens on malloc failure (SQLite is supposed to be able to deal with that, if the failure is reported instead of crashing), I should probably expose the soft/hard limit APIs. Tweaking the page cache size and cache spill settings might also help.

But it is a fact that this is less optimized for low memory than a Cgo driver. Maybe it makes sense to push for a better blob API on the Cgo driver: https://github.com/mattn/go-sqlite3/issues/239#issuecomment-1806762291

daenney commented 4 days ago

One thing with the container: Go is not cgroup-aware, so it doesn't know that it doesn't have the host's full memory at its disposal when running in a container.

There's an automemlimit package to help with that, or you can set GOMEMLIMIT to about 90% of the amount of memory assigned to the container.
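
For reference, the limit can also be set from code with the standard runtime/debug API (a sketch; the 230 MiB figure is just an example for a 256 MB container):

import "runtime/debug"

func init() {
    // Roughly 90% of a 256 MB container; equivalent to GOMEMLIMIT=230MiB.
    debug.SetMemoryLimit(230 << 20)
}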

mtlynch commented 4 days ago

It's a Wasm assumption. Each Wasm module is like a sandbox/VM, and my SQLite module is default configured to max out at the maximum possible Wasm memory (4G). Each connection lives in its own sandbox. But memory is allocated as needed, and an Exec with a zeroblob shouldn't need much.

If I made a PicoShare-specific fork of ncruces/go-sqlite3, would it be practical for me to cap the wasm memory to something smaller? Or is there any possibility of making this configurable or dynamic in the main implementation based on system RAM?

But it is a fact that this is less optimized for low memory than a Cgo driver. Maybe it makes sense to push for a better blob API on the Cgo driver: https://github.com/mattn/go-sqlite3/issues/239#issuecomment-1806762291

I don't think anyone's willing and able to do feature development on the mattn implementation. There hasn't been much activity there for a while.

Beyond improving what happens on malloc failure (SQLite is supposed to be able to deal with that, if the failure is reported instead of crashing), I should probably expose the soft/hard limit APIs. Tweaking the page cache size and cache spill settings might also help.

Cool, I'd be interested in re-trying this if you end up implementing that. Your library seemed like a potential solution to this memory exhaustion issue I've had, but I don't need the fix urgently, as I'm able to use PicoShare fine for 99% of my needs without supporting 1 GB files.

ncruces commented 4 days ago

I already uncovered and fixed a bug in the allocator in fdfaaa8cec75c240c41f83817a58521b59591523. I'll have to make a release shortly because this can cause data corruption (not silent data corruption, only in an OOM crash).

You can reduce memory usage today with:

sqlite3.RuntimeConfig = wazero.NewRuntimeConfig().
    WithMemoryLimitPages(512)

But then you're likely to hit the bug above. Each Wasm page is 64K, so that's 32MB. I've inserted a 150MB BLOB with as little as 4MB of memory per connection, though.

The problem is that this limits SQLite to X MB, but SQLite doesn't know. ~The first time it finds out is when a malloc returns NULL. Then, depending on what's being allocated it may try to release memory and try again. Or it may give up.~[^1]

Ideally, you say the limit is X, and then tell SQLite "make do with X-N." This needs some additional APIs. And if you use those APIs, maybe you don't need to limit Wasm memory after all: it can grow dynamically, and malloc is good enough about recycling memory.

It's not out of the question that I have some leak, somewhere, because really this shouldn't require that much memory. I should also look over your PRAGMAs to see if you're using more memory than necessary. Cloud VMs also often add an issue where you think you're using temporary files, but these are actually RAM that counts against your allotment.

[^1]: This only happens if compiled with SQLITE_ENABLE_MEMORY_MANAGEMENT, which is uncommon, and which we're not doing. Since reacting to these kinds of changes would require recompiling wazero, and wouldn't work with the default wazero memory handling, I don't think it's worth it.

ncruces commented 4 days ago

Updated the above comment. Ideally I add these new APIs, and it's a matter of calling them, and getting to a reasonable low memory configuration (smaller page cache, and/or a soft memory limit). Then you can have a few connections (like 4) each keeping under 32Mb (with a soft limit of say 16 or 24Mb) with room to spare for Go and everything else.

For a low memory environment that should be plenty, and you can do that with a few PRAGMAs.

mtlynch commented 4 days ago

I already uncovered and fixed a bug in the allocator in fdfaaa8. I'll have to make a release shortly because this can cause data corruption (not silent data corruption, only in an OOM crash).

I tested with this version but see the same behavior, unfortunately.

Ideally I add these new APIs, and it's a matter of calling them, and getting to a reasonable low memory configuration (smaller page cache, and/or a soft memory limit). Then you can have a few connections (like 4) each keeping under 32Mb (with a soft limit of say 16 or 24Mb) with room to spare for Go and everything else.

Cool! If you do add these APIs, let me know, and I'm happy to test on PicoShare again.

ncruces commented 3 days ago

Cool! If you do add these APIs, let me know, and I'm happy to test on PicoShare again.

I haven't tested much, but these should be in v0.18.3.

I would expect setting the soft_heap_limit PRAGMA to constrain memory usage. I would also suggest looking into PRAGMAs: cache_size, cache_spill and temp_store.
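
With this driver those PRAGMAs can also go in the DSN so every pooled connection picks them up (a sketch; the specific values are only illustrative):

db, err := sql.Open("sqlite3", "file:data.db"+
    "?_pragma=journal_mode(WAL)"+
    "&_pragma=soft_heap_limit(16000000)"+ // ask SQLite to stay around 16MB per connection
    "&_pragma=cache_size(-2000)")         // 2MB page cache (negative values are in KiB)
if err != nil {
    log.Fatal(err)
}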

I haven't got round to testing PicoShare proper, because my work machine is severely constrained, but I plan to. As I said, I was able to insert 100MB blobs into a database with writeblob with as little as 4MB of memory dedicated to each connection, so something else should be the problem (not necessarily PicoShare, but not the blobio extension in isolation).

ncruces commented 3 days ago

I'm guessing, if the problem persists, it might be related to this comment in your source code: "We deliberately don't use a transaction here, as it bloats memory..."

With the blob API the code becomes transactional, because even if you don't do a transaction, inserting the (empty) blob is a single statement (so transactional) and updating it is, again, a single statement. It might actually be cheaper to wrap them in a transaction.
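
A sketch of that wrapping with database/sql, assuming an entries(id, contents) table like the earlier snippets and a `reader` holding the upload:

tx, err := db.Begin()
if err != nil {
    return err
}
defer tx.Rollback()

// Reserve space for the BLOB...
r, err := tx.Exec(`INSERT INTO entries (id, contents) VALUES (?, ?)`,
    id, sqlite3.ZeroBlob(size))
if err != nil {
    return err
}
rowid, err := r.LastInsertId()
if err != nil {
    return err
}

// ...and fill it within the same transaction, so the zeroed pages are updated in place.
_, err = tx.Exec(`SELECT writeblob('main', 'entries', 'contents', ?, 0, ?)`,
    rowid, sqlite3.Pointer(reader))
if err != nil {
    return err
}
return tx.Commit()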

The transaction should spill to disk if it gets too big, though, and statement journals (which are stored as temporary files, and so could end up in RAM in a container) should not be involved.

So I'm still not sure what's happening here.

mtlynch commented 3 days ago

Okay, thanks! I tried again with 0.18.3 and unfortunately am seeing the same crash when I deploy to Fly / constrained Docker.

https://github.com/mtlynch/picoshare/tree/120278e89829335e8b95de4c8eea4eac5ef665d6

As I said, I was able to insert 100MB blobs into a database with writeblob with as little as 4MB of memory dedicated to each connection, so something else should be the problem (not necessarily PicoShare, but not the blobio extension in isolation).

Yeah, that makes me realize I should create a simpler test case. I'm thinking my next experiment will be:

  1. Create a simple Go program that reads and writes 1000 MB to SQLite via the blobio extension
  2. If that works, wrap it in a simple web app that allows the user to upload arbitrary files and read them back
  3. If that works, figure out what in PicoShare is causing the difference from (2)

ncruces commented 2 days ago

This was my attempt:

package main

import (
    "database/sql"
    "log"
    "os"
    "path/filepath"

    "github.com/ncruces/go-sqlite3"
    _ "github.com/ncruces/go-sqlite3/driver"
    _ "github.com/ncruces/go-sqlite3/embed"
    "github.com/ncruces/go-sqlite3/ext/blobio"
    "github.com/tetratelabs/wazero"
)

func main() {
    sqlite3.RuntimeConfig = wazero.NewRuntimeConfig().WithMemoryLimitPages(64)
    sqlite3.AutoExtension(blobio.Register)

    var err error
    db, err := sql.Open("sqlite3", "file:test.db?_pragma=journal_mode(WAL)")
    if err != nil {
        log.Panic(err)
    }
    defer db.Close()

    _, err = db.Exec(`CREATE TABLE IF NOT EXISTS files (name, data)`)
    if err != nil {
        log.Panic(err)
    }

    for _, path := range os.Args[1:] {
        func() {
            file, err := os.Open(path)
            if err != nil {
                log.Panic(err)
            }
            defer file.Close()

            info, err := file.Stat()
            if err != nil {
                log.Panic(err)
            }
            log.Println("size", info.Size())

            r, err := db.Exec(`INSERT INTO files (name, data) VALUES (?, ?)`,
                filepath.Base(path), sqlite3.ZeroBlob(info.Size()))
            if err != nil {
                log.Panic(err)
            }

            id, err := r.LastInsertId()
            if err != nil {
                log.Panic(err)
            }
            log.Println("id", id)

            _, err = db.Exec(`SELECT writeblob('main', 'files', 'data', ?, 0, ?)`,
                id, sqlite3.Pointer(file))
            if err != nil {
                log.Panic(err)
            }
        }()
    }
}

I'd also suggest testing with modes other than WAL. Not because using WAL is wrong, but because I might be leaking memory with my implementation of shared memory WAL.
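
For the read side, a companion sketch (not part of the program above; it assumes the same files table and writes each blob back out to a hypothetical .out file):

for _, path := range os.Args[1:] {
    out, err := os.Create(filepath.Base(path) + ".out")
    if err != nil {
        log.Panic(err)
    }
    _, err = db.Exec(`SELECT openblob('main', 'files', 'data', rowid, 0, ?) FROM files WHERE name = ?`,
        sqlite3.Pointer[blobio.OpenCallback](func(b *sqlite3.Blob, _ ...sqlite3.Value) error {
            // Stream the blob straight to the file; WriteTo amortizes reads with a large buffer.
            _, err := b.WriteTo(out)
            return err
        }),
        filepath.Base(path))
    out.Close()
    if err != nil {
        log.Panic(err)
    }
}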

mtlynch commented 2 days ago

Cool, thanks! That's interesting. Using your simple app, I'm able to repro the crash even on a system with 24 GB of 32 GB of RAM free.

Here's my branch:

https://github.com/mtlynch/picoshare/tree/ncruces-simple-1

Here's the runner script:

#!/usr/bin/env bash

set -eux

rm test.db* || true

readonly BINARY_DIR='bin'
mkdir -p "${BINARY_DIR}"

readonly BINARY_NAME="${BINARY_DIR}/test-ncruces"

go build \
  -tags netgo \
  -o "${BINARY_NAME}" \
  main.go

TEST_FILE="$(mktemp)"
readonly TEST_FILE
dd if=/dev/urandom of="${TEST_FILE}" bs=1M count=512

du -h "${TEST_FILE}"

"./${BINARY_NAME}" "${TEST_FILE}"

And here's the crash:

+ du -h /tmp/nix-shell.qzdm97/tmp.zhWzQsLW3f
513M    /tmp/nix-shell.qzdm97/tmp.zhWzQsLW3f
+ ./bin/test-ncruces /tmp/nix-shell.qzdm97/tmp.zhWzQsLW3f
2024/09/14 08:57:28 starting up!
2024/09/14 08:57:28 size 536870912
2024/09/14 08:57:30 id 1
panic: sqlite3: out of memory (recovered by wazero)
wasm stack trace:
        env.go_func(i32,i32,i32,i32)
        sqlite3.wasm.go_func_wrapper(i32,i32,i32)
        sqlite3.wasm.sqlite3VdbeExec(i32) i32
        sqlite3.wasm.sqlite3_step(i32) i32

goroutine 1 [running]:
github.com/ncruces/go-sqlite3.(*sqlite).call(0xc000152a88, {0x68af1e, 0xc?}, {0xc002c67768?, 0x6e3dd0?, 0xc002c676a8?})
        /home/mike/code/go/pkg/mod/github.com/ncruces/go-sqlite3@v0.18.3/sqlite.go:181 +0x1ad
github.com/ncruces/go-sqlite3.(*Stmt).Step(0xc0018fb0b0)
        /home/mike/code/go/pkg/mod/github.com/ncruces/go-sqlite3@v0.18.3/stmt.go:109 +0x5f
github.com/ncruces/go-sqlite3.(*Stmt).Exec(0xc0018fb0b0)
        /home/mike/code/go/pkg/mod/github.com/ncruces/go-sqlite3@v0.18.3/stmt.go:133 +0x1f
github.com/ncruces/go-sqlite3/driver.(*stmt).ExecContext(0xc0018fb0e0, {0x6e3dd0, 0xa14c00}, {0xc0000a4190?, 0x0?, 0xc002c67820?})
        /home/mike/code/go/pkg/mod/github.com/ncruces/go-sqlite3@v0.18.3/driver/driver.go:468 +0xd8
database/sql.ctxDriverStmtExec({0x6e3dd0, 0xa14c00}, {0x6e3e78, 0xc0018fb0e0}, {0xc0000a4190, 0x2, 0x2?})
        /nix/store/vz8d6wmfcf38l3h3vymwqr6c5zxp5jmp-go-1.22.3/share/go/src/database/sql/ctxutil.go:65 +0xa3
database/sql.resultFromStatement({0x6e3dd0, 0xa14c00}, {0x6e3ab8, 0xc0001780a0}, 0xc002c67a08, {0xc002c67d90, 0x2, 0x2})
        /nix/store/vz8d6wmfcf38l3h3vymwqr6c5zxp5jmp-go-1.22.3/share/go/src/database/sql/sql.go:2670 +0x13a
database/sql.(*DB).execDC(0x9b5501?, {0x6e3dd0, 0xa14c00}, 0xc000180000, 0x0?, {0x698e79, 0x32}, {0xc002c67d90, 0x2, 0x2})
        /nix/store/vz8d6wmfcf38l3h3vymwqr6c5zxp5jmp-go-1.22.3/share/go/src/database/sql/sql.go:1722 +0x42c
database/sql.(*DB).exec(0xc000124c30, {0x6e3dd0, 0xa14c00}, {0x698e79, 0x32}, {0xc002c67d90, 0x2, 0x2}, 0xd8?)
        /nix/store/vz8d6wmfcf38l3h3vymwqr6c5zxp5jmp-go-1.22.3/share/go/src/database/sql/sql.go:1683 +0xd5
database/sql.(*DB).ExecContext.func1(0xc?)
        /nix/store/vz8d6wmfcf38l3h3vymwqr6c5zxp5jmp-go-1.22.3/share/go/src/database/sql/sql.go:1662 +0x4f
database/sql.(*DB).retry(0x4c8f40?, 0xc002c67ca8)
        /nix/store/vz8d6wmfcf38l3h3vymwqr6c5zxp5jmp-go-1.22.3/share/go/src/database/sql/sql.go:1566 +0x42
database/sql.(*DB).ExecContext(0xc00011a150?, {0x6e3dd0?, 0xa14c00?}, {0x698e79?, 0xc000124c30?}, {0xc002c67d90?, 0xa14c00?, 0x697d10?})
        /nix/store/vz8d6wmfcf38l3h3vymwqr6c5zxp5jmp-go-1.22.3/share/go/src/database/sql/sql.go:1661 +0xc8
database/sql.(*DB).Exec(...)
        /nix/store/vz8d6wmfcf38l3h3vymwqr6c5zxp5jmp-go-1.22.3/share/go/src/database/sql/sql.go:1675
main.main.func1({0x7fffffff6595, 0x24}, 0xc000124c30)
        /home/mike/code/picoshare/main.go:60 +0x49f
main.main()
        /home/mike/code/picoshare/main.go:65 +0x285

I tried removing the WAL option from your main.go but I still see the same crash.

mtlynch commented 2 days ago

A bit of progress. It turns out that the OOM was from this line:

sqlite3.RuntimeConfig = wazero.NewRuntimeConfig().WithMemoryLimitPages(64)

I bumped it to 512, and the OOM went away, even when I used Docker to limit RAM to 128M:

https://github.com/mtlynch/picoshare/tree/ncruces-simple-2

See: https://github.com/mtlynch/picoshare/blob/ncruces-simple-2/scripts/test-under-docker

The strange thing is that any pragma that Litestream recommends causes a crash, even though none of the pragmas seem like they'd bloat memory:

PRAGMA busy_timeout = 5000;
PRAGMA synchronous = NORMAL;
PRAGMA wal_autocheckpoint = 0;

I've tried applying only one pragma at a time, and the test program always crashes; with none of them enabled, the test program always succeeds with a ~1 GB file and 128M of RAM. I thought it was just luck, but I've been able to reproduce it 10x with the pragmas all disabled and with them individually enabled.

Do you have any ideas why that could be happening?

The other issue I ran into was that if I tried a file larger than 1 GB, it failed with SQLITE_TOOBIG. Is it possible to increase the limit through go-sqlite3 and the database/sql interface? I found the Limit API, but it looks like that's not available through the database/sql interface.

mtlynch commented 2 days ago

Interestingly, setting PRAGMA wal_autocheckpoint = 0 alone causes an OOM crash even when I run the application outside of Docker with 24 GB of RAM available:

https://github.com/mtlynch/picoshare/blob/ncruces-simple-3/scripts/test

~/code/picoshare (ncruces-simple-3)$ ./scripts/test
+ rm test.db test.db-shm test.db-wal
++ ./scripts/build
+ readonly BINARY_DIR=bin
+ BINARY_DIR=bin
+ mkdir -p bin
+ readonly BINARY_NAME=bin/test-ncruces
+ BINARY_NAME=bin/test-ncruces
+ go build -tags netgo -o bin/test-ncruces main.go
+ echo bin/test-ncruces
+ BINARY_NAME=bin/test-ncruces
+ go build -tags netgo -o bin/test-ncruces main.go
++ ./scripts/make-data-file
++ mktemp
+ TEST_FILE=/tmp/nix-shell.qzdm97/tmp.s6fSehu6Ao
+ readonly TEST_FILE
+ dd if=/dev/urandom of=/tmp/nix-shell.qzdm97/tmp.s6fSehu6Ao bs=1M count=952
952+0 records in
952+0 records out
998244352 bytes (998 MB, 952 MiB) copied, 2.40168 s, 416 MB/s
+ echo /tmp/nix-shell.qzdm97/tmp.s6fSehu6Ao
+ TEST_FILE=/tmp/nix-shell.qzdm97/tmp.s6fSehu6Ao
+ readonly TEST_FILE
+ ./bin/test-ncruces /tmp/nix-shell.qzdm97/tmp.s6fSehu6Ao
2024/09/14 11:03:38 starting up!
2024/09/14 11:03:39 size 998244352
2024/09/14 11:03:40 id 1
panic: sqlite3: out of memory (recovered by wazero)
wasm stack trace:
        env.go_shm_map(i32,i32,i32,i32,i32) i32
        sqlite3.wasm.walIndexPageRealloc(i32,i32,i32) i32
        sqlite3.wasm.walIndexAppend(i32,i32,i32) i32
        sqlite3.wasm.pagerWalFrames(i32,i32,i32,i32) i32
        sqlite3.wasm.pagerStress(i32,i32) i32
        sqlite3.wasm.getPageNormal(i32,i32,i32,i32) i32
        sqlite3.wasm.accessPayload(i32,i32,i32,i32,i32) i32
        sqlite3.wasm.sqlite3BtreePutData(i32,i32,i32,i32) i32
        sqlite3.wasm.blobReadWrite(i32,i32,i32,i32,i32) i32
        sqlite3.wasm.sqlite3_blob_write(i32,i32,i32,i32) i32 (recovered by wazero)
wasm stack trace:
        env.go_func(i32,i32,i32,i32)
        sqlite3.wasm.go_func_wrapper(i32,i32,i32)
        sqlite3.wasm.sqlite3VdbeExec(i32) i32
        sqlite3.wasm.sqlite3_step(i32) i32

goroutine 1 [running]:
github.com/ncruces/go-sqlite3.(*sqlite).call(0xc002ed2a88, {0x68af1e, 0xc?}, {0xc002c7d728?, 0x6e3f10?, 0xc002c7d668?})
        /home/mike/code/go/pkg/mod/github.com/ncruces/go-sqlite3@v0.18.3/sqlite.go:181 +0x1ad
github.com/ncruces/go-sqlite3.(*Stmt).Step(0xc003712000)
        /home/mike/code/go/pkg/mod/github.com/ncruces/go-sqlite3@v0.18.3/stmt.go:109 +0x5f
github.com/ncruces/go-sqlite3.(*Stmt).Exec(0xc003712000)
        /home/mike/code/go/pkg/mod/github.com/ncruces/go-sqlite3@v0.18.3/stmt.go:133 +0x1f
github.com/ncruces/go-sqlite3/driver.(*stmt).ExecContext(0xc003712030, {0x6e3f10, 0xa14c00}, {0xc003710050?, 0x0?, 0xc002c7d7e0?})
        /home/mike/code/go/pkg/mod/github.com/ncruces/go-sqlite3@v0.18.3/driver/driver.go:468 +0xd8
database/sql.ctxDriverStmtExec({0x6e3f10, 0xa14c00}, {0x6e3fb8, 0xc003712030}, {0xc003710050, 0x2, 0x2?})
        /nix/store/vz8d6wmfcf38l3h3vymwqr6c5zxp5jmp-go-1.22.3/share/go/src/database/sql/ctxutil.go:65 +0xa3
database/sql.resultFromStatement({0x6e3f10, 0xa14c00}, {0x6e3bf8, 0xc0000a40f0}, 0xc002c7d9c8, {0xc002c7dd60, 0x2, 0x2})
        /nix/store/vz8d6wmfcf38l3h3vymwqr6c5zxp5jmp-go-1.22.3/share/go/src/database/sql/sql.go:2670 +0x13a
database/sql.(*DB).execDC(0x9b5501?, {0x6e3f10, 0xa14c00}, 0xc0000ee000, 0x0?, {0x698e87, 0x32}, {0xc002c7dd60, 0x2, 0x2})
        /nix/store/vz8d6wmfcf38l3h3vymwqr6c5zxp5jmp-go-1.22.3/share/go/src/database/sql/sql.go:1722 +0x42c
database/sql.(*DB).exec(0xc0000b09c0, {0x6e3f10, 0xa14c00}, {0x698e87, 0x32}, {0xc002c7dd60, 0x2, 0x2}, 0x78?)
        /nix/store/vz8d6wmfcf38l3h3vymwqr6c5zxp5jmp-go-1.22.3/share/go/src/database/sql/sql.go:1683 +0xd5
database/sql.(*DB).ExecContext.func1(0xc?)
        /nix/store/vz8d6wmfcf38l3h3vymwqr6c5zxp5jmp-go-1.22.3/share/go/src/database/sql/sql.go:1662 +0x4f
database/sql.(*DB).retry(0x4c8f40?, 0xc002c7dc68)
        /nix/store/vz8d6wmfcf38l3h3vymwqr6c5zxp5jmp-go-1.22.3/share/go/src/database/sql/sql.go:1566 +0x42
database/sql.(*DB).ExecContext(0xc0000a6150?, {0x6e3f10?, 0xa14c00?}, {0x698e87?, 0xc0000ee000?}, {0xc002c7dd60?, 0x69b71b?, 0xf5?})
        /nix/store/vz8d6wmfcf38l3h3vymwqr6c5zxp5jmp-go-1.22.3/share/go/src/database/sql/sql.go:1661 +0xc8
database/sql.(*DB).Exec(...)
        /nix/store/vz8d6wmfcf38l3h3vymwqr6c5zxp5jmp-go-1.22.3/share/go/src/database/sql/sql.go:1675
main.main.func1({0x7fffffff6595, 0x24}, 0xc0000b09c0)
        /home/mike/code/picoshare/main.go:69 +0x49f
main.main()
        /home/mike/code/picoshare/main.go:76 +0x2e5

ncruces commented 2 days ago

Yeah, I was about to say that. This is related to the shared memory index used by WAL. There might be a bug on my end (though looking at the code, it's not apparent, I'll look into it more).

So, the way this works is:

One thing that I've noticed improves the situation is to wrap the INSERT and SELECT writeblob into a transaction. The INSERT creates zeroed pages in the WAL, the SELECT updates them in place. So, half the size written to the WAL and indexed.

Still, I saw the WAL indexes using 256Kb × [number of connections] for 100MB files, so it shouldn't be that bad.

I really need to test your code under Linux to see if this is an OS issue.

ncruces commented 2 days ago

Right, so a transaction roughly cuts the size of the index needed in half. Still, the index for your sample 512MB file is around 2MB. I'm sure there's some space lost to fragmentation, but it's 2MB.

Never checkpointing the WAL though, makes this grow forever. So if you're doing PRAGMA wal_autocheckpoint = 0, you need to run Litestream. Not because of one insert, but definitely in a running service.

The other issue I ran into was that if I tried a file larger than 1 GB, it failed with SQLITE_TOOBIG. Is it possible to increase the limit through go-sqlite3 and the database/sql interface? I found the Limit API, but it looks like that's not available through the database/sql interface.

I had missed this. The maximum size of a BLOB is 1,000,000,000 bytes. This can be raised at compile time to 2GB (adding SQLITE_MAX_LENGTH to sqlite_opt.h); you'd need to build a custom sqlite3.wasm, though. The size can only be lowered at runtime, not raised. I could raise the default limit (although I'm wary of changing defaults), but I'm guessing for really large files, you might need to split them anyway. I'm sorry.


About APIs that are only available through the lower-level interface. If it's something you want to change for every new connection:

sqlite3.AutoExtension(func(c *sqlite3.Conn) error {
    // this runs whenever any database connection is opened
    return nil
})

Or:

db, err := driver.Open("file:test.db",
    func(c *sqlite3.Conn) error {
        // this runs whenever a connection to this database is opened
        return nil
    },
    func(c *sqlite3.Conn) error {
        // this runs before a connection to this database is closed
        return nil
    }
)

If it's something you want to do in a specific connection, at a specific moment:

conn, err := db.Conn(context.TODO())
if err != nil {
    log.Fatal(err)
}

err = conn.Raw(func(driverConn any) error {
    var c *sqlite3.Conn = driverConn.(driver.Conn).Raw()
    // we have a low level Conn, lets Backup
    return c.Backup("main", "backup.db")
})
if err != nil {
    log.Fatal(err)
}

ncruces commented 16 hours ago

So I finally ran this on Linux as well.

The WAL index takes around 2MB per 1GB of WAL file. So with a 1GB WAL, SQLite seems to manage fine with just 8MB: 2MB for the default page cache, 2MB for the WAL index, 1MB for the write buffer I'm allocating, and the other 3MB are "overhead" (stack space, global variables, fragmentation, etc.); that seems reasonable, can't really fault it.

So the problem here seems to be:

  1. huge transactions,
  2. disabling auto checkpoints, and/or
  3. Litestream not being timely about checkpointing the database.

When you put those 3 things together, yeah SQLite is going to consume unbounded memory (and my driver adds overhead by siloing each connection). [^1]

But WithMemoryLimitPages(256) (16MB per connection) should be OK for most apps running in low memory settings. I was able to write a 998MB BLOB with half as much as that.

Other than that I can only stress that if you disable auto checkpoints to make Litestream happy, you must run Litestream, and maybe tweak it to monitor the database more often: https://github.com/benbjohnson/litestream/issues/100. Wrapping each INSERT and SELECT writeblob pair in a transaction will also help (Litestream won't monitor in the middle of those and issue a checkpoint, like SQLite would).

[^1]: Little known fact: WAL is way more inefficient than rollback for writing this much new data to a database. But that's irrelevant since people want to use Litestream.

ncruces commented 15 hours ago

For BLOBs larger than SQLITE_MAX_LENGTH, I guess a CRUD library that handles chunking, etc, would be valuable.

I'm not sure I'm up to the task though.

mtlynch commented 15 hours ago

Okay, thanks for digging so deeply into this!

I'll experiment more with my sample program and PicoShare and see if I can get this working by breaking the BLOBs into chunks and using transactions.

Other than that I can only stress that if you disable auto checkpoints to make Litestream happy, you must run Litestream, and maybe tweak it to monitor the database more often: https://github.com/benbjohnson/litestream/issues/100. Wrapping each INSERT and SELECT writeblob pair in a transaction will also help (Litestream won't monitor in the middle of those and issue a checkpoint, like SQLite would).

That should be okay for PicoShare. Because it's not like an app where lots of users are interacting simultaneously and you can lose important data if you throw away the last minute of database writes. In the case of PicoShare, the worst that's probably going to happen is that you lose the last file you uploaded, which probably happened anyway if PicoShare crashed. Or maybe you lose a minute of analytics, but that's not a big deal either.

For BLOBs larger than SQLITE_MAX_LENGTH, I guess a CRUD library that handles chunking, etc, would be valuable.

I'm not sure I'm up to the task though.

Okay, that's fine. I've already implemented this for the mattn driver, so it should be pretty straightforward to modify it for your driver.

ncruces commented 12 hours ago

That should be okay for PicoShare. Because it's not like an app where lots of users are interacting simultaneously and you can lose important data if you throw away the last minute of database writes.

What I meant was, if auto checkpointing is disabled, with two transactions (one to reserve space for the BLOB, the other to write bytes to it) the WAL grows faster, and the index needs more memory.

Because Litestream will not check at the "right" time, and checkpoint the WAL.

With auto checkpoints, if the BLOB is big enough, SQLite decides the WAL grew too much and checkpoints between the transactions.

I'll close this for now as the discussion long outlived the original question.

I still fixed an important bug, improved wazero, did some BLOB improvements, and gained a contributor, so thanks!

Feel free to open more issues.