Open eestrada opened 1 year ago
Functions where I see this being useful:
add
status
commit
(when automatically adding dirty files; however, that should simply call into the pre-existing add
code if possible)checkout
(multiple parallel blob reads from the DB, multiple parallel file writes to disk, multiple parallel file/directory unlinkings from disk)
See go-sqlite3 FAQ documentation regarding concurrency:
https://github.com/mattn/go-sqlite3#faq
See question "Can I use this in multiple routines concurrently?"
For now I'm assuming that https://modernc.org/sqlite has similar behavior regarding concurrency.
With config/CLI flag, specifying the number of goroutines spawned for file/blob hashing/compression/decompression should be determined like so:
var >= 1
is considered the exact number of goroutines to use. A var value of0
means unlimited goroutines up to the number of files/blobs to be processed; this is not recommended since memory usage is effectively unbounded.var < 0
means multiply the absolute value of the variable against the number of CPU cores on the currently running system. For example,-1
will spawn as many goroutines as CPU cores,-2
will spawn twice as many goroutines as CPU cores,-3
will spawn thrice as many goroutines as CPU cores, and so on.The default value should be
1
to constrain memory usage to be as low as possible.All writing to the sqlite DB(s) should be on a single goroutine to avoid lock contention, but more importantly, to ensure all operations happen within the same DB transaction. Alternativiely, maybe it is not such a big deal so long as individual files are completelly written per transaction, that way on partial failure, we don't need to redo any work that was previously committed on the next
add
/whatever operation.For reference: