hvrt-vcs / hvrt

Havarti is a Hybrid VCS that works just as well distributed as it does centralized.
BSD Zero Clause License

Use multiple goroutines for hashing/compression/decompression of blobs/files #1

Open eestrada opened 1 year ago

eestrada commented 1 year ago

With a config/CLI flag, the number of goroutines spawned for file/blob hashing/compression/decompression should be determined like so: var >= 1 is the exact number of goroutines to use. A var value of 0 means unlimited goroutines, up to the number of files/blobs to be processed; this is not recommended since memory usage is effectively unbounded. var < 0 means multiply the absolute value of the variable by the number of CPU cores on the currently running system. For example, -1 will spawn as many goroutines as CPU cores, -2 twice as many, -3 three times as many, and so on.

The default value should be 1 to constrain memory usage to be as low as possible.
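A minimal sketch of how that mapping could look in Go. The helper name resolveWorkers and its parameters are hypothetical and not part of hvrt; the CPU count is passed in explicitly only so the logic is easy to test.

```go
package main

import (
	"fmt"
	"runtime"
)

// resolveWorkers is a hypothetical helper (not part of hvrt) that maps the
// config/CLI value to a goroutine count, following the rules described above.
//   setting >= 1: use exactly that many goroutines
//   setting == 0: one goroutine per file/blob (effectively unbounded)
//   setting <  0: abs(setting) * number of CPU cores
func resolveWorkers(setting, numItems, numCPU int) int {
	switch {
	case setting >= 1:
		return setting
	case setting == 0:
		return numItems
	default:
		return -setting * numCPU
	}
}

func main() {
	cpus := runtime.NumCPU()
	// 1 is the proposed default; the other values exercise each branch.
	for _, setting := range []int{1, 4, 0, -1, -2} {
		fmt.Printf("setting=%d -> %d goroutines\n", setting, resolveWorkers(setting, 100, cpus))
	}
}
```

Whether the result should also be capped at the number of files/blobs when the flag is positive or negative is left open here; the sketch applies that limit only to the 0 case, as described above.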

All writes to the SQLite DB(s) should happen on a single goroutine to avoid lock contention and, more importantly, to ensure all operations happen within the same DB transaction. Alternatively, maybe it is not such a big deal so long as each individual file is completely written within one transaction; that way, on partial failure, the next add/whatever operation doesn't need to redo any work that was previously committed.
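A rough sketch of the first option: several goroutines hash blobs while a single goroutine does all the writing. Everything here is an assumption for illustration. The names hashAll and hashed are hypothetical, SHA-256 stands in for whatever hash hvrt actually uses, and the write callback stands in for inserts inside one SQLite transaction.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// hashed is a hypothetical result record passed from workers to the writer.
type hashed struct {
	name string
	sum  string
}

// hashAll hashes blobs on `workers` goroutines and hands every result to
// `write` on the calling goroutine only, so all writes stay on one goroutine.
func hashAll(blobs map[string][]byte, workers int, write func(hashed)) {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(chan hashed)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for name := range jobs {
				sum := sha256.Sum256(blobs[name]) // read-only map access
				results <- hashed{name: name, sum: hex.EncodeToString(sum[:])}
			}
		}()
	}

	// Feed blob names to the workers, then signal that no more are coming.
	go func() {
		for name := range blobs {
			jobs <- name
		}
		close(jobs)
	}()

	// Close the results channel once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Single writer: in real code this loop would run inside one transaction.
	for r := range results {
		write(r)
	}
}

func main() {
	blobs := map[string][]byte{"a.txt": []byte("hello"), "b.txt": []byte("world")}
	store := map[string]string{} // stand-in for the database
	hashAll(blobs, 2, func(h hashed) { store[h.name] = h.sum })
	fmt.Println(len(store), "blobs written")
}
```

Because only the calling goroutine touches the store, no locking is needed around the writes. The per-file-transaction alternative would differ mainly in where the transaction begins and commits.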

For reference:

eestrada commented 1 year ago

Functions where I see this being useful:

eestrada commented 1 year ago

See go-sqlite3 FAQ documentation regarding concurrency:

https://github.com/mattn/go-sqlite3#faq

See question "Can I use this in multiple routines concurrently?"

For now I'm assuming that https://modernc.org/sqlite has similar behavior regarding concurrency.