Open vi opened 1 year ago
Hey @vi! Thanks for bringing this up. I don't know that it's realistic to try and handle all possible complications involving memory mapping, and it's definitely possible that by manipulating the underlying file you can mess up your database and cause a panic or other undefined behavior. However, I do believe that the steps I've taken (locking the file if possible) should be "safe enough" provided you're not doing something ridiculous, which is why I chose to not to make the function unsafe. unsafe
is no "protection" as you've put it, so I don't see the point of marking the function that way. In my opinion it would just cause uncertainty in the library's stability rather than warning users to not do something dumb. I'm happy to debate this point though!
Perhaps some more documentation would be appropriate though? Some big warning that says not to mess with the file while it's open by another process?
unsafe is no "protection" as you've put it, so I don't see the point of marking the function that way.
unsafe
is not a protection, but moving the responsibility from authors of the crate to its users.
I do believe that the steps I've taken (locking the file if possible) should be "safe enough" provided you're not doing something ridiculous, which is why I chose to not to make the function unsafe.
For example, would it be wise to use the app from privileged suid executable that accesses a database which is created by unprivileged users? Will straightforward usage of the API lead to a secure application? Users of the crate may think "if database is adversarial then it may return fake data, so we need e.g. to verify signatures of those data before using it" while in reality mere opening the database leads to code execution.
Perhaps some more documentation would be appropriate though? Some big warning that says not to mess with the file while it's open by another process?
Documentation should be a starting point. The documentation should mention what worst can happen if the rules are not being followed. If this "worst" involves undefined behaviour, then it invites unsafe fn
.
If, however, the function is a gateway to something large and potential unsoundness only reachable in really tricky scenarios, then the function is often not marked unsafe
. For example, accessing debugging tools like /proc/self/mem
from filesystem API or /usr/bin/gdb
from process spawn API, executing Python code (which may involve unsafe ctypes
module) are not typically marked unsafe
.
But, for example, opening attacker-provided or corrupted database file should not be considered as too exotic scenario and should invite unsafe
, unless it is guaranteed that problems would be encapsulated only to the database and data returned by it (i.e. OK ways to misbehave: infinite loop, deadlock, process abort, large memory allocation, returning arbitrary, but well-formed data to users, infinite recursion. Not OK ways to misbehave: misaligned/dangling references, returning non-UTF-8 byte blocks as strings to users; mangling other, neighbouring, unrelated and uncorrupted jammdb::DB
instance).
Precedent: BTreeMap/HashMap in std and bad Ord
/Hash
implementations:
It is a logic error for a key to be modified ... The behavior resulting from such a logic error is not specified, but will be encapsulated to the BTreeMap that observed the logic error and not result in undefined behavior. This could include panics, incorrect results, aborts, memory leaks, and non-termination.
This encapsulation allows BTreeMap
to stay sound even when misbehaving. Can the same be stated for jammdb::DB
?
(locking the file if possible) if possible
There may be two functions for opening the database: usual pub fn open
which enforces proper mandatory locking, overridable only with root access or debuggers and failing to open the database if it is not possible; and pub unsafe fn open_unchecked
which skips locking or makes it optional.
Description mentions that the library uses
mmap
, also links to "Single-level store" article, which suggests that jammdb may give users references leading to mmaped files.I expected e.g.
jammdb::DB::open
to beunsafe fn
, as full protection against unreasonable things that is needed to make it completely sound (or maybe even enough-for-practical-considerations sound) may not be viable gived the architecture.But it seems to be not the case, the crate API does not hint at potentials undefined behaviours users may face with the library. For example, what worst could happen if the database file is mangled by external process while in use? What if multiple processes open the same database for writing (including using networked filesystem without locking)?
Shall the entry function be marked
unsafe
to make users commit to not doing unreasonable things to the database? Or isjammdb
actually fully handles all possible complications of memory mapping, so the API is indeed sound even when misused?