tirix closed this issue 2 years ago.
You can just have another version of the functions that takes a shared reference and panics if the value has not already been computed. Naming here seems hard, but how about `get_name_result` for the lazy/mutating version and `name_result` for the cached version?
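A minimal sketch of that split, assuming a hypothetical `Database` that caches its name tables in an `Option<Arc<NameResult>>` field. `NameResult`, the field names and the `build_name_table` helper are illustrative stand-ins, not the library's actual internals:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the name lookup tables.
#[derive(Debug, Default)]
pub struct NameResult {
    pub labels: HashMap<String, usize>,
}

/// Hypothetical database holding raw labels and a cached name table.
pub struct Database {
    labels: Vec<String>,
    // `None` until the lazy accessor has run once.
    names: Option<Arc<NameResult>>,
}

impl Database {
    pub fn new(labels: Vec<String>) -> Self {
        Database { labels, names: None }
    }

    /// Lazy/mutating version: computes and caches the tables on first use.
    pub fn get_name_result(&mut self) -> &Arc<NameResult> {
        if self.names.is_none() {
            let table = build_name_table(&self.labels);
            self.names = Some(Arc::new(table));
        }
        self.names.as_ref().unwrap()
    }

    /// Cached version: only needs a shared reference, but panics if
    /// `get_name_result` has not been called before.
    pub fn name_result(&self) -> &Arc<NameResult> {
        self.names
            .as_ref()
            .expect("name_result called before get_name_result")
    }
}

/// Hypothetical helper: maps each label to its position in the database.
fn build_name_table(labels: &[String]) -> NameResult {
    let labels = labels
        .iter()
        .enumerate()
        .map(|(i, l)| (l.clone(), i))
        .collect();
    NameResult { labels }
}
```

Once `get_name_result` has run, any number of `&Database` borrows can call `name_result` at the same time; the price is a runtime panic, rather than a compile error, if the calls happen in the wrong order.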
Actually, about the API: right now everything is accessible through a set of objects, one for each facet of the database (segments, naming, scopes, grammar, outline, etc.). Wouldn't it be nicer from a user's point of view if `Database` could provide all of these directly? It could lazily evaluate and delegate where needed.
I fully understand the reasons to split them from an implementation and parallelization point of view. But do you see any reason to split them from an API point of view?
I find the current API quite confusing, actually. There are a lot of constructs that don't make much sense from a naive perspective of what one would find in a verifier, or a MM database. I'm sure that all this segment stuff is important for efficiency but the API should make it easier to forget about that complexity when it isn't important, possibly by providing iterator types or other "view" types that hide those details unless the user is interested in them.
I agree that it would be better to just have a bunch of methods on `Database` where possible. There are some reasons to keep things separate, possibly including making some fields public, in order to leverage Rust's ability to allow mutation of one field while simultaneously holding a borrow of another field, but I don't know how much of that sort of thing is really necessary from outside the library.
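To illustrate the borrow-splitting point: with public fields, the compiler tracks borrows per field, so a caller can hold a shared borrow of one facet while mutating another, whereas an accessor method taking `&self` borrows the whole struct. The `Database` fields and `check_labels` function below are hypothetical:

```rust
/// Hypothetical database whose facets are exposed as public fields.
pub struct Database {
    pub labels: Vec<String>,
    pub diagnostics: Vec<String>,
}

impl Database {
    /// Accessor that borrows the *whole* database immutably.
    pub fn labels(&self) -> &[String] {
        &self.labels
    }
}

/// Records a diagnostic for every empty label.
pub fn check_labels(db: &mut Database) {
    // Borrow one field immutably...
    let labels = &db.labels;
    for (i, label) in labels.iter().enumerate() {
        if label.is_empty() {
            // ...while mutating a different field: allowed, the borrows are disjoint.
            db.diagnostics.push(format!("empty label at index {}", i));
        }
    }
    // Writing `let labels = db.labels();` instead would borrow all of `db`,
    // and the `push` above would then be rejected (error E0502).
}
```

This is the trade-off mentioned above: methods on `Database` are the friendlier API, while public fields keep this kind of simultaneous access possible for callers who need it.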
Then I'd propose an API along these lines, on `Database`:

- `fn run_pass(&mut self, types: &[DiagnosticClass]) -> &[Diagnostic]` would perform the database verification and return the diagnostics. The classes indicate which passes to perform (parse, scope, verify, grammar, stmt_parse, verify_parse_stmt, outline). This would be the only mutable API; the others would fail (panic, or return empty results) if it has not been called first.
- `fn to_annotations(&self, diags: &[Diagnostic]) -> Vec<Notation>` would convert the lightweight `Diagnostic`s into printable `Notation`s.
- `fn statement(&self, name: &str) -> Option<StatementAddress>` would return a statement address given a string label (see more below about lookups).
- `fn get_formula(&self, stmt: &StatementAddress) -> Option<&Formula>` would provide the formula for a given statement.
- `fn parse_formula(&self, symbol_iter: &mut dyn Iterator<Item = Symbol>, expected_typecodes: Box<[TypeCode]>) -> Result<Formula, Diagnostic>` would parse a sequence of symbols into a formula of one of the expected typecodes.
- `fn get_frame(&self, stmt: &StatementAddress) -> Option<&Frame>` would return the frame for a given statement (including hypotheses, DV conditions, etc.).
- `fn get_outline(&self) -> Option<Arc<OutlineNode>>` would return the outline of the database (chapters, sections).
- `fn export_mmp<W: Write>(&self, stmt: StatementAddress, out: &mut W) -> Result<(), ExportError>` would export the proof of the given statement in MMP format.

Another source of confusion I see is the number of different ways there currently are to address a statement:
- `[u8]`, representing the statement's label
- `StatementRef`
- `StatementAddress`
- `Label` (`Atom`)

`Database` should provide ways to convert between each of them, which would make `StatementRef` unnecessary outside of the library.

I discovered your issue #35 after posting here; I think we agree on several points.
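The label/atom/address conversions proposed above could be backed by a single index. A minimal sketch, assuming atoms are dense integer ids assigned to labels and a `StatementAddress` is a simple (segment, index) pair; these types and the `LabelIndex` struct are hypothetical simplifications, not the library's definitions:

```rust
use std::collections::HashMap;

/// Hypothetical stand-ins for the library's types.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Atom(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct StatementAddress {
    pub segment: u32,
    pub index: u32,
}

/// Index between labels, atoms and addresses (assumes unique labels).
#[derive(Default)]
pub struct LabelIndex {
    by_label: HashMap<Vec<u8>, Atom>,
    labels: Vec<Vec<u8>>,             // indexed by atom id
    addresses: Vec<StatementAddress>, // indexed by atom id
    by_address: HashMap<StatementAddress, Atom>,
}

impl LabelIndex {
    /// Registers a label at the given address and returns its new atom.
    pub fn insert(&mut self, label: &[u8], addr: StatementAddress) -> Atom {
        let atom = Atom(self.labels.len() as u32);
        self.by_label.insert(label.to_vec(), atom);
        self.labels.push(label.to_vec());
        self.addresses.push(addr);
        self.by_address.insert(addr, atom);
        atom
    }

    pub fn atom_of_label(&self, label: &[u8]) -> Option<Atom> {
        self.by_label.get(label).copied()
    }

    pub fn label_of_atom(&self, atom: Atom) -> Option<&[u8]> {
        self.labels.get(atom.0 as usize).map(|l| l.as_slice())
    }

    pub fn address_of_atom(&self, atom: Atom) -> Option<StatementAddress> {
        self.addresses.get(atom.0 as usize).copied()
    }

    pub fn atom_of_address(&self, addr: StatementAddress) -> Option<Atom> {
        self.by_address.get(&addr).copied()
    }
}
```

With every conversion going through one table like this, a borrowed view such as `StatementRef` would only be needed internally.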
As I mentioned in issue #35, the `types` argument in `run_pass` can be a struct of bools instead (which also allows for a builder API). Beyond that, it can't return a borrowed slice of diagnostics, because they are stored in five separate pieces internally. While a slice for each individual pass's diagnostics is a possibility, a more agnostic API would be:

`fn run_pass<E>(&mut self, passes: Passes, on_diag: impl FnMut(&Self, &Diagnostic) -> Result<(), E>) -> Result<(), E>;`

The idea here is that the user receives the diagnostics as soon as they become available, and can copy them into a `Vec` and/or turn them into a `Notation` if they want; the `Result` lets the user callback signal an error and abort early.
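A minimal sketch of that shape, assuming a hypothetical `Passes` struct of bools with builder methods, diagnostics stored per pass in separate vectors (only two passes here), and `Diagnostic` reduced to a string. The pass logic itself is a placeholder:

```rust
/// Hypothetical set of passes, one bool per pass, with a builder-style API.
#[derive(Clone, Copy, Default, Debug)]
pub struct Passes {
    pub parse: bool,
    pub verify: bool,
}

impl Passes {
    pub fn parse(mut self) -> Self { self.parse = true; self }
    pub fn verify(mut self) -> Self { self.verify = true; self }
}

#[derive(Debug, Clone, PartialEq)]
pub struct Diagnostic(pub String);

pub struct Database {
    source: Vec<String>,
    // Diagnostics are kept in separate pieces, one per pass.
    parse_diags: Vec<Diagnostic>,
    verify_diags: Vec<Diagnostic>,
}

impl Database {
    pub fn new(source: Vec<String>) -> Self {
        Database { source, parse_diags: Vec::new(), verify_diags: Vec::new() }
    }

    pub fn run_pass<E>(
        &mut self,
        passes: Passes,
        mut on_diag: impl FnMut(&Self, &Diagnostic) -> Result<(), E>,
    ) -> Result<(), E> {
        if passes.parse {
            // Mutating phase: compute and store this pass's diagnostics.
            self.parse_diags = self.source.iter()
                .filter(|s| s.is_empty())
                .map(|_| Diagnostic("empty statement".into()))
                .collect();
            // Reporting phase: only shared borrows, so `&*self` can be handed out.
            for d in &self.parse_diags {
                on_diag(&*self, d)?; // an `Err` from the callback aborts the run
            }
        }
        if passes.verify {
            self.verify_diags = self.source.iter()
                .filter(|s| s.starts_with('?'))
                .map(|s| Diagnostic(format!("incomplete proof: {}", s)))
                .collect();
            for d in &self.verify_diags {
                on_diag(&*self, d)?;
            }
        }
        Ok(())
    }
}
```

A caller that wants everything can push each diagnostic into a `Vec` and return `Ok(())`; a caller that only cares about the first failure can return `Err` immediately, and no further passes run.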
> `fn get_outline(&self) -> Option<Arc<OutlineNode>>` would return the outline of the database (chapters, sections)
This could be an iterator instead of allocating the whole tree structure up front. That's a bit complicated though, so I might try doing that myself.
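One way to read the iterator suggestion, as a sketch: instead of building an `OutlineNode` tree, walk the statements lazily and yield each heading with its depth. The `Statement` enum and `OutlineEntry` struct are hypothetical simplifications; real heading detection in Metamath section comments is more involved:

```rust
/// Hypothetical, simplified view of the statements in a database.
pub enum Statement {
    /// A chapter/section heading comment, with its nesting level (0 = top).
    Heading { level: u8, title: String },
    Other,
}

/// One outline entry: depth and title, borrowed from the database.
#[derive(Debug, PartialEq)]
pub struct OutlineEntry<'a> {
    pub level: u8,
    pub title: &'a str,
}

/// Lazily yields the outline in document order; no tree is allocated up front.
pub fn outline(stmts: &[Statement]) -> impl Iterator<Item = OutlineEntry<'_>> {
    stmts.iter().filter_map(|s| match s {
        Statement::Heading { level, title } => Some(OutlineEntry {
            level: *level,
            title: title.as_str(),
        }),
        Statement::Other => None,
    })
}
```

A consumer that only needs the first few headings (e.g. to show partial information while loading) stops iterating early and never pays for the rest.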
> the `types` argument in `run_pass` can be a struct of bools instead

Could we even add it to the `DbOptions`?

[Edit] This would mean that everything is set at creation, and on reflection I think it would still be better to allow running the passes incrementally. This would, for example, speed up the loading of an application by allowing it to first quickly display partial information about the database.
> This could be an iterator instead of allocating the whole tree structure up front

Ideally we would do that with LSP's `DocumentSymbol` in mind.
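With LSP's `DocumentSymbol` in mind, a flat stream of `(level, title)` headings can be nested on demand using a stack of open symbols. This sketch uses a hypothetical `Symbol` struct that only mirrors the `name`/`children` nesting of `DocumentSymbol` (ranges and kinds are omitted); `nest` is an illustrative helper, not an existing API:

```rust
/// Hypothetical stand-in mirroring the nesting of LSP's `DocumentSymbol`.
#[derive(Debug, PartialEq)]
pub struct Symbol {
    pub name: String,
    pub children: Vec<Symbol>,
}

/// Builds a symbol tree from headings given in document order,
/// where `level` is the heading depth (0 = top level).
pub fn nest<'a>(headings: impl IntoIterator<Item = (usize, &'a str)>) -> Vec<Symbol> {
    let mut roots: Vec<Symbol> = Vec::new();
    // Open symbols with their levels, from outermost to innermost.
    let mut stack: Vec<(usize, Symbol)> = Vec::new();
    for (level, title) in headings {
        // Close every open symbol at the same or a deeper level.
        while stack.last().map_or(false, |(l, _)| *l >= level) {
            let (_, done) = stack.pop().unwrap();
            match stack.last_mut() {
                Some((_, parent)) => parent.children.push(done),
                None => roots.push(done),
            }
        }
        stack.push((level, Symbol { name: title.to_string(), children: Vec::new() }));
    }
    // Close whatever is still open at the end.
    while let Some((_, done)) = stack.pop() {
        match stack.last_mut() {
            Some((_, parent)) => parent.children.push(done),
            None => roots.push(done),
        }
    }
    roots
}
```

Because `nest` accepts any iterator, an LSP server could feed it a lazy outline iterator and build the nested response only when the client asks for document symbols.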
> the `types` argument in `run_pass` can be a struct of bools instead
>
> Could we even add it to the `DbOptions`? [Edit] This would mean that everything is set at creation, and on reflection I think it would still be better to allow running the passes incrementally. This would, for example, speed up the loading of an application by allowing it to first quickly display partial information about the database.
Why not both? I imagine that calling several passes is likely to be a common step after initialization, but we can also have a function to run more passes after the fact or do individual passes a la carte.
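A sketch of "both", assuming a hypothetical `DbOptions::initial_passes` field and a `Database` that remembers which passes have already run, so passes can be requested at creation and again later without redoing work. The pass names, the dependency on `parse`, and the `log` field are illustrative assumptions:

```rust
/// Hypothetical struct-of-bools selecting passes.
#[derive(Clone, Copy, Default, Debug, PartialEq)]
pub struct Passes {
    pub parse: bool,
    pub grammar: bool,
    pub verify: bool,
}

#[derive(Default)]
pub struct DbOptions {
    /// Passes to run as part of loading; more can be requested later.
    pub initial_passes: Passes,
}

pub struct Database {
    done: Passes,
    /// Log of passes actually executed, in order.
    pub log: Vec<&'static str>,
}

impl Database {
    pub fn new(options: DbOptions) -> Self {
        let mut db = Database { done: Passes::default(), log: Vec::new() };
        db.run_passes(options.initial_passes);
        db
    }

    /// Runs the requested passes that have not run yet. This sketch assumes
    /// every other pass depends on parsing, so `parse` runs first if needed.
    pub fn run_passes(&mut self, wanted: Passes) {
        let need_parse = wanted.parse || wanted.grammar || wanted.verify;
        if need_parse && !self.done.parse {
            self.log.push("parse");
            self.done.parse = true;
        }
        if wanted.grammar && !self.done.grammar {
            self.log.push("grammar");
            self.done.grammar = true;
        }
        if wanted.verify && !self.done.verify {
            self.log.push("verify");
            self.done.verify = true;
        }
    }
}
```

An application could load with only `parse` enabled, display what it has, and then request `verify` in the background; asking again for a pass that already ran is a no-op.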
The `Database` structure has the following high-level API:

- `parse` is an action which triggers the Metamath-level parsing of the database.
- `parse_result` returns the result of the parsing step. This is not a lazy function: it assumes `parse` has been called previously, and panics otherwise.
- `name_result` returns the lookup tables. This is a lazy function: it returns the tables if they were already created, and otherwise creates them.
- `scope_result` returns the database scopes. This is also a lazy function.
- `verify_result` returns the result of the verify step. This is also a lazy function.

The new database APIs I've added, `outline_result`, `grammar_result` and `parse_stmt_result`, all follow the same model, i.e. they are also lazy functions.

The problem with lazy functions is that they force carrying around mutable references to the `Database` structure, since the results may have to be created during the function call... and there can only be one mutable reference at a time. When using the library, I would prefer to have non-mutable references, which allow holding several references at the same time.

On the other hand, requiring mutable access might have been chosen in order to better cover the case of an incremental database.
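One common Rust way to get lazy results through a shared reference is interior mutability with `std::cell::OnceCell` (stable since Rust 1.70; `std::sync::OnceLock` is the thread-safe counterpart). This is a swapped-in technique, not how the library currently works; the `Database` and `ScopeResult` types below are hypothetical:

```rust
use std::cell::OnceCell;

/// Hypothetical result of the scope pass.
#[derive(Debug)]
pub struct ScopeResult {
    pub frames: usize,
}

pub struct Database {
    statements: Vec<String>,
    scopes: OnceCell<ScopeResult>,
}

impl Database {
    pub fn new(statements: Vec<String>) -> Self {
        Database { statements, scopes: OnceCell::new() }
    }

    /// Lazy, but only needs `&self`: computed on the first call, cached afterwards.
    pub fn scope_result(&self) -> &ScopeResult {
        self.scopes.get_or_init(|| ScopeResult {
            // Placeholder computation: count statements whose label starts with "ax-".
            frames: self.statements.iter().filter(|s| s.starts_with("ax-")).count(),
        })
    }
}
```

Several `&Database` borrows can now coexist, and each result is computed at most once. The trade-offs: `OnceCell` is not `Sync` (sharing across threads needs `OnceLock`), and invalidating a cached result for an incremental database still needs `&mut self` (for example via `OnceCell::take`).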
@sorear @david-a-wheeler @digama0 What's your opinion?