High-level database API

tirix commented 2 years ago

The Database structure has the following high level API:

parse is an action which triggers the Metamath level parsing of the database.
parse_result returns the result of the parsing step. This is not a lazy function: it assumes parse has been called previously, and panics otherwise.
name_result returns the lookup tables. This is a lazy function: it returns the tables if they were already created, and creates them otherwise.
scope_result returns the database scopes. This is also a lazy function.
verify_result returns the result of the verify step. This is also a lazy function.

The new database APIs I've added (outline_result, grammar_result and parse_stmt_result) all follow the same model, i.e. they are also lazy functions.

The problem with lazy functions is that they force callers to carry around a mutable reference to the Database structure, since the results may have to be created during the function call... and there can only be one mutable reference to an object at a time. When using the library, I would prefer to have non-mutable references, which would allow keeping several references at the same time.
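To make the constraint concrete, here is a minimal sketch of a lazy accessor. The `Database` and `NameTables` types below are simplified stand-ins invented for illustration, not the library's actual definitions, and splitting the source on whitespace is only a placeholder for the real name pass.

```rust
// Hypothetical stand-ins for illustration; not the real metamath-knife types.
struct NameTables {
    labels: Vec<String>,
}

struct Database {
    source: String,
    names: Option<NameTables>, // cache, filled on first access
}

impl Database {
    fn new(source: &str) -> Self {
        Database { source: source.to_string(), names: None }
    }

    /// Lazy accessor: it may have to build the cache, so it needs `&mut self`.
    fn name_result(&mut self) -> &NameTables {
        if self.names.is_none() {
            // Placeholder for the real pass: treat each token as a label.
            let labels = self.source.split_whitespace().map(str::to_owned).collect();
            self.names = Some(NameTables { labels });
        }
        self.names.as_ref().unwrap()
    }
}

fn main() {
    let mut db = Database::new("ax-1 ax-2 ax-mp");
    let names = db.name_result();
    println!("{} labels", names.labels.len());
    // Calling `db.name_result()` again while `names` is still in use would be
    // rejected: only one `&mut` borrow of `db` can be live at a time.
}
```

Because the returned reference keeps the exclusive borrow alive, a caller cannot hold two results (or two lazy accessors' results) side by side.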

On the other hand, the idea of imposing mutable access might have been chosen in order to better cover the case of an incremental database.

@sorear @david-a-wheeler @digama0 What's your opinion?

digama0 commented 2 years ago

You can just have another version of the functions that takes a shared reference and panics if the value is not already computed. Naming here seems hard, but how about get_name_result for the lazy/mutating version and name_result for the cached version?
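A rough sketch of that pairing, using the names suggested above. The types are again simplified stand-ins made up for illustration, and the panic message is an assumption.

```rust
// Hypothetical stand-ins for illustration; not the real metamath-knife types.
struct NameTables {
    labels: Vec<String>,
}

struct Database {
    source: String,
    names: Option<NameTables>,
}

impl Database {
    fn new(source: &str) -> Self {
        Database { source: source.to_string(), names: None }
    }

    /// Lazy/mutating version: computes the tables on first use.
    fn get_name_result(&mut self) -> &NameTables {
        if self.names.is_none() {
            // Placeholder for the real pass: treat each token as a label.
            let labels = self.source.split_whitespace().map(str::to_owned).collect();
            self.names = Some(NameTables { labels });
        }
        self.names.as_ref().unwrap()
    }

    /// Cached version: shared reference only, panics if nothing was computed.
    fn name_result(&self) -> &NameTables {
        self.names
            .as_ref()
            .expect("name pass has not been run; call get_name_result first")
    }
}

fn main() {
    let mut db = Database::new("ax-1 ax-2 ax-mp");
    db.get_name_result(); // run the pass once, up front
    let a = db.name_result();
    let b = db.name_result(); // several shared borrows can coexist
    println!("{} {}", a.labels.len(), b.labels.len());
}
```

The trade-off is that the ordering requirement moves from the type system to a runtime panic.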

tirix commented 2 years ago

Actually, about the API, right now everything is accessible through a set of objects, one for each facet of the database (segments, naming, scopes, grammar, outline, etc.)

Wouldn't it be nicer from a user point of view if Database could provide all these directly? It could lazily evaluate and delegate where needed.

I perfectly see the reason to split those from an implementation and parallelization point of view. But do you see any reason to split these, from an API point of view?

digama0 commented 2 years ago

I find the current API quite confusing, actually. There are a lot of constructs that don't make much sense from a naive perspective of what one would find in a verifier, or a MM database. I'm sure that all this segment stuff is important for efficiency but the API should make it easier to forget about that complexity when it isn't important, possibly by providing iterator types or other "view" types that hide those details unless the user is interested in them.

I agree that it would be better to just have a bunch of methods on Database where possible. There are some reasons to keep things separate, possibly including making some fields public, in order to leverage Rust's ability to allow mutation of one field while simultaneously holding a borrow of another field, but I don't know how much of that sort of thing is really necessary to do from outside the library.
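The field-level borrowing mentioned here can be illustrated with a small sketch. The struct layout and the `check_all` function are invented for the example and are not meant to reflect the library's actual fields.

```rust
// Hypothetical layout for illustration; not the real metamath-knife fields.
struct Names {
    labels: Vec<String>,
}

struct Diagnostics {
    messages: Vec<String>,
}

struct Database {
    pub names: Names,
    pub diags: Diagnostics,
}

impl Database {
    fn names(&self) -> &Names {
        &self.names
    }
}

fn check_all(db: &mut Database) {
    // Direct field access: the borrow checker sees two disjoint fields,
    // so a shared borrow of one and a mutable borrow of the other coexist.
    let names = &db.names;
    let diags = &mut db.diags;
    for label in &names.labels {
        diags.messages.push(format!("checked {}", label));
    }
    // Going through a method instead, e.g. `let names = db.names();`, borrows
    // the whole `db`, and the `&mut db.diags` above would then be rejected.
}

fn main() {
    let mut db = Database {
        names: Names { labels: vec!["ax-1".to_string(), "ax-2".to_string()] },
        diags: Diagnostics { messages: Vec::new() },
    };
    check_all(&mut db);
    println!("{} via {}", db.diags.messages.len(), db.names().labels.len());
}
```

This is the capability that is lost when everything is hidden behind `&self` / `&mut self` methods on a single type.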

tirix commented 2 years ago

Then I'd propose an API along these lines, on Database:

fn run_pass(&mut self, types: &[DiagnosticClass]) -> &[Diagnostic] would perform the Db verification and return the diagnostics. The class indicates which passes to perform (parse, scope, verify, grammar, stmt_parse, verify_parse_stmt, outline). This would be the only mutable API. Others would fail (panic, or return empty results) if this has not been called first.
fn to_annotations(&self, diags: &[Diagnostic]) -> &[Notation] would convert the lightweight Diagnostic into a printable Notation.
fn statement(&mut self, name: &str) -> Option<StatementAddress> would return a statement address given a string label (see more below for lookups).
fn get_formula(&self, stmt: &StatementAddress) -> Option<&Formula> would provide the formula for a given statement.
fn parse_formula(&self, symbol_iter: &mut dyn Iterator<Item = Symbol>, expected_typecodes: Box<[TypeCode]>) -> Result<Formula, Diagnostic>
fn get_frame(&self, stmt: &StatementAddress) -> Option<&Frame> would return the frame for a given statement (includes hypotheses, DV, etc.)
fn get_outline(&self) -> Option<Arc<OutlineNode>> would return the outline of the database (chapters, sections)
fn export_mmp<W: Write>(stmt: StatementAddress, out: &mut W) -> Result<(), ExportError> would export the proof for the given statement to an MMP format

Another source of confusion I see is the number of different ways there are currently to address a statement:

[u8] representing the statement's label,
StatementRef
StatementAddress
Label (Atom)

Database shall provide ways to convert between each of them, which would make StatementRef unnecessary outside of the library.
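As a sketch of what such conversions could look like, assuming a label is an interned index and an address is a segment/index pair. Both representations, and every method name below, are assumptions made for the example rather than the library's definitions.

```rust
use std::collections::HashMap;

// Hypothetical representations for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct StatementAddress {
    segment: u32,
    index: u32,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Label(u32); // interned atom

#[derive(Default)]
struct Database {
    names: Vec<Vec<u8>>,                 // Label -> label bytes
    by_name: HashMap<Vec<u8>, Label>,    // label bytes -> Label
    addresses: Vec<StatementAddress>,    // Label -> address
    by_address: HashMap<StatementAddress, Label>,
}

impl Database {
    /// Registers a statement and returns its interned label.
    fn insert(&mut self, name: &[u8], addr: StatementAddress) -> Label {
        let label = Label(self.names.len() as u32);
        self.names.push(name.to_vec());
        self.by_name.insert(name.to_vec(), label);
        self.addresses.push(addr);
        self.by_address.insert(addr, label);
        label
    }

    fn label(&self, name: &[u8]) -> Option<Label> {
        self.by_name.get(name).copied()
    }

    fn address(&self, label: Label) -> Option<StatementAddress> {
        self.addresses.get(label.0 as usize).copied()
    }

    fn label_at(&self, addr: StatementAddress) -> Option<Label> {
        self.by_address.get(&addr).copied()
    }

    fn name(&self, label: Label) -> Option<&[u8]> {
        self.names.get(label.0 as usize).map(Vec::as_slice)
    }
}

fn main() {
    let mut db = Database::default();
    let addr = StatementAddress { segment: 1, index: 7 };
    let label = db.insert(b"ax-mp", addr);
    println!("{:?} {:?}", db.address(label), db.label_at(addr));
}
```

All lookups take `&self`, so they fit the "only run_pass is mutable" rule proposed above.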

tirix commented 2 years ago

I discovered your issue #35 after posting here, I think we agree on several points.

digama0 commented 2 years ago

As I mentioned in issue #35, the types argument in run_pass can be a struct of bools instead (which also allows for a builder API). Beyond that, it can't return a borrowed slice of diagnostics because they are stored in five separate pieces internally. While a slice for the individual pass diagnostics is a possibility, a more agnostic API would be:

fn run_pass<E>(&mut self, passes: Passes, on_diag: impl FnMut(&Self, &Diagnostic) -> Result<(), E>) -> Result<(), E>;

The idea here is that the user receives the diagnostics as soon as they become available, and can copy them into a vec and/or turn them into a Notation if they want; the Result is so that the user callback can signal an error and abort early.
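A minimal sketch of how this signature might be used. The `run_pass` signature is the one proposed above; the `Passes` fields, the `Diagnostic` layout, and the placeholder diagnostics each pass emits are assumptions made for the example.

```rust
// Hypothetical types for illustration; only the run_pass signature is from the thread.
#[derive(Clone, Copy, Default)]
struct Passes {
    parse: bool,
    scope: bool,
    verify: bool,
}

#[derive(Clone, Debug, PartialEq)]
struct Diagnostic {
    pass: &'static str,
    message: String,
}

#[derive(Default)]
struct Database {
    pieces: Vec<Vec<Diagnostic>>, // diagnostics stored separately per pass
}

impl Database {
    fn run_pass<E>(
        &mut self,
        passes: Passes,
        mut on_diag: impl FnMut(&Self, &Diagnostic) -> Result<(), E>,
    ) -> Result<(), E> {
        let selected = [
            (passes.parse, "parse"),
            (passes.scope, "scope"),
            (passes.verify, "verify"),
        ];
        for (enabled, name) in selected {
            if !enabled {
                continue;
            }
            // Placeholder for real work: each pass records one made-up diagnostic.
            let message = format!("{} pass finished", name);
            self.pieces.push(vec![Diagnostic { pass: name, message }]);
            // Reborrow as shared so the callback can inspect the database.
            let this: &Self = &*self;
            if let Some(latest) = this.pieces.last() {
                for diag in latest {
                    on_diag(this, diag)?; // a callback error aborts early
                }
            }
        }
        Ok(())
    }
}

fn main() {
    let mut db = Database::default();
    let mut collected = Vec::new();
    let passes = Passes { parse: true, verify: true, ..Passes::default() };
    let result: Result<(), String> = db.run_pass(passes, |_db, diag| {
        collected.push(diag.clone());
        Ok(())
    });
    println!("{:?}: {} diagnostics", result, collected.len());
}
```

The caller decides whether to copy diagnostics, convert them, or stop at the first one, and the library never has to hand out a single contiguous slice.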

> fn get_outline(&self) -> Option<Arc<OutlineNode>> would return the outline of the database (chapters, sections)

This could be an iterator instead of allocating the whole tree structure up front. That's a bit complicated though, so I might try doing that myself.
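One way to picture the iterator variant is a flat stream of heading entries, each carrying its nesting level, produced on demand from the statement list. The `Statement` and `OutlineEntry` types below are invented for this sketch and gloss over how headings are actually recognized.

```rust
// Hypothetical types for illustration; heading detection is assumed to be done already.
enum Statement {
    Heading { level: u8, title: String },
    Other,
}

struct OutlineEntry<'a> {
    level: u8,      // 1 = chapter, 2 = section, ...
    title: &'a str,
    index: usize,   // position of the heading in the statement list
}

/// Yields outline entries lazily; no tree is allocated up front.
fn outline(statements: &[Statement]) -> impl Iterator<Item = OutlineEntry<'_>> + '_ {
    statements
        .iter()
        .enumerate()
        .filter_map(|(index, stmt)| match stmt {
            Statement::Heading { level, title } => Some(OutlineEntry {
                level: *level,
                title: title.as_str(),
                index,
            }),
            Statement::Other => None,
        })
}

fn main() {
    let stmts = vec![
        Statement::Heading { level: 1, title: "Logic".to_string() },
        Statement::Other,
        Statement::Heading { level: 2, title: "Propositional calculus".to_string() },
    ];
    for entry in outline(&stmts) {
        let indent = "  ".repeat(entry.level as usize - 1);
        println!("{}{} (statement {})", indent, entry.title, entry.index);
    }
}
```

A consumer that does need a tree can rebuild it from the level field, while one that only lists chapters can stop early.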

tirix commented 2 years ago

> the types argument in run_pass can be a struct of bools instead

~~Could we even add it to the DbOptions?~~ [Edit] This would mean that everything is set at creation, and after reflection I think it would still be better to allow running the passes incrementally. This would, for example, speed up the loading of an application by letting it first quickly display partial information about the database.

> This could be an iterator instead of allocating the whole tree structure up front

Ideally we would do that with LSP's DocumentSymbol in mind.

digama0 commented 2 years ago

> the types argument in run_pass can be a struct of bools instead
>
> ~~Could we even add it to the DbOptions?~~ [Edit] This would mean that everything is set at creation, and after reflection I think it would still be better to allow running the passes incrementally. This would, for example, speed up the loading of an application by letting it first quickly display partial information about the database.

Why not both? I imagine that calling several passes is likely to be a common step after initialization, but we can also have a function to run more passes after the fact or do individual passes a la carte.
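A sketch of the "both" option, assuming a hypothetical `initial_passes` field on the options struct and a simplified `run_pass` without diagnostics. Neither is an existing part of the library.

```rust
// Hypothetical types for illustration; `initial_passes` is an invented field.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Passes {
    parse: bool,
    scope: bool,
    verify: bool,
}

#[derive(Default)]
struct DbOptions {
    initial_passes: Passes,
}

struct Database {
    done: Passes, // which passes have already run
}

impl Database {
    /// Runs the passes requested in the options right after creation.
    fn new(options: DbOptions) -> Self {
        let mut db = Database { done: Passes::default() };
        db.run_pass(options.initial_passes);
        db
    }

    /// Runs further passes later; passes that already ran are skipped.
    fn run_pass(&mut self, passes: Passes) {
        if passes.parse && !self.done.parse {
            self.done.parse = true; // placeholder for the real parse pass
        }
        if passes.scope && !self.done.scope {
            self.done.scope = true; // placeholder for the real scope pass
        }
        if passes.verify && !self.done.verify {
            self.done.verify = true; // placeholder for the real verify pass
        }
    }
}

fn main() {
    // Load quickly with parsing only, then verify once the UI is up.
    let options = DbOptions {
        initial_passes: Passes { parse: true, ..Passes::default() },
    };
    let mut db = Database::new(options);
    db.run_pass(Passes { verify: true, ..Passes::default() });
    println!("{:?}", db.done);
}
```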

metamath / metamath-knife

High-level database API #24