metamath / metamath-knife

Metamath-knife can rapidly verify Metamath proofs, providing strong confidence that the proofs are correct.
Apache License 2.0
25 stars 10 forks source link

High-level database API #24

Closed tirix closed 2 years ago

tirix commented 2 years ago

The Database structure has the following high level API:

The new database API I've added, outline_result, grammar_result and parse_stmt_result, all follow the same model, i.e. they also are lazy functions.

The problem with lazy functions is that this forces carrying around references to a mutable Database structure, since the results may have to be created in the function call... and there can only be one such reference to a mutable object. When using the library, I would prefer to have a non-mutable references, which allows to keep several references at the same time.

On the other hand, the idea of imposing mutable access might have been chosen in order to better cover the case of an incremental database.

@sorear @david-a-wheeler @digama0 What's your opinion?

digama0 commented 2 years ago

You can just have another version of the functions that takes a shared reference and panics if the value is not already computed. Naming here seems hard, but how about get_name_result for the lazy/mutating version and name_result for the cached version?

tirix commented 2 years ago

Actually, about the API, right now everything is accessible through a set of objects, one for each facet of the database (segments, naming, scopes, grammar, outline, etc.)

Wouldn't it be nicer from a user point of view if Database could provide all these directly? It could lazily evaluate and delegate where needed.

I perfectly see the reason to split those from an implementation and parallelization point of view. But do you see any reason to split these, from an API point of view?

digama0 commented 2 years ago

I find the current API quite confusing, actually. There are a lot of constructs that don't make much sense from a naive perspective of what one would find in a verifier, or a MM database. I'm sure that all this segment stuff is important for efficiency but the API should make it easier to forget about that complexity when it isn't important, possibly by providing iterator types or other "view" types that hide those details unless the user is interested in them.

I agree that it would be better to just have a bunch of methods on Database where possible. There are some reasons to keep things separate, possibly including making some fields public, in order to leverage rust's ability to allow mutation of one field while simultaneously holding a borrow to another field, but I don't know how much of that sort of thing is really necessary to do from outside the library.

tirix commented 2 years ago

Then I'd propose an API along these lines, on Database:

Another source of confusion I see is the number of different ways there are currently to address a statement:

tirix commented 2 years ago

I discovered your issue #35 after posting here, I think we agree on several points.

digama0 commented 2 years ago

As I mentioned in issue #35, the types argument in run_pass can be a struct of bools instead (which also allows for a builder API). Beyond that, it can't return a borrowed slice of diagnostics because it is stored in five separate pieces internally. While a slice for the individual pass diagnostics is a possibility, a more agnostic API would be:

The idea here is that the user receives the diagnostics as soon as they become available, and can copy them into a vec and/or turn them into a Notation if they want; the Result is so that the user callback can signal an error and abort early.

  • fn get_outline(&self) -> Option<Arc<OutlineNode>> would return the outline of the database (chapters, sections)

This could be an iterator instead of allocating the whole tree structure up front. That's a bit complicated though, so I might try doing that myself.

tirix commented 2 years ago

the types argument in run_pass can be a struct of bools instead

Could we even add it to the DbOptions? [Edit] This would mean that everything is set at creation, and after reflexion I think it would still be better to allow running the passes incrementally. This would e.g. speed up the loading of an application, allowing it to e.g. first quickly display partial information about the database.

This could be an iterator instead of allocating the whole tree structure up front

Ideally we would do that with LSP's DocumentSymbol in mind.

digama0 commented 2 years ago

the types argument in run_pass can be a struct of bools instead

Could we even add it to the DbOptions? [Edit] This would mean that everything is set at creation, and after reflexion I think it would still be better to allow running the passes incrementally. This would e.g. speed up the loading of an application, allowing it to e.g. first quickly display partial information about the database.

Why not both? I imagine that calling several passes is likely to be a common step after initialization, but we can also have a function to run more passes after the fact or do individual passes a la carte.