obi1kenobi / trustfall

A query engine for any combination of data sources. Query your files and APIs as if they were databases!
Apache License 2.0
2.3k stars 66 forks source link

add help to unreachable message #542

Closed fprasx closed 5 months ago

fprasx commented 5 months ago

Should this actually be a new error variant?

obi1kenobi commented 5 months ago

Nice catch! If that code is reachable, then yeah let's make it return an error rather than panicking. If there's no suitable error variant already (one whose error message matches what what went wrong here), then feel free to create a new one.

fprasx commented 5 months ago

Sounds good #frictionlog

EDIT: changed to enum variant

obi1kenobi commented 5 months ago

Looks good! Could you add a test for it as well?

Just drop a schema file (.graphql extension) that has this issue (and is otherwise valid) in test_data/tests/schema_errors, and the test automation should pick it up. You'll then need to add a corresponding .schema-error.ron file that contains the error, or you can create it based on what the test automation suggests is needed (after verifying it for correctness).

fprasx commented 5 months ago

Running into a little issue, which is that if a type of a field on the root query type is unknown, the error is emitted twice, once in check_root_query_type_invariants, and once in check_type_and_property_and_edge_invariants.

(yes testing reviewed a new place this error can happen 😅)

Do you think we should deduplicate errors? Or maybe just leave it when checking the root type, and only emit it when checking all edges + properties?

I'll push what I have so you can see.

fprasx commented 5 months ago

Okie, added the tests. Stumbled upon the issue of errors being a different order so I also added a method to DisplayVec. I went for the quadratic approach because a) there probably won't be a prohibitively slow number of errors and b) imposing Hash/Ord on errors seems unreasonable (and might be impossible due to wrapping async_graphql_parser errors). For this reason, we also can't use a hashset or btreeset.

obi1kenobi commented 5 months ago

Stumbled upon the issue of errors being a different order

Do they appear in a non-deterministic order? If so, that's a bug that we have to fix — we really, really want Trustfall to be a deterministic system because otherwise it'll become impossible to debug. For example, this is also why we always use BTreeMap / BTreeSet instead of their hashed equivalents.

I'm not a fan of the new method because it allows non-determinism to sneak in. Can you try removing the set equality method and then see if it's enough to flip the lines in the expected test output file, or if there's a deeper determinism issue?

Thanks for powering through this!

fprasx commented 5 months ago

Yup, it's because of a HashMap nondeterminism (trustfall_core/src/schema/mod.rs:331):

fn check_type_and_property_and_edge_invariants(
    query_type_definition: &TypeDefinition,
    vertex_types: &HashMap<Arc<str>, TypeDefinition>,
) -> Result<(), Vec<InvalidSchemaError>> {
    let mut errors: Vec<InvalidSchemaError> = vec![];

    for (type_name, type_defn) in vertex_types {

All of the type-checking methods in this file take HashMaps, and here's the definition of the Schema struct:

#[derive(Debug, Clone)]
pub struct Schema {
    pub(crate) schema: SchemaDefinition,
    pub(crate) query_type: ObjectType,
    pub(crate) directives: HashMap<Arc<str>, DirectiveDefinition>,
    pub(crate) scalars: HashMap<Arc<str>, TypeDefinition>,
    pub(crate) vertex_types: HashMap<Arc<str>, TypeDefinition>,
    pub(crate) fields: HashMap<(Arc<str>, Arc<str>), FieldDefinition>,
    pub(crate) field_origins: BTreeMap<(Arc<str>, Arc<str>), FieldOrigin>,
}

I can do a larger refactor converting these to their B-tree equivalents.

obi1kenobi commented 5 months ago

Let's approach it a different way: we can fix the nondeterminism by accessing the elements in a predictable order, instead of iterating over the hashmap.

For the schema type, we care a bit less about schema parsing/validation performance since that happens only once in the app's lifecycle, and we care a lot about query type-checking performance because that happens once per query. HashMap is faster than BTreeMap past a certain minimum size (and large schemas are a thing), so it's better to keep the fields on Schema as hashmaps and do a sort on the vertex type names as part of validating schemas.

Sorry this is a bit complicated!

fprasx commented 5 months ago

All good. Interesting to think through.

A little sad though that I can't just %s/HashMap/BTreeMap/g . . . because that basically just worked 🤣

obi1kenobi commented 5 months ago

Rust is like that! Things tend to just work, and it will spoil you rotten! It's hard to switch back to another language afterward 😅

fprasx commented 5 months ago

Very true. No good deed goes unpunished haha. I'm going to make a new PR for the deterministic iteration order, and then I'll rebase this one on to of that if that's ok. I think it'll be cleaner.

obi1kenobi commented 5 months ago

Sounds good to me!

fprasx commented 5 months ago

Kind of saw this coming so I was able to just revert the commit adding the set stuff and rebase :)

obi1kenobi commented 5 months ago

Looks great!