obi1kenobi / trustfall

A query engine for any combination of data sources. Query your files and APIs as if they were databases!
Apache License 2.0
2.3k stars 66 forks source link

Error handling from Adapters? #148

Open leeola opened 1 year ago

leeola commented 1 year ago

I'm currently trying to understand how to write an adapter via examples (not the easiest, but it's something), and interestingly it appears there's no meaningful error handling. In many cases unwrap() is used, and in others eprintln!() is used and simply drops results afterwards.

Really feels like error handling of adapters should be a first class citizen here. A very important component of a dynamic web of adapters pulling from a large array of potentially error prone data sources.

Is there a plan for error handling?

Perhaps arguably errors themselves could be used as data, but i would think at the very least the query engine would want to be aware that the data result is not complete or correct.

edit: Not sure if this is technically an issue or a discussion. I view it as a problem to be fixed though, so perhaps issue. Unless i'm just simply wrong :D

obi1kenobi commented 1 year ago

Thanks for checking out Trustfall! Sorry for the mess, we're working on sprucing up the docs right now.

The easiest way to write an adapter is by implementing this trait, which is thankfully quite thoroughly documented already: https://docs.rs/trustfall_core/latest/trustfall_core/interpreter/basic_adapter/trait.BasicAdapter.html

Errors and async both have an incomplete story at the moment. Trustfall's philosophy is to allow "escape hatches" — workarounds that can get the job done until a more thorough solution can be put in place.

The "escape hatch" for errors is this:

It would look roughly like this:

fn get_output_or_error<'query>(
    adapter: Rc<RefCell<MyAdapterType>>,
    results_iter: &mut dyn Iterator<Item = BTreeMap<Arc<str>, FieldValue>> + 'query,
) -> Result<Option<BTreeMap<Arc<str>, FieldValue>>, MyError> {
    let maybe_data = results_iter.next();
    let adapter_ref = adapter.borrow_mut();
    if let Some(error) = adapter_ref.error.take() {
        Err(error)
    } else {
        Ok(maybe_data)
    }
}

Importantly, your adapter implementation gets to decide whether errors should terminate a query or whether it's safe to keep going. If you decide that a query shouldn't keep going, then it's either up to your adapter to "blank out" all future requests for data, or up to the code that receives the query results iterator to not call .next() again after getting an error.

This is obviously not ideal and a bit boiler-platey. Ideally we'd like to have some more opinionated and easier-to-use layers on top of the "bare metal" that this layer represents. Until we get a chance build those higher layers, this should work too.

Please let me know if you run into any trouble, I'd be happy to work with you to sort out any issues!

leeola commented 1 year ago

Good point about async, that was going to be my next concern hah. Honestly both still seem quite a big concern for me, but i think i need to grok this API to understand how it could fit in better before i can give any meaningful criticism or feedback.

Do you see a future where Trustfall supports both error handling and async as "first class"? Or will it live permanently in userland?

For now i'll just unwrap and run async tasks on inner runtime(s), and hopefully after the API and Dataflow makes sense to me i'll be able to contribute hah.

obi1kenobi commented 1 year ago

Do you see a future where Trustfall supports both error handling and async as "first class"? Or will it live permanently in userland?

Yes, they should be first-class.

The notion of "userland" is a bit weird here: just like libc is technically not OS-provided but also next-to-nobody writes their own. trustfall_core should be the absolute minimum shared abstraction everything needs, and we should build different higher-level abstractions that can satisfy different use cases — those can be the adapter maintainers' libc equivalent.

For example: the Adapter trait is the barest minimum for querying datasets — it's designed so that any use case can sit on top of it. But 99% of the time it's too low-level and too complicated, and BasicAdapter is a much friendlier alternative that's almost always sufficient. Under the hood, we have a built-in impl Adapter for BasicAdapter so anything that uses BasicAdapter works seamlessly.

I'd love to do the same for async, error handling, etc. If your adapter implementation is infallible, you ideally should never have to write Result or Ok.

For now i'll just unwrap and run async tasks on inner runtime(s)

This is the recommended "escape hatch" right now. One of the demos uses it too.

but i think i need to grok this API to understand how it could fit in better

@ginger51011 and I have been doing a bunch of cleanup, docs work, and simplifications of examples based on newer APIs lately, so hopefully we can make progress fast enough to ease the process for you.

If there are specific things that are confusing, we'd love to know. It'll help us document them better and we'll know to prioritize them.

If you don't want to open separate GitHub issues for everything, I'm happy to switch to chat / email / call and answer any questions that way, then later open the appropriate issues for what needs improving.