Simpler client request interface?

gavin-norman-sociomantic commented 5 years ago

(This issue possibly doesn't apply to swarm specifically, but might have to be implemented independently in each *proto. Writing it here initially, just to put it in a central location.)

The current recommended client usage looks something like this (taking the DHT client's Get request as an example):

private void getNotifier ( DhtClient.Neo.Get.Notification info, Const!(DhtClient.Neo.Get.Args) args )
{
    with ( info.Active ) switch ( info.active )
    {
        // Success cases...
        case received:
            // Received the requested record.
            break;

        case no_record:
            // Key not found in DHT.
            break;

        // Transient error cases...
        case timed_out:
        case node_disconnected:
            // The request may be retried.
            break;

        // Fatal error cases...
        case failure:
        case no_node:
        case node_error:
        case wrong_node:
        case unsupported:
            // The request may be retried, but is unlikely to ever succeed.
            break;
    }
}

This is of the "precise knowledge gives great control" variety of API: it provides the user with full information about what happened, allowing them to decide what to do, but does not implement any default behaviour.

On the other hand, we have the simplified, Task-blocking API, which looks like this:

void[] get_buf;
auto result = this.dht.blocking.get("channel", 0x1234567812345678,
    get_buf);
if ( result.succeeded )
{
    if ( result.value.length )
        // Received the requested record.
    else
        // Key not found in DHT.
}
else
    // All error cases

This is the opposite extreme: it only provides the user with ultra minimal information about what happened, and still doesn't implement any default error handling behaviour.

It seems to me that:

Real use cases will inevitably use the first, more complex API.
Most (all?) use cases will end up implementing more or less the same error handling behaviour via the notifier.
An API that provides built-in error handling would be more useful.

Something like this, for DHT Get:

Return the fetched record to the user in a buffer.
Automatically log and retry the request after a transient error.
Automatically log and abort the request after a fatal error.

gavin-norman-sociomantic commented 5 years ago

@scott-gibson-sociomantic @don-clugston-sociomantic @daniel-zullo-sociomantic as the primary users of the neo clients, any thoughts on this?

scott-gibson-sociomantic commented 5 years ago

in the case of fatal errors failure, no_node, node_error, wrong_node and unsupported, is there any thing that would an application would do differently as a response for these cases? Could these be lumped in to a generic fatal_error that provides more detail in the format notification?

As for retry behaviour, i'm unsure how to approach this as my applications have different requirements (some should always retry, some can retry a few times then decide not to continue). As long as these apps can have control over the behaviour then perhaps a simple request wrapper would suffice.

In either case i think we might need a greater sample size of usage until we can try to simplify.

gavin-norman-sociomantic commented 5 years ago

in the case of fatal errors failure, no_node, node_error, wrong_node and unsupported, is there any thing that would an application would do differently as a response for these cases? Could these be lumped in to a generic fatal_error that provides more detail in the format notification?

Right, that could work. It'd require quite a bit of reworking of the way things are currently set up, but I think it'd present a nicer API to the end-user.

As for retry behaviour, i'm unsure how to approach this as my applications have different requirements (some should always retry, some can retry a few times then decide not to continue). As long as these apps can have control over the behaviour then perhaps a simple request wrapper would suffice.

Indeed, this is the tricky bit. I imagine that one-shot requests like this that could be retried would have some additional arguments to configure this behaviour. Something like this maybe:

DhtClient.Neo.Get.RetryConfig retry;
retry.max_retries = 3;
dht_client.get("channel", key, retry);

gavin-norman-sociomantic commented 5 years ago

Right, that could work. It'd require quite a bit of reworking of the way things are currently set up, but I think it'd present a nicer API to the end-user.

Though... it'd mean a union inside a union, which might not be that fun to deal with.

gavin-norman-sociomantic commented 5 years ago

In either case i think we might need a greater sample size of usage until we can try to simplify.

I agree with this point. We don't have a large enough usage sample yet to be able to make a good judgement. I just wanted to get some ideas going, in advance.

nemanja-boric-sociomantic commented 5 years ago

How about having the taskblocking call return success on success (as is now) and a SmartUnion on failure, so that user can react on only those events they are interested in:


void[] get_buf;
auto result = this.dht.blocking.get("channel", 0x1234567812345678,
    get_buf);
if ( result.succeeded )
{
    if ( result.value.length )
        // Received the requested record.
    else
        // Key not found in DHT.
}
else {
    with (result.error) switch (result.error) {
        case node_disconnected:
            schedule_retry(); break;
     }
}

gavin-norman-sociomantic commented 5 years ago

That would be an improvement to the current API, yes. It wouldn't address the point about most apps (probably) ending up implementing the same error handling logic, though, which is kind of my main problem with the current approach.

gavin-norman-sociomantic commented 5 years ago

Thoughts on implementing this:

The retrying logic should be implementing inside each request handler, not at the level of the request assignment method.
Doing it this way will make error handling consistent across all requests (i.e. handle retrying or continuing after transient errors).
This approach will also be compatible with request timeouts, for requests that implement them. (Timeouts work at the level of the request, so you could have a situation where a request starts, gets a connection error, retries, and succeeds, all within the timeout.)

don-clugston-sociomantic commented 5 years ago

Scott;

in the case of fatal errors failure, no_node, node_error, wrong_node and unsupported, is there any thing that would an application would do differently as a response for these cases? Could these be lumped in to a generic fatal_error that provides more detail in the format notification?

These cases are not quite the same. Some of those, such as unsupported indicated a deployment versioning problem. I would want to terminate the app in that case, it can only be fixed by installing new software somewhere. But some of the others might be caused by hardware problems in one of the nodes or in a router. In that case the ISP or infrastructure team may fix it, so that it will recover automatically, without needing to restart the app. But even so, I think your idea of a fatal_error is a good one.

Gavin:

Real use cases will inevitably use the first, more complex API.

I don't think that's true. For apps like shore, where the data is not very valuable, the task blocking API is perfect. In general I'm wary of something like a framework that tries to simplify the most common kinds of requests. It's very easy for that sort of thing to be not-quite-flexible-enough.

daniel-zullo commented 5 years ago

You might want to consider defining a generic default behaviour for retrying a request and error handling given the user the flexibility to implement their own ones for special/corner cases. From the user perspective this could help avoiding code duplication and also reduce the focus in the logic of handling a request and mainly deal with the business logic on application/client side instead.

gavin-norman-sociomantic commented 5 years ago

You might want to consider defining a generic default behaviour for retrying a request and error handling given the user the flexibility to implement their own ones for special/corner cases. From the user perspective this could help avoiding code duplication and also reduce the focus in the logic of handling a request and mainly deal with the business logic on application/client side instead.

Yeah, I certainly wouldn't envisage removing the current "full info, full power" API.

gavin-norman-sociomantic commented 5 years ago

I don't think that's true. For apps like shore, where the data is not very valuable, the task blocking API is perfect.

Interesting. I didn't expect anyone to use the blocking API without the notifiers in anything but simple tests. The danger of it is that error logging will be very unhelpful.

gavin-norman-sociomantic commented 5 years ago

Don, maybe we're getting mixed up. There are three ways of assigning many requests:

The callback-based, full notifier method.
The blocking, full notifier method.
The blocking, no notifier method.

3 isn't recommended for use in real apps, but 2 is.

scott-gibson-sociomantic commented 5 years ago

Some of those, such as unsupported indicated a deployment versioning problem. I would want to terminate the app in that case, it can only be fixed by installing new software somewhere.

Theoretically the node could be reverted or updated that would resolved the unsupported request. An aggregation app continue running aggregating data in that case.

sociomantic-tsunami / swarm

Simpler client request interface? #356