trellis-ldp / trellis

Trellis is a platform for building scalable Linked Data applications
https://www.trellisldp.org
Apache License 2.0
105 stars 21 forks source link

ResourceService::getContainer is synchronous #264

Closed ajs6f closed 5 years ago

ajs6f commented 5 years ago

We have Optional<IRI> getContainer(final IRI id) and as far as I see, that is the sole remaining synchronous method in ResourceService (that actually deals with resources). Do we want to go to CompletableFuture<IRI> getContainer(final IRI id) or even CompletableFuture<Resource> getContainer(final IRI id) and finish the job?

acoburn commented 5 years ago

I looked through the code and there are really only two uses of this method:

AFAICT, all of these methods get the container IRI and then immediately fetch the corresponding resource (if one exists). Which suggests that we might as well change this to a CompletableFuture<Resource>.

ajs6f commented 5 years ago

I'll try to make a PR sometime this week.

acoburn commented 5 years ago

One potential issue here is this: in order to perform WebAC authorization, we need to recursively look up the parent resource if no ACL is defined on the resource in question. Here, there is a bit of a conflict between the SOLID webacl definition and the ability in Trellis to create uncontained resources. In order to follow the SOLID guidelines, the WebAC module will follow a structural-logical containment hierarchy based on the resource IRI regardless of the existence of a resource: so /foo/bar/baz is ipso facto contained by /foo/bar, even if /foo/bar doesn't exist. That hierarchy is followed until either (a) an extant resource contains an ACL definition or (b) we've reached the root resource. SOLID also requires that the root resource have an ACL. We could treat every uncontained resource as its own distinct root resource, but then do we require that an ACL is defined on each of those resources? That seems a little too prescriptive and/or cumbersome (it also gets into a chicken-egg issue).

If you assume that the root resource contains an ACL, then it would be fine to convert this method into a CompletableFuture<Resource>, and the hierarchy could be followed more-or-less as it is now. If, however, the root resource doesn't contain an ACL (maybe someone decided to remove it or it didn't get initialized properly?), there is a problem: because an arbitrary resource could be uncontained, if ::getContainer returns with, say, MISSING_RESOURCE, what happens? Does the algorithm proceed, checking for a further ancestor resource? But what if we've already reached the root resource, which doesn't have a parent? In a word: how does the WebAC algorithm know to stop?

One option is to convert this method to a CompletableFuture<Resource> and use it as such in the TrellisHttpResource class. That would be easy. Then, the WebAC service implementation would use its own private method for determining a containment hierarchy for the purpose of performing WebAC authorization. This seems like a logical option to me.

Another option is to completely remove this method. The more I think about this, the better this option seems. For the HTTP layer, we can refactor this to make use of Resource::getContainer instead -- there's a little bit of work involved there, but it's not terrible. And it's probably a good idea in the end. Then, the WebAC implementation does what it needs to do to determine the containment hierarchy.

acoburn commented 5 years ago

A further note about the two options mentioned at the end of the above comment (both of which assume that the WebAC service implements its own method for following the axis of containment).

The advantage of the first option (convert the method to CompletableFuture<Resource> for the HTTP service) is that the target resource and the parent resource can be fetched concurrently: hence the use of CompletableFuture::thenCombine; otherwise (if we remove the method), the lookups will necessarily be sequential: first look up the target resource, then once that resolves, figure out the parent IRI from that resource and look up the parent.

ajs6f commented 5 years ago

Does it matter beyond "client ergonomics"? What I mean is-- firing the requests in parallel doesn't change any of the guarantees, does it? We are still talking about two requests that do not compose an atomic request, right? Either serial or parallel.

I get that thenCombine is nice here, and I have no problem going with that if that's the reasoning. Although (given the second choice) this doesn't look too horrible:

…rs.get(iri).thenCombine(rs.get(iri).thenApply(Resource::getContainer), myRadBiFunction);

and if that's still gross, we can leave a convenience method in ResourceService:

default CompletableFuture<Resource> getContainer(IRI iri) {
    return get(iri).thenApply(Resource::getContainer);
}

and our example is then just:

…rs.get(iri).thenCombine(rs.getContainer(iri), myRadBiFunction);

and again, as far as I understand the contracts, that's really all just different nut chunks on the same wine-cheddar-spread log. Or is there something about the guarantees that I missed, that is a good reason to maintain separate paths?

acoburn commented 5 years ago

We're mostly talking about ergonomics at this point.

ajs6f commented 5 years ago

Ok, well, I'll let you make the call. As I hinted, if we keep the method, the impl is trivial, so I'm fine with anything that gets the synchrony outta there. That's the meat of this issue.

acoburn commented 5 years ago

I'll start by factoring this method out of the WebAC code. That will make any change a lot easier to deal with in the HTTP layer.

ajs6f commented 5 years ago

Cool, ping me if it turns out to be harder than it first seems.

acoburn commented 5 years ago

Now, there are (at least) two approaches for refactoring the ResourceService::getContainer method. Since it is used only once, one option is to remove the method from the ResourceService interface entirely and just call out to api.TrellisUtils::getContainer in the TrellisHttpResource::getParent method.

Alternatively, the code in TrellisHttpResource::getParent can become the new default implementation for ResourceService::getContainer.

The advantage of the first approach is that it simplifies the interface. The advantage of the second approach is that it makes it possible for a ResourceService implementation to override that method.

I incline somewhat toward the first approach, since I'm not entirely sure that such flexibility is necessary. What do others think?

acoburn commented 5 years ago

Thinking about this some more, I'm going to suggest the first approach above: remove the ResourceService::getContainer altogether and update this line to call to TrellisUtils::getContainer directly. The reason is that, at present, the HTTP layer generates containment-type URLs with a particular structure: i.e. a POST to /foo results in a resource at /foo/bar, rather than just /bar. While that URL-level hierarchy isn't strictly necessary for modeling LDP containment, it could make things quite confusing to users. Part of the reason for having ResourceService::getContainer was to provide a way for an implementation to have flexibility in how URLs are structured, but so long as the code is asymmetrical (creation defined strictly in the HTTP layer and fetching defined in the persistence layer) the structure doesn't make much sense. I'm not opposed to adding that flexibility, but until there is a use case for this level of flexibility, I think it makes sense to keep the structure simple and have the code for this defined in one place: the HTTP layer.

@ajs6f do you still want to make this change? It should be exceedingly simple and will mostly involve just removing code.

ajs6f commented 5 years ago

So there would be no ResourceService::getContainer but there would still be Resource::getContainer?

acoburn commented 5 years ago

Correct. The Resource::getContainer is used quite a bit.

ajs6f commented 5 years ago

But switch Resource::getContainer to CompletableFuture<Resource> or leave it be as Optional<Resource>?

acoburn commented 5 years ago

The Resource::getContainer should stay as it is: Optional<IRI>. The Resource interface doesn't load additional resources; that's all handled through the ResourceService interface.

ajs6f commented 5 years ago

That's what I thought, and I'm fine with that, although I suspect we should make it clearer in the Javadocs with an @implSpec. Would you write of Resource that it should not load from the backend? Because my impl does stream quads direct from the backend. I would say that stream is the exception to that rule.

acoburn commented 5 years ago

Yes, we should make that clear in the javadocs. What I've generally done in my implementations is create a separation of metadata and content for a resource: metadata (last modified, ldp type, etc) is loaded when the resource object is created while the content is loaded on-demand via the stream method. I don't think an impl necessarily needs to follow that pattern, but I think it's a good one, especially in the context of HTTP.

ajs6f commented 5 years ago

That's exactly what I did, and I think we should document that pattern. I don't know if we can argue right now that it is required, but it is interesting that we both came to it independently.

ajs6f commented 5 years ago

Ok, clearly the actual thrust of this ticket is resolved by simply removing ResourceService::getContainer. Do you want a PR for that, and do we want to dig into the larger issue of "which types are expected to reach to the backend, and in what way" in another ticket?

acoburn commented 5 years ago

Yes, that makes sense -- a PR to remove the ResourceService::getContainer and then move on to the next issue separately.