ajs6f closed this issue 4 years ago.
Or use ActivityPub for this, which is a much more LDP-friendly spec.
It is more LDP-friendly, but seems much more complex... I will check it out.
From a purely self-centered standpoint, I'm thinking about the maintenance of an external index (like Solr) and how it would interact with all the possible endpoints. I started to read the ActivityPub spec, but that could take me a while. So does it have a way to subscribe to a range of activity streams, or to all of them if desired?
I really want to avoid having to aggregate information up a filesystem. That's going to be expensive in code and in operation, so to the extent that we can go "eventing-endpoint-per-resource" I want to.
If it's "eventing-endpoint-per-resource", then Linked Data Notifications might be the better thing to use. I have already written a consumer and sender in Python, and that may be a straightforward way forward on this. The existing code would need a way to expose an `ldp:inbox` per resource, but we could rely on convention for that, too.
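For reference, LDN lets a resource advertise its inbox with an HTTP `Link` header whose `rel` is `http://www.w3.org/ns/ldp#inbox`. A minimal Python sketch of both sides, assuming the inbox URI has already been chosen by whatever convention we settle on (the `example.org` URIs are placeholders, and the parser is deliberately naive):

```python
from typing import Optional

LDP_INBOX = "http://www.w3.org/ns/ldp#inbox"


def inbox_link_header(inbox_uri: str) -> str:
    """Link header value a resource would send to advertise its inbox."""
    return f'<{inbox_uri}>; rel="{LDP_INBOX}"'


def discover_inbox(link_header: str) -> Optional[str]:
    """Return the first link target whose rel includes ldp:inbox, or None.

    Naive: assumes no commas or semicolons inside URIs or parameter values;
    a real consumer should use a full RFC 8288 Link header parser.
    """
    for link in link_header.split(","):
        segments = [s.strip() for s in link.split(";")]
        target = segments[0]
        if not (target.startswith("<") and target.endswith(">")):
            continue
        for param in segments[1:]:
            name, _, value = param.partition("=")
            # rel may hold several space-separated relation types
            if name.strip().lower() == "rel" and LDP_INBOX in value.strip().strip('"').split():
                return target[1:-1]
    return None
```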
I wouldn't want to use another URI in the same hierarchy, because of the strong chance of collision, unless we wanted to pick a really exceptional segment like `ldp:inbox`?
I was thinking of a completely separate hierarchy, aggregating events there.
That would certainly be simplest, and we could just mirror the URIs with a prefix-swap.
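A rough sketch of the prefix-swap, assuming a single top-level segment named `activity` (a hypothetical name; nothing here fixes it). Query strings and fragments are dropped, and a real top-level directory with the same name would still collide:

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical name for the separate hierarchy
ACTIVITY_SEGMENT = "activity"


def activity_uri(resource_uri: str, segment: str = ACTIVITY_SEGMENT) -> str:
    """Mirror a resource URI into the activity hierarchy by prefixing its path."""
    parts = urlsplit(resource_uri)
    path = "/" + segment + "/" + parts.path.lstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def resource_uri(activity: str, segment: str = ACTIVITY_SEGMENT) -> str:
    """Inverse mapping: strip the activity prefix to recover the resource URI."""
    parts = urlsplit(activity)
    prefix = "/" + segment + "/"
    if not parts.path.startswith(prefix):
        raise ValueError(f"{activity} is not under {prefix}")
    path = "/" + parts.path[len(prefix):]
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```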
Hm... from LDN we get:
Receivers MUST support GET and POST requests on the Inbox URL. In LDP terms, an Inbox is a Container.
which would seem to eliminate read-only operation...
But we don't need to claim that it is a receiver. We just use the consumer/sender pattern to push AS/2.0 data into a container that can be retrieved by an LDP client.
Where does that container live? What server is serving it?
I was thinking it would be more of an out-of-band operation: the Python sender watches a directory and, if something changes, writes an AS/2.0 message to the corresponding location in the `activity` hierarchy.
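The thread mentions file alerts; as a stand-in, here is a polling sketch in Python that snapshots modification times and diffs them into AS2 activity types. Polling is a substitution, not the existing sender's method, and `exclude="activity"` assumes the activity hierarchy lives under the watched root:

```python
import os
from typing import Dict, List, Tuple

Snapshot = Dict[str, float]  # relative path -> modification time


def snapshot(root: str, exclude: str = "activity") -> Snapshot:
    """Record the mtime of every regular file under root, skipping the
    top-level `exclude` directory so the watcher ignores its own output."""
    result: Snapshot = {}
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            dirnames[:] = [d for d in dirnames if d != exclude]
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            result[rel] = os.stat(full).st_mtime
    return result


def diff(before: Snapshot, after: Snapshot) -> List[Tuple[str, str]]:
    """Compare two snapshots and return (AS2 activity type, path) pairs."""
    changes = []
    for path in sorted(after.keys() - before.keys()):
        changes.append(("Create", path))
    for path in sorted(after.keys() & before.keys()):
        if after[path] != before[path]:
            changes.append(("Update", path))
    for path in sorted(before.keys() - after.keys()):
        changes.append(("Delete", path))
    return changes
```

A sender would call `snapshot` every few seconds and feed consecutive results to `diff`. Note that mtime comparison can miss two edits inside one timestamp tick on filesystems with coarse resolution.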
What I'm trying to understand is what is hosting the `activity` hierarchy. Clearly not `static-ldp`, which is read-only and won't accept `POST`s...
Just thinking "out loud" here... Maybe every LDP-NR and LDP-RS gets a corresponding LDPC basic container in the `activity` hierarchy; on the backend, this is just a new directory. If something changes with a resource (detected via file alerts), a new LDP-RS (JSON-LD in AS/2.0 format) gets written there. Adding that activity wouldn't be part of a `POST`, since the resource itself isn't managed via HTTP.
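A sketch of that backend write, assuming the activity hierarchy is a plain directory tree mirroring the served one, with one JSON-LD file per event named by epoch second (the naming used for page entries later in this thread). `write_activity` and its parameters are hypothetical:

```python
import json
import os
import time
from typing import Optional

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def write_activity(activity_root: str, resource_path: str, base_uri: str,
                   activity_type: str, published: Optional[float] = None) -> str:
    """Write one AS2 activity as a JSON-LD file into the resource's container.

    activity_root: directory backing the (hypothetical) activity hierarchy
    resource_path: changed resource, relative to the served root, e.g. "foo/bar.ttl"
    Returns the path of the file written.
    """
    ts = time.time() if published is None else published
    container = os.path.join(activity_root, *resource_path.split("/"))
    os.makedirs(container, exist_ok=True)
    activity = {
        "@context": AS_CONTEXT,
        "type": activity_type,  # e.g. "Create", "Update", "Delete"
        "object": base_uri.rstrip("/") + "/" + resource_path,
        "published": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(ts)),
    }
    # File named by epoch second; add a numeric suffix if that name is taken
    stem = os.path.join(container, str(int(ts)))
    target, n = stem, 0
    while os.path.exists(target):
        n += 1
        target = f"{stem}-{n}"
    # Write under a temporary name, then rename atomically
    tmp = target + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(activity, f)
    os.replace(tmp, target)
    return target
```

Writing to a temporary name and renaming with `os.replace` keeps a half-written event from ever being served; two events in the same second get a `-1`, `-2` suffix rather than overwriting each other.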
Maybe @csarven has some thoughts on this. The background is that this is a purely read-only LDP server (`GET`, `HEAD` and `OPTIONS` only) in which users manage static resources themselves (not via HTTP). The question is: how might we represent changes or events on those resources? One of the main ideas of this project is to keep the code extremely simple while supporting as much of LDP as possible.
Yes, this might be a decent place to go: a ½-LDN implementation that does `GET` but not `POST`.
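What "GET but not POST" could look like at the method-dispatch level, as a hypothetical Python helper (static-ldp itself is PHP, so this only illustrates the behavior). HTTP requires an `Allow` header on a 405 response, which also tells an LDN sender up front that this inbox won't take its notifications:

```python
from typing import Dict, Tuple

# Read-only method set discussed in this thread
ALLOWED = ("GET", "HEAD", "OPTIONS")


def check_method(method: str) -> Tuple[int, Dict[str, str]]:
    """(status, headers) for a request to an inbox on a read-only server."""
    headers = {"Allow": ", ".join(ALLOWED)}
    if method.upper() in ALLOWED:
        return 200, headers  # hand off to the normal static handling
    # POST (and PUT, PATCH, DELETE, ...) is refused; a 405 must carry Allow
    return 405, headers
```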
@whikloj would that meet your concerns about integration?
I'm not sure I have this, so correct me if I missed something. Suppose I wanted to maintain an index of the contents of a static-ldp repo. My consumer would check the inboxes of all the resources in the `activity` hierarchy, pull any new changes and index those.
So if I were watching the whole repo for changes, I would need to check each `inbox`, correct? But since resources could be added or deleted on the filesystem, I would also need to walk the repo to get all the currently existing resources?
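A sketch of the client-side union, assuming the consumer has already walked the repo and fetched each inbox's AS2 activities into a dict keyed by inbox URI, and that `published` values are ISO 8601 UTC strings in one fixed format (so string comparison orders them). `collect_new_events` is hypothetical:

```python
from typing import Dict, List, Tuple


def collect_new_events(inboxes: Dict[str, List[dict]], since: str) -> Tuple[List[dict], str]:
    """Union per-resource inboxes client-side, keeping events newer than `since`.

    Returns the new events (oldest first) and the checkpoint for the next poll.
    """
    fresh = [a for acts in inboxes.values() for a in acts if a["published"] > since]
    fresh.sort(key=lambda a: (a["published"], a.get("object", "")))
    checkpoint = fresh[-1]["published"] if fresh else since
    return fresh, checkpoint
```

Keeping the newest `published` as a checkpoint means each poll only indexes what is new; an event that shows up late with exactly the checkpoint timestamp would be skipped, so a real indexer might track event IDs as well.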
This isn't a huge deal, and I'm not sure I would actually need an index of these resources. More of a "just in case" scenario going on in my head. Plus, LDN and ActivityPub are still quite ethereal to me.
I think you're right, @whikloj. The computation/IO to retrieve each partition of events cumulatively (to create the stream you want to use to index) is going to happen either server-side or client-side, and we're talking about maybe doing it client-side.
Of course, the "dual" of that is that rather than filtering a union stream, you just subscribe (or collect from) only those feeds that interest you. So it's not lose-lose, it's give-and-take...
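The "dual" in code: rather than filtering a union stream, pick only the feeds under the subtrees you care about. A hypothetical helper, assuming inbox URIs mirror the resource hierarchy:

```python
from typing import List


def select_feeds(inbox_uris: List[str], interests: List[str]) -> List[str]:
    """Keep only inboxes at or below one of the subtrees listed in `interests`."""
    chosen = []
    for uri in inbox_uris:
        for prefix in interests:
            root = prefix.rstrip("/") + "/"
            # Match whole path segments, not raw string prefixes
            if uri == prefix or uri.startswith(root):
                chosen.append(uri)
                break
    return chosen
```

The trailing-slash check keeps `/activity/foo/` from also matching `/activity/foobar/`.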
I'd agree with the "give-and-take" assessment. It does allow you more flexibility to subscribe to only a subset of resources if you so choose. Sounds like a plan. 👍
Just to add to @whikloj's https://github.com/trellis-ldp/static-ldp/issues/35#issuecomment-293319582: it depends a bit on how the inbox is managed by the server (receiver). For example, the inbox could contain a single resource summarizing all of the events, as opposed to each event being captured in its own resource. Another way of arranging the data would be a dedicated resource with its own inbox for a summary of activities (a sort of aggregate of all of the activities in the other inboxes).
And I agree with @ajs6f: going halfway towards LDN (or LDP) with a read-only implementation would be fine, and leaves room to extend it if/when needed in the future.
Cool, sounds like we can have our cake and eat it too! :)
I'll get to work on a more detailed proposal.
Couple of proposals:
The "special" location should be written as a constant in the code. I know we don't want to add configuration, but I do think we want to let people override it in case of collision. The default could be `activity` or `events` or something.
Looks like FAM (File Alteration Monitor) would be the right tool to use here if we want to stay within the original language footprint, but PHPsters, please correct me if that's wrong or out of date.
I'd like to try starting with a "twofer" accounting, featuring both:
`activity` (or whatever it is called) that simply aggregates everything (as @csarven described). This seems to me to meet all cases, although the performance implications for the "summary" (and even for the "fine-grained" hierarchy in the case of long-lived resources) are... interesting. We have an advantage tho': we can do real paging, and without much difficulty. Static resources FTW!
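If the summary is generated server-side (the other half of the give-and-take), it could be as simple as walking the fine-grained hierarchy and folding everything into one AS2 `OrderedCollection`. A sketch, assuming every file under the activity root is a single JSON activity with a `published` string, and that the summary itself is written somewhere outside that tree; `build_summary` is hypothetical:

```python
import json
import os
from typing import List

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def build_summary(activity_root: str, summary_uri: str) -> dict:
    """Aggregate every per-resource activity file under activity_root into
    one AS2 OrderedCollection, newest first."""
    items: List[dict] = []
    for dirpath, _dirnames, filenames in os.walk(activity_root):
        for name in filenames:
            with open(os.path.join(dirpath, name), encoding="utf-8") as f:
                items.append(json.load(f))
    items.sort(key=lambda a: a["published"], reverse=True)
    return {
        "@context": AS_CONTEXT,
        "id": summary_uri,
        "type": "OrderedCollection",
        "totalItems": len(items),
        "orderedItems": items,
    }
```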
@ajs6f Adding configuration is fine -- there is already plenty of default configuration that can be overridden. New configuration would go here with sensible default values.
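A minimal sketch of "new configuration with sensible defaults", in Python for illustration only; the key names `activitySegment` and `activityPageSize` are hypothetical, not existing static-ldp settings:

```python
from typing import Any, Dict

# Hypothetical defaults; key names are illustrative only
DEFAULTS: Dict[str, Any] = {"activitySegment": "activity", "activityPageSize": 50}


def load_settings(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user overrides over the defaults, rejecting unknown keys and bad segments."""
    unknown = overrides.keys() - DEFAULTS.keys()
    if unknown:
        raise KeyError(f"unknown settings: {sorted(unknown)}")
    settings = {**DEFAULTS, **overrides}
    segment = settings["activitySegment"]
    if not segment or "/" in segment:
        raise ValueError("activitySegment must be a single, non-empty path segment")
    return settings
```

Restricting the segment to a single path component keeps the prefix-swap between resource and activity URIs trivial.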
Re: paging.
LDN (here) specifically absolves itself from choosing between paging as offered in LDP Paging and as offered in Activity Streams (or something else). I'm inclined towards AS. Thoughts?
+1 for using Activity Streams paging.
In order to implement AS Paging, would it make sense for the events in a given Basic Container inbox to be partitioned, on disk, into pages of a predetermined size? So resource `/foo/bar/` would have an inbox `/activity/foo/bar/`, within which there would be `/activity/foo/bar/page1/`, `/activity/foo/bar/page2/`, etc., within which there would be `/activity/foo/bar/page2/1492009392`, `/activity/foo/bar/page2/1492009489`, etc.
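The proposed on-disk layout as a function, assuming events are numbered from 0 in arrival order and pages from 1; `page_path` is hypothetical:

```python
def page_path(resource_path: str, index: int, timestamp: int,
              page_size: int = 50, segment: str = "activity") -> str:
    """Location of the index-th event (0-based, oldest first) for a resource,
    e.g. /activity/foo/bar/page2/1492009392."""
    page = index // page_size + 1  # pages are numbered from 1
    base = resource_path.strip("/")
    prefix = f"/{segment}/" + (base + "/" if base else "")  # root resource has no base
    return f"{prefix}page{page}/{timestamp}"
```

Since new events only ever land on the last page, earlier pages never change once full, which suits static serving and caching.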
Maybe a default of 50 to a page?
(This comes under the heading of "KIS KIS"-- Keep It Static, Keep It Simple.)
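And the AS2 side of that layout: one `OrderedCollection` for the inbox plus `OrderedCollectionPage` documents linked with `first`/`last`/`next`/`prev`/`partOf`. A sketch assuming oldest-first ordering and the 50-item default; the `pageN/` URIs follow the layout proposed above and are otherwise an assumption:

```python
from typing import List

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"


def paged_collection(inbox_uri: str, events: List[dict], page_size: int = 50) -> List[dict]:
    """Split an inbox's events (oldest first) into an AS2 OrderedCollection
    followed by its OrderedCollectionPage documents."""
    base = inbox_uri.rstrip("/") + "/"
    n_pages = max(1, -(-len(events) // page_size))  # ceiling division, at least one page

    def page_uri(i: int) -> str:
        return f"{base}page{i}/"

    collection = {
        "@context": AS_CONTEXT, "id": base, "type": "OrderedCollection",
        "totalItems": len(events), "first": page_uri(1), "last": page_uri(n_pages),
    }
    pages = []
    for i in range(1, n_pages + 1):
        start = (i - 1) * page_size
        page = {
            "@context": AS_CONTEXT, "id": page_uri(i), "type": "OrderedCollectionPage",
            "partOf": base, "startIndex": start,
            "orderedItems": events[start:start + page_size],
        }
        if i > 1:
            page["prev"] = page_uri(i - 1)
        if i < n_pages:
            page["next"] = page_uri(i + 1)
        pages.append(page)
    return [collection] + pages
```

ActivityPub orders its collections newest-first; oldest-first is used here so that a completed page never has to be rewritten.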
There is currently no way to add properties (triples) to a Basic Container, right? I'm trying to figure out which of two roads makes more sense (and maybe I'm missing a better one):
… `Inbox`) subtyping `BasicContainer`.
I like 1, because it is general and keeps the code smaller (I think), but the natural way would be to select a special name (e.g. `properties`) and decide that any RDF file with that name in a directory, instead of becoming its own `RDFSource`, becomes "bonus" properties on the enclosing `BasicContainer` directory.
Does that sound reasonable? It's a serious expansion.
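To make the "extra properties file" idea concrete, a Python sketch of how a directory listing might treat it: the special file is left out of `ldp:contains`, and its triples are attached to the container. It assumes a file named `properties.nt` holding one triple per line in an N-Triples-style subset, with `<>` (as in Turtle) standing for the container; the name, format and helper are all hypothetical:

```python
import os
from typing import List, Tuple

Triple = Tuple[str, str, str]
LDP_CONTAINS = "<http://www.w3.org/ns/ldp#contains>"


def container_triples(dir_path: str, container_uri: str,
                      properties_name: str = "properties.nt") -> List[Triple]:
    """Describe a directory as a BasicContainer, folding the special properties
    file into the container's own triples instead of listing it as a child."""
    base = container_uri.rstrip("/") + "/"
    triples: List[Triple] = []
    for name in sorted(os.listdir(dir_path)):
        if name == properties_name:
            continue  # not its own RDFSource: its triples become "bonus" properties
        child = name + "/" if os.path.isdir(os.path.join(dir_path, name)) else name
        triples.append((f"<{base}>", LDP_CONTAINS, f"<{base}{child}>"))
    props = os.path.join(dir_path, properties_name)
    if os.path.isfile(props):
        with open(props, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                # Minimal line parsing: "<s> <p> object ." (object may be a literal)
                s, p, o = line.rstrip(".").strip().split(None, 2)
                if s == "<>":  # empty relative IRI means the container itself
                    s = f"<{base}>"
                triples.append((s, p, o.strip()))
    return triples
```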
This is where I am already adding triples to a Basic Container: https://github.com/trellis-ldp/static-ldp/blob/master/src/Model/BasicContainer.php#L45-L48 if that helps.
Oh, I'm cool with the code; I just want to check whether this seems like good design. I take it you are cool with an "extra properties" file?
Tangentially: #36.
There hasn't been any movement on this issue in over two years, so I am going to close it. If someone wants to pick it up at some future point, that's great.
Suppose an RSS feed is provided for every Basic Container via conneg (i.e. requests with `Accept: application/rss+xml` will return the feed). Each `/item/description` in the feed would contain an appropriate Activity Streams 2.0 serialization of an event occurring to or directly within that Basic Container. If a container is destroyed, that would be represented, as the Fedora spec prescribes, by an event on its containing container, and all of the resource-destroyed events that would have occurred inside that container would similarly be published at the feed of the (still-extant) parent container.
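A rough Python sketch of the pieces this needs: an `Accept` check, an RSS 2.0 feed whose item descriptions carry AS2 JSON, and the parent lookup for destroyed resources. The conneg is deliberately minimal (no wildcards, no ranking among types), and all function names are hypothetical:

```python
import json
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlsplit, urlunsplit


def wants_rss(accept: str) -> bool:
    """Does the Accept header list application/rss+xml with a non-zero q?"""
    for part in accept.split(","):
        media, *params = [p.strip() for p in part.split(";")]
        if media.lower() == "application/rss+xml":
            q = 1.0
            for param in params:
                name, _, value = param.partition("=")
                if name.strip() == "q":
                    q = float(value)
            return q > 0
    return False


def rss_feed(container_uri: str, activities: List[dict]) -> bytes:
    """RSS 2.0 feed for one Basic Container; each item's description holds
    the AS2 JSON serialization of one event."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Activity for {container_uri}"
    ET.SubElement(channel, "link").text = container_uri
    ET.SubElement(channel, "description").text = "Events on or directly within this container"
    for act in activities:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f'{act["type"]} {act["object"]}'
        ET.SubElement(item, "description").text = json.dumps(act)
    return ET.tostring(rss, encoding="utf-8")


def parent_container(uri: str) -> str:
    """Feed that receives an event about `uri`, including its destruction."""
    parts = urlsplit(uri)
    path = parts.path.rstrip("/")
    if not path:
        raise ValueError("the root container has no parent feed")
    return urlunsplit((parts.scheme, parts.netloc, path[: path.rfind("/") + 1], "", ""))
```

ElementTree escapes the JSON when serializing, so a feed reader gets the AS2 document back verbatim from the `description` text.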
This is obviously a different approach than has been taken in the past to messaging in the Fedora community. Previous forms of messaging have produced a unitary feed.
Does this sound crazy? Reasonable?