trellis-ldp / static-ldp

A simple way to expose static assets as a read-only Linked Data server
Apache License 2.0
Design: Format for events? #35

Closed ajs6f closed 4 years ago

ajs6f commented 7 years ago

Suppose an RSS feed is provided for every Basic Container, via conneg (i.e. requests with Accept: application/rss+xml will return the feed).

Each /item/description in the feed would contain an appropriate Activity Streams 2.0 serialization of an event occurring to or directly within that Basic Container.

If a container is destroyed, that would be represented as the Fedora spec prescribes by an event on its containing container, and all of the resource-destroyed events that would have occurred inside that container would similarly be published at the feed of the (still-extant) parent container.

This is obviously a different approach than has been taken in the past to messaging in the Fedora community. Previous forms of messaging have produced a unitary feed.

Does this sound crazy? Reasonable?

acoburn commented 7 years ago

Or use ActivityPub for this, which is a much more LDP-friendly spec.

ajs6f commented 7 years ago

It is more LDP-friendly, but seems much more complex... I will check it out.

whikloj commented 7 years ago

From a purely self-centered standpoint, I'm thinking about the maintenance of an external index (like Solr) and how it would interact with all the possible endpoints. I started to read the ActivityPub spec, but that could take me a while. Does it have a way to subscribe to a range of activity streams, or to all of them if desired?

ajs6f commented 7 years ago

I really want to avoid having to aggregate information up a filesystem. That's going to be expensive in code and in operation, so to the extent that we can go "eventing-endpoint-per-resource" I want to.

acoburn commented 7 years ago

If it's "eventing-endpoint-per-resource", then Linked Data Notifications might be the better thing to use. I have already written a consumer and a sender in Python, and that may be a straightforward way to move ahead on this. The existing code would need a way to expose an ldp:inbox per resource, but we could rely on convention for that, too.

ajs6f commented 7 years ago

I wouldn't want to use another URI in the same hierarchy, because of the strong chance of collision, unless we wanted to pick a really exceptional segment like ldp:inbox?

acoburn commented 7 years ago

I was thinking of a completely separate hierarchy, aggregating events there.

ajs6f commented 7 years ago

That would certainly be simplest, and we could just mirror the URIs with a prefix-swap.
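A minimal sketch of the prefix-swap idea, assuming the activity hierarchy lives under a `/activity` prefix (a hypothetical default; the name is not settled in this thread). The function names are illustrative only and are not part of static-ldp.

```python
ACTIVITY_PREFIX = "/activity"  # assumed default, not a static-ldp setting


def to_activity_path(resource_path: str, prefix: str = ACTIVITY_PREFIX) -> str:
    """Mirror a resource path into the separate activity hierarchy."""
    if not resource_path.startswith("/"):
        raise ValueError("resource_path must be absolute")
    # Prepending the prefix keeps the rest of the path (and any trailing
    # slash) unchanged, so containers map to containers.
    return prefix.rstrip("/") + resource_path


def to_resource_path(activity_path: str, prefix: str = ACTIVITY_PREFIX) -> str:
    """Inverse mapping: strip the prefix to recover the resource path."""
    base = prefix.rstrip("/")
    if not activity_path.startswith(base + "/"):
        raise ValueError("path is not inside the activity hierarchy")
    return activity_path[len(base):]
```

Because the mapping is a pure string operation, a client could derive an inbox location from a resource URI by convention, without any extra lookup.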

ajs6f commented 7 years ago

Hm... from LDN we get:

Receivers MUST support GET and POST requests on the Inbox URL. In LDP terms, an Inbox is a Container.

which would seem to eliminate read-only operation...

acoburn commented 7 years ago

But we don't need to claim that it is a receiver. We just use the consumer/sender pattern to push AS/2.0 data into a container that can be retrieved by an LDP client.

ajs6f commented 7 years ago

Where does that container live? What server is serving it?

acoburn commented 7 years ago

I was thinking it would be more of an out-of-band operation -- the python sender watches a directory, if something changes, it writes an AS/2.0 message to the corresponding location in the activity hierarchy.

ajs6f commented 7 years ago

What I'm trying to understand is what is hosting the activity hierarchy-- clearly not static-ldp, which is read-only and won't accept POSTs...

acoburn commented 7 years ago

Just thinking "out loud" here... Maybe every LDP-NR and LDP-RS gets a corresponding LDPC basic container in the activity hierarchy -- on the backend, this is just a new directory. If something changes with a resource (via file alerts), a new LDP-RS (JSON-LD in AS/2.0 format) gets written there. The process of adding that activity wouldn't be part of a POST, since the resource itself isn't managed via HTTP.

Maybe @csarven has some thoughts on this -- the background is that this is a purely read-only LDP server (GET, HEAD and OPTIONS only) in which users are managing static resources themselves (not via HTTP). The question is: how might we represent changes or events on those resources. One of the main ideas of this project is to keep the code extremely simple while supporting as much of LDP as possible.
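The out-of-band write described above might look roughly like the following sketch. It assumes the activity hierarchy is just a directory tree on disk, that events are stored as `.jsonld` files, and that the caller supplies the event name; `build_activity` and `write_activity` are hypothetical helpers, not existing code from the Python sender mentioned earlier. File watching itself is left out.

```python
import json
from pathlib import Path

AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"


def build_activity(kind: str, object_uri: str, published: str) -> dict:
    """Build a small Activity Streams 2.0 document for one change."""
    return {
        "@context": AS2_CONTEXT,
        "type": kind,            # e.g. "Create", "Update" or "Delete"
        "object": object_uri,    # URI of the resource that changed
        "published": published,  # timestamp string supplied by the caller
    }


def write_activity(activity_root: Path, resource_path: str,
                   activity: dict, name: str) -> Path:
    """Write the activity into the directory mirroring resource_path."""
    inbox = activity_root / resource_path.strip("/")
    inbox.mkdir(parents=True, exist_ok=True)  # the container is just a directory
    target = inbox / f"{name}.jsonld"         # assumed naming convention
    target.write_text(json.dumps(activity, indent=2), encoding="utf-8")
    return target
```

Nothing here goes through HTTP: the read-only server would only ever serve the resulting files with GET.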

ajs6f commented 7 years ago

Yes, this might be a decent place to go-- a ½-LDN impl that does GET but not POST.

@whikloj would that meet your concerns about integration?

whikloj commented 7 years ago

I'm not sure I have this, so correct me if I missed something. If I wanted to maintain an index of the contents of a static-ldp repo, my consumer would check the inboxes of all the resources in the activity hierarchy, pull any new changes, and index those.

So if I was watching the whole repo for changes I would need to check each inbox, correct? But as resources could be added or deleted on the filesystem I would need to walk the repo to get all the currently existing resources?

This isn't a huge deal and I'm not sure I would actually need an index of these resources. More of a "just in case" scenario going on in my head. Plus, LDN and ActivityPub are still quite ethereal to me.

ajs6f commented 7 years ago

I think you're right, @whikloj. The computation/IO to retrieve each partition of events cumulatively (to create the stream you want to use to index) is going to happen either server-side or client-side, and we're talking about maybe doing it client-side.

ajs6f commented 7 years ago

Of course, the "dual" of that is that rather than filtering a union stream, you just subscribe (or collect from) only those feeds that interest you. So it's not lose-lose, it's give-and-take...
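To make the client-side trade-off concrete, here is a rough sketch of a consumer that builds the union stream itself. It assumes container URIs end in `/`, that every event carries a `published` timestamp in a single consistent ISO 8601 UTC format (so plain string comparison orders them), and that the caller injects `list_children` and `fetch`, two hypothetical callables standing in for whatever LDP client is used.

```python
from typing import Callable, Iterable, Iterator, List


def walk_events(root: str,
                list_children: Callable[[str], Iterable[str]]) -> Iterator[str]:
    """Yield every event URI at or below the given container."""
    stack = [root]
    while stack:
        container = stack.pop()
        for child in list_children(container):
            if child.endswith("/"):   # assumption: trailing slash marks a container
                stack.append(child)
            else:
                yield child


def collect_events(root: str,
                   list_children: Callable[[str], Iterable[str]],
                   fetch: Callable[[str], dict],
                   since: str = "") -> List[dict]:
    """Fetch all events under root, keep those newer than `since`, oldest first."""
    events = [fetch(uri) for uri in walk_events(root, list_children)]
    fresh = [event for event in events if event["published"] > since]
    return sorted(fresh, key=lambda event: event["published"])
```

Passing the top of the activity hierarchy as `root` reproduces the unitary feed at the client's expense, while passing a sub-container gives the "only those feeds that interest you" behaviour.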

whikloj commented 7 years ago

I'd agree with the "give-and-take" assessment. It does allow you more flexibility to only subscribe to a subset of resources if you so choose. Sounds like a plan. 👍

csarven commented 7 years ago

Just to add to @whikloj's comment (https://github.com/trellis-ldp/static-ldp/issues/35#issuecomment-293319582): it depends a bit on how the inbox is managed by the server (receiver). For example, the inbox could contain a single resource as a summary of all of the events, as opposed to each event being captured in its own resource. Another way of arranging the data would be a dedicated resource with its own inbox for a summary of activities (which is sort of like an aggregate of all of the activities in other inboxes).

And I agree with @ajs6f: going halfway towards LDN (or LDP) with read-only would be fine, and it leaves room to extend that if/when needed in the future.

ajs6f commented 7 years ago

Cool, sounds like we can have our cake and eat it too! :)

I'll get to work on a more detailed proposal.

ajs6f commented 7 years ago

Couple of proposals:

  1. The "special" location should be written as a constant in the code. I know we don't want to add configuration, but I do think we want to let people override in case of collision. Default could be activity or events or something.

  2. Looks like FAM would be the right tool to use here if we want to stay within the original language footprint, but PHPsters, please correct me if that's wrong or out-of-date.

ajs6f commented 7 years ago

I'd like to try starting with a "twofer" accounting, featuring both:

  1. a "fine-grained" event hierarchy (of the kind we originally discussed) for which each basic container in the main hierarchy has an inbox with the same name (with prefix munged), containing events with some autogen'd names (UUIDs or timestamps or something),
  2. and a single "summary" resource as a child of activity (or whatever it is called) that simply aggregates everything (as @csarven described).

This seems to me to meet all cases, although the performance implications for the "summary" (and even the "fine-grained" hierarchy in the case of long-lived resources) are... interesting. We have an advantage tho': we can do real paging, and without much difficulty. Static resources FTW!

acoburn commented 7 years ago

@ajs6f Adding configuration is fine -- there is already plenty of default configuration that can be overridden. New configuration would go here with sensible default values.

ajs6f commented 7 years ago

Re: paging.

LDN (here) specifically absolves itself from choosing between paging as offered in LDP Paging and as offered in Activity Streams (or something else). I'm inclined towards AS. Thoughts?

acoburn commented 7 years ago

+1 for using Activity Streams paging.

ajs6f commented 7 years ago

In order to impl AS Paging, would it make sense for the events in a given Basic Container inbox to be partitioned, on-disk, into pages of a predetermined size? So resource /foo/bar/ would have an inbox /activity/foo/bar/ within which there would be /activity/foo/bar/page1/, /activity/foo/bar/page2/, etc., within which there would be /activity/foo/bar/page2/1492009392, /activity/foo/bar/page2/1492009489, etc.

Maybe a default of 50 to a page?

(This comes under the heading of "KIS KIS"-- Keep It Static, Keep It Simple.)
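A sketch of the on-disk partitioning proposed above, assuming events are numbered from zero in arrival order, pages are numbered from 1, and the page size is the suggested default of 50. The page document uses the Activity Streams 2.0 `OrderedCollectionPage` type with `partOf`, `prev` and `next`; the helper names and the exact URI layout are illustrative assumptions.

```python
PAGE_SIZE = 50  # default suggested in the thread


def page_number(event_index: int, page_size: int = PAGE_SIZE) -> int:
    """Return the 1-based page that holds the event with this 0-based index."""
    return event_index // page_size + 1


def event_path(inbox: str, event_index: int, timestamp: int,
               page_size: int = PAGE_SIZE) -> str:
    """Path of an event inside its page directory; `inbox` ends with '/'."""
    return f"{inbox}page{page_number(event_index, page_size)}/{timestamp}"


def page_document(inbox: str, page: int, total_events: int,
                  page_size: int = PAGE_SIZE) -> dict:
    """Describe one page as an AS 2.0 OrderedCollectionPage."""
    last_page = max(1, (total_events + page_size - 1) // page_size)
    doc = {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "OrderedCollectionPage",
        "id": f"{inbox}page{page}/",
        "partOf": inbox,
    }
    if page > 1:
        doc["prev"] = f"{inbox}page{page - 1}/"
    if page < last_page:
        doc["next"] = f"{inbox}page{page + 1}/"
    return doc
```

Since a full page never changes once it is written, only the last page directory needs to be touched when a new event arrives.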

ajs6f commented 7 years ago

There is not any way currently to add properties (triples) to a Basic Container, right? I'm trying to figure which of two roads makes more sense (and maybe I'm missing a better one):

  1. Add that ability, then use it to present the right properties for inboxes, or
  2. add a new PHP class (Inbox) subtyping BasicContainer.

I like 1, because it is general and keeps the code smaller (I think), but the natural way would be to select a special name (e.g. properties) and decide that any RDF file with that name in a directory, instead of becoming its own RDFSource, becomes "bonus" properties on the enclosing BasicContainer-directory.

Does that sound reasonable? It's a serious expansion.
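For option 1, the directory-listing side could be sketched as below. The special name `properties` comes from the proposal above; the set of RDF file extensions and the function name are assumptions, and parsing the file and merging its triples into the container are deliberately left out.

```python
import os
from typing import Iterable, List, Optional, Tuple

RDF_EXTENSIONS = {".ttl", ".jsonld", ".nt", ".rdf"}  # assumed list of RDF file types


def split_entries(names: Iterable[str],
                  properties_name: str = "properties"
                  ) -> Tuple[Optional[str], List[str]]:
    """Separate the "extra properties" file from ordinary child resources."""
    extra = None
    children = []
    for name in sorted(names):
        stem, ext = os.path.splitext(name)
        is_rdf = ext.lower() in RDF_EXTENSIONS
        if extra is None and stem == properties_name and is_rdf:
            extra = name           # its triples would describe the container itself
        else:
            children.append(name)  # exposed as a normal contained resource
    return extra, children
```

A file named `properties.txt` would still be listed as an ordinary child, since only RDF files with the special name are treated as container properties in this sketch.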

acoburn commented 7 years ago

This is where I am already adding triples to a Basic Container: https://github.com/trellis-ldp/static-ldp/blob/master/src/Model/BasicContainer.php#L45-L48 if that helps.

ajs6f commented 7 years ago

Oh, I'm cool with the code-- I just want to check if this seems like good design. I take it you are cool with an "extra properties" file?

ajs6f commented 7 years ago

Tangentially: #36.

acoburn commented 4 years ago

There hasn't been any movement on this issue in over two years, so I am going to close it. If someone wants to pick it up at some future point, that's great.