oduwsdl / MemGator

A Memento Aggregator CLI and Server in Go
https://memgator.cs.odu.edu/api.html
MIT License

Feature Request/Discussion: Early Hints #133

Open machawk1 opened 3 years ago

machawk1 commented 3 years ago

Recently I have become more familiar with RFC8297, which details the HTTP 103 (Early Hints) status code.

In an effort to be transparent, an aggregator might want to expose the sources being queried, for example:

> curl -I https://memgator.example.com/timemap/cdxj/https://odu.edu
HTTP/1.1 103 Early Hints
Link: <https://archive1.com/timemap/link/https://odu.edu>; rel=timemap
Link: <https://archive2.org/timemap/link/https://odu.edu>; rel=timemap
Link: <https://three.myarchive.net/web/timemap/cdxj/https://odu.edu>; rel=timemap
...

In extending the aggregation concept, revealing the sources at query time would be useful. My anticipated use case might be to poll for the sources used, perhaps based on the then-current sleep status of each respective archive.

It is not an entirely well-formed notion but I thought I would put forth the idea for discussion.

ibnesayeed commented 3 years ago

I don't see this as a good fit for the 103 Early Hints status code. The purpose of 103 is to tell the user agent what other resources it might need for the proper realization of the current resource, which is yet to be fully served. For example, suppose a client requests an HTML page that needs a few JS, CSS, and image files to render properly. Normally the client makes those requests only after processing the HTML and discovering them. However, the server application might already know the resource structure of the page, so it can give the client an early hint to fetch those resources even before the HTML response has been delivered.
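To make the conventional use of 103 concrete, here is a minimal Go sketch of a handler that hints two subresources before serving the page. The paths `/static/app.css` and `/static/app.js` and the helper `preloadLink` are made up for illustration, and it assumes Go 1.19 or later, where `WriteHeader` can send 1xx informational responses. It is not taken from MemGator.

```go
package main

import (
	"fmt"
	"net/http"
)

// preloadLink formats one Link header value that hints a subresource.
// The paths and "as" values used below are hypothetical.
func preloadLink(path, as string) string {
	return fmt.Sprintf("<%s>; rel=preload; as=%s", path, as)
}

// pageHandler sends a 103 Early Hints response before the final 200.
func pageHandler(w http.ResponseWriter, r *http.Request) {
	w.Header().Add("Link", preloadLink("/static/app.css", "style"))
	w.Header().Add("Link", preloadLink("/static/app.js", "script"))
	// Since Go 1.19, writing a 1xx status sends the headers set so far as an
	// informational response and lets the handler continue.
	w.WriteHeader(http.StatusEarlyHints)

	// ... the (possibly slow) work of producing the page would happen here ...
	w.Header().Set("Content-Type", "text/html; charset=utf-8")
	w.WriteHeader(http.StatusOK)
	fmt.Fprintln(w, "<!doctype html><title>example</title>")
}

func main() {
	// Only print a hint value here; to serve it, register the handler with
	// http.HandleFunc("/", pageHandler) and call http.ListenAndServe.
	fmt.Println(preloadLink("/static/app.css", "style"))
}
```

The hinted resources are ones the client needs in order to render the requested page, which is the distinction being drawn here against a list of upstream TimeMaps.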

If you want to know the sources being aggregated, you should check the recently introduced /about endpoint, which should report the status of each upstream archive (such as whether it is in the dormant state). It would be good to content negotiate on this endpoint and report this info in a more machine-readable format like JSON, which is something we are tracking under #127. If you are interested in knowing which specific upstream archives responded with good results (and not just the list of all configured archives), then we should put some energy into implementing #97.
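As a rough sketch of what content negotiation on /about could look like, the Go snippet below picks JSON when the `Accept` header lists `application/json` and falls back to plain text otherwise. The `ArchiveStatus` type, its fields, and both helper functions are hypothetical, and the check ignores q-values. Nothing here reflects how MemGator implements /about or what #127 will end up specifying.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ArchiveStatus is a hypothetical shape for one upstream archive's runtime state.
type ArchiveStatus struct {
	ID      string `json:"id"`
	Dormant bool   `json:"dormant"`
}

// wantsJSON reports whether an Accept header value lists application/json.
// It ignores q-values, which a real negotiator would need to honor.
func wantsJSON(accept string) bool {
	for _, part := range strings.Split(accept, ",") {
		mediaType := strings.TrimSpace(strings.SplitN(part, ";", 2)[0])
		if strings.EqualFold(mediaType, "application/json") {
			return true
		}
	}
	return false
}

// renderAbout returns a content type and body for the chosen representation.
func renderAbout(accept string, archives []ArchiveStatus) (string, []byte, error) {
	if wantsJSON(accept) {
		body, err := json.Marshal(archives)
		return "application/json", body, err
	}
	var sb strings.Builder
	for _, a := range archives {
		fmt.Fprintf(&sb, "%s dormant=%t\n", a.ID, a.Dormant)
	}
	return "text/plain; charset=utf-8", []byte(sb.String()), nil
}

func main() {
	archives := []ArchiveStatus{{ID: "archive1", Dormant: false}, {ID: "archive2", Dormant: true}}
	contentType, body, err := renderAbout("application/json", archives)
	if err != nil {
		panic(err)
	}
	fmt.Println(contentType)
	fmt.Println(string(body))
}
```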

machawk1 commented 3 years ago

In a similar vein, is there any way for a client that accesses the base URI of the aggregator to discover the /about (and other) endpoints? With this GitHub issue, I was hoping for a machine-readable way to obtain the information in /about. It is related to, but not identical to, #127, as the runtime state of accessing an archive might differ from the values in archives.json.

ibnesayeed commented 3 years ago

> In a similar vein, is there any way for a client that accesses the base URI of the aggregator to discover the /about (and other) endpoints?

Unfortunately, resources like TimeMap and TimeGate and their corresponding relation types are defined in the context of a given URI-R. I would have liked relations for generic endpoints that web archives and aggregators could use to advertise their endpoints from a well-known entry point (perhaps borrowing some ideas from OpenSearch for URI templates). I remember talking to @phonedude about it a while ago. He said they discussed it in the context of the Memento RFC but did not include it in the final draft for some reason. Perhaps @hvdsomp can tell us more about it.

Alternatively, we can look at the Well-Known URIs registry to see whether any existing registered entry allows advertising these, or whether we should consider proposing something new.

phonedude commented 3 years ago

There are other rel types that could be used, likely "describes" or "describedby" (depending on direction). Possibly even "profile", depending on how you set things up.

You could also use "related", though that's a pretty weak relation.

Or, you could create your own URI for the rel type if none of the existing ones is suitable.

https://www.iana.org/assignments/link-relations/link-relations.xhtml
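For illustration, a response for the aggregator's base URI could carry a Link header pointing at /about with one of these relations. The Go sketch below formats such a header value. The host `memgator.example.com` and the helper name are placeholders, and choosing "describedby" over the other candidates is only an assumption.

```go
package main

import "fmt"

// describedByLink builds a Link header value pointing from the aggregator's
// base URI to a description resource such as /about.
func describedByLink(target string) string {
	return fmt.Sprintf("<%s>; rel=\"describedby\"", target)
}

func main() {
	// memgator.example.com is a placeholder host, not a real deployment.
	fmt.Println("Link: " + describedByLink("https://memgator.example.com/about"))
}
```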

hvdsomp commented 3 years ago

The /about thing looks like something that could be handled by host-meta, for which a well-known URI exists; see RFC6415. And maybe aggregated endpoints could be conveyed using host-wide links in the host-meta document that have the item link relation.

What’s cool about the host-meta approach is that it might also allow conveying templates for individual resources on the host, meaning a template each for TimeGates, TimeMaps, and Mementos. However, I am pretty sure that the uri variable defined in the spec cannot be used for this purpose, since it stands for a URI on the host, not an external URI.