oduwsdl / MemGator

A Memento Aggregator CLI and Server in Go
https://memgator.cs.odu.edu/api.html
MIT License

Customizing TimeMap formats served #122

Open machawk1 opened 4 years ago

machawk1 commented 4 years ago

In server mode, MemGator serves 3 formats of TimeMaps: Link, CDXJ, and JSON.

In the vein of #116, I would like to be able to restrict the formats that my MemGator binary serves when in server mode.

Because most Memento-compliant web archives serve only Link-formatted TimeMaps, I would like to impose similar restrictions at the aggregator level using MemGator (without touching the code).

ibnesayeed commented 4 years ago

While it is doable by introducing yet another flag and some logic changes, I am not sure if that added complexity would be worth the effort, unless you have a convincing case.

machawk1 commented 4 years ago

What are your thoughts on reusing the -f flag to have this functionality in server mode, @ibnesayeed? Currently it looks to be ignored when run in server mode.

My use case stems from the aggregator-of-aggregators (dare I say "meta aggregator" ;) ) concept and the need to pull from well-formatted known sources when the ideal source is not available. From what I recall, MemGator expects a Link-formatted TimeMap to be available, but it would be interesting to promote each format to a first-class format rather than a derived one.

For example, if a MemGator that serves all three formats were to query a differently configured MemGator (with regard to the number of archives and the TimeMap formats served) and the latter did not serve Link, would the former still be able to use it as a source?

ibnesayeed commented 4 years ago

> What are your thoughts on reusing the -f flag to have this functionality in server mode, @ibnesayeed? Currently it looks to be ignored when run in server mode.

The flag can be reused, but it will need clear documentation to distinguish its behavior in one-off mode from its behavior in server mode. However, the added complexity of parsing a comma-separated list, validating it in the mode where only one value is allowed, and then plugging in all the logic will be a mess.

For the use case you are describing, I would say it would be more dangerous to allow customizing which formats responses are served in. For now, we know for sure that if a MemGator instance is running, it returns Link-formatted TimeMaps, regardless of what other formats it supports. MemGator is programmed to read only the Link format as the common ground. The other output formats exist for specific use cases where one format might be a better fit. However, for the sake of interoperability, the standard Link format is always going to be there. By contrast, if we allow customization and someone chooses to return only the CDXJ format, then no secondary aggregator will be able to use that instance as an upstream endpoint. Even if the secondary aggregator understands how to parse other formats (say, when #116 is implemented), it has to switch parsers depending on the Content-Type, or some sort of content sniffing would come into play. I do not think going that route solves any problem that we have; it would only cause more issues.