oduwsdl / MemGator

A Memento Aggregator CLI and Server in Go
https://memgator.cs.odu.edu/api.html
MIT License

Use domain name as the archival identifier in archives list #105

Closed: ibnesayeed closed this issue 4 years ago

ibnesayeed commented 6 years ago

The JSON file that contains the list of archives uses short terms like ia or ukwa as the ID of each archival source. We could use their domain names there instead. This does not require any code change; only the config file needs to be updated.

machawk1 commented 6 years ago

I agree that an archive's identifier ought to be its URI, but having a short ID like the current one is handy. What are your thoughts on adding another field, such as shortname?

ibnesayeed commented 6 years ago

Using just the domain names (not the full homepage URIs) is not too verbose, and it is more predictable than arbitrarily chosen short IDs. Adding other optional metadata fields should be fine.

ikreymer commented 6 years ago

Any interest in coordinating with the names used in https://github.com/webrecorder/public-web-archives for short identifiers? We thought about using URLs as the main identifiers, but unfortunately they can change too (for example, moving from http to https), especially for smaller archives.

ibnesayeed commented 6 years ago

@ikreymer thanks for the reference. That project is still in my inbox, waiting for a reply from me. I had that repo in mind while I was creating this ticket.

That said, here I am proposing to use just the canonical domain name, so things like protocol changes or the presence of www won't affect it. However, this approach is not entirely safe either, for example when a single domain serves multiple archives under different path prefixes. Again, that is not a big issue: one can use example.com/foo and example.com/bar as identifiers. These configuration files are often handcrafted anyway.