Closed by ibnesayeed 4 years ago
I agree that an archive's identifier ought to be its URI, but having a short ID like the current one is handy. What are your thoughts on adding some other field like `shortname` or something?
Using just the domain names (not the full homepage URI) is not too verbose, and it is more predictable than arbitrarily chosen short IDs. Adding other optional meta fields should be fine.
Any interest in coordinating with the names used in https://github.com/webrecorder/public-web-archives for short identifiers? We thought about using URLs as the main identifiers, but unfortunately they too can change (for example, a move from http to https), especially for smaller archives.
@ikreymer thanks for the reference. That project is still sitting in my inbox waiting for a reply from me, and I had that repo in mind while creating this ticket.
That said, here I am proposing to use just the canonical domain name, so things like protocol changes or the presence of `www` won't affect it. However, this approach is not entirely safe either, for example when a single domain serves multiple archives under different path prefixes. Again, that is not a big issue: one can use `example.com/foo` and `example.com/bar` as identifiers. These configuration files are often handcrafted anyway.
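To make the idea concrete, here is a rough sketch of how such an identifier could be derived from an archive's URL. The function name `canonical_id` and the `keep_path` flag are hypothetical, made up for illustration; nothing like this exists in the project, and a handcrafted config would not need it.

```python
from urllib.parse import urlsplit


def canonical_id(url: str, keep_path: bool = False) -> str:
    """Derive an identifier from a URL (hypothetical helper, illustration only).

    Drops the scheme and a leading "www." so that http->https moves or
    www changes do not alter the ID. With keep_path=True the path prefix
    is kept, for domains that host several archives.
    """
    # urlsplit only recognizes the host when a scheme or "//" is present
    parts = urlsplit(url if "//" in url else "//" + url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/") if keep_path else ""
    return host + path


print(canonical_id("http://www.example.com/"))
print(canonical_id("https://example.com"))
print(canonical_id("https://www.example.com/foo/", keep_path=True))
```

The first two calls yield the same identifier, while the third keeps the `/foo` prefix to tell apart archives sharing a domain.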
The JSON file that contains the list of archives uses short terms like `ia` or `ukwa` as the IDs of each archival source. We can use their domain names in that place instead. This does not require any code change; only the config file needs to be changed.
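As a sketch of what that config change might look like, the snippet below rewrites each entry's `id` to a domain name and keeps the old short term in a `shortname` field, as floated earlier in the thread. The entry layout (an `id` plus a `timegate` URL) and the sample URLs are assumptions for illustration, not the project's actual file; `domain_of` and `rekey` are hypothetical names.

```python
import json
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Return the lowercase host of a URL without a leading "www."."""
    host = (urlsplit(url).hostname or "").lower()
    return host[len("www."):] if host.startswith("www.") else host


def rekey(archives: list) -> list:
    """Return copies of the entries with domain-based IDs.

    Assumes each entry has "id" and "timegate" keys (an assumed layout).
    The previous short ID is preserved under "shortname".
    """
    updated = []
    for entry in archives:
        new_entry = dict(entry)  # copy, so the input is left untouched
        new_entry["shortname"] = entry["id"]
        new_entry["id"] = domain_of(entry["timegate"])
        updated.append(new_entry)
    return updated


# Illustrative sample entries only, not taken from the real config file.
sample = [
    {"id": "ia", "timegate": "https://web.archive.org/web/"},
    {"id": "ukwa", "timegate": "https://www.webarchive.org.uk/wayback/archive/"},
]

print(json.dumps(rekey(sample), indent=2))
```

Since the files are usually handcrafted, editing the IDs by hand would work just as well; the script only shows the intended before and after shape.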