oduwsdl / MemGator

A Memento Aggregator CLI and Server in Go
https://memgator.cs.odu.edu/api.html
MIT License

Decouple default archives list from online source #88

Open machawk1 opened 7 years ago

machawk1 commented 7 years ago

MemGator currently looks to http://git.io/archives on startup by default. If git.io goes down, MemGator has no default list of archives. Coupling a local service's functionality to a remote online resource makes it fragile. MemGator ought to work with smart defaults without relying on this resource.

INFO: 2016/09/08 22:02:01.250427 Initializing MemGator:1.0-rc5...
INFO: 2016/09/08 22:02:01.250524 Loading archives from http://git.io/archives
FATAL: main.go:831: Error reading list of archives (http://git.io/archives): Get http://git.io/archives: dial tcp: lookup git.io: no such host

ibnesayeed commented 7 years ago

I thought about this earlier, when MemGator was born, and concluded that the current approach is the most practical, though perhaps not ideal. The other two approaches I thought about are the following:

For advanced users, it is almost always better to use their own custom or local archives file and not rely on an external curated list of archives that might go down. Luckily, this service is hit only once, at startup; the tool then caches the list of archives in memory for the entire session. Additionally, a failure to read the curated list file results in a fatal error with a precise message explaining what went wrong.

That said, if you have any other mechanism that might work better in this case, please feel free to propose it.

machawk1 commented 7 years ago

Not the best solution, but to mitigate the effect that git.io has on MemGator instances, would it be possible to consult a secondary or even tertiary source that mirrors the information at git.io?

ibnesayeed commented 7 years ago

That is doable, but it would cause a sync overhead: we would have to find a few distinct hosts where we can keep copies of the curated list of archives and update the content without changing the URI (e.g., Gist won't work here). The current source is part of the repository, so it's easy for anyone on our team to update it; on another hosting service it might not be as easy for all of us to have write access. The other thing to consider is that other sources should not be tried if the --archives flag is explicitly set by the user, even if the custom value is the same as the default.

machawk1 commented 7 years ago

https://github.com/jteeuwen/go-bindata might help with this; see http://rachbelaid.com/embedding-assets-in-go-project/. I think having the JSON data built into the binary at compile time is a good, safe default, instead of having many people's binaries rely on an online file that you can manipulate.

ibnesayeed commented 7 years ago

I have considered embedding the default list inside the code (we don't even need binary data embedding for that), but shipping default data has its own implications, which I described above.