oduwsdl / MemGator

A Memento Aggregator CLI and Server in Go
https://memgator.cs.odu.edu/api.html
MIT License

Provide more considerate handling of network errors #53

Closed: machawk1 closed this issue 8 years ago

machawk1 commented 8 years ago

Per https://github.com/machawk1/wail/issues/236, the command

./memgator -a ./archives.json http://matkelly.com/wail

causes multiple very verbose errors reported from MemGator:

ERROR: 2015/12/20 18:29:39.634079 main.go:248: pastpages => Network error: Get http://www.pastpages.org/timemap/link/http://matkelly.com/wail: dial tcp: lookup www.pastpages.org on [::1]:53: read udp [::1]:60841->[::1]:53: read: connection refused
ERROR: 2015/12/20 18:29:39.635427 main.go:248: blarchive => Network error: Get http://www.webarchive.org.uk/wayback/archive/timemap/link/http://matkelly.com/wail: dial tcp: lookup www.webarchive.org.uk on [::1]:53: read udp [::1]:65224->[::1]:53: read: connection refused
ERROR: 2015/12/20 18:29:39.635405 main.go:248: archive.is => Network error: Get http://archive.today/timemap/http://matkelly.com/wail: dial tcp: lookup archive.today on [::1]:53: read udp [::1]:61731->[::1]:53: read: connection refused
ERROR: 2015/12/20 18:29:39.634207 main.go:248: proni => Network error: Get http://webarchive.proni.gov.uk/timemap/http://matkelly.com/wail: dial tcp: lookup webarchive.proni.gov.uk on [::1]:53: read udp [::1]:60842->[::1]:53: read: connection refused
ERROR: 2015/12/20 18:29:39.643340 main.go:248: loc => Network error: Get http://webarchive.loc.gov/all/timemap/link/http://matkelly.com/wail: dial tcp: lookup webarchive.loc.gov on [::1]:53: read udp [::1]:61169->[::1]:53: read: connection refused
ERROR: 2015/12/20 18:29:39.643520 main.go:248: ia => Network error: Get http://web.archive.org/web/timemap/link/http://matkelly.com/wail: dial tcp: lookup web.archive.org on [::1]:53: read udp [::1]:64563->[::1]:53: read: connection refused
ERROR: 2015/12/20 18:29:39.643893 main.go:248: ukparliament => Network error: Get http://webarchive.parliament.uk/timemap/http://matkelly.com/wail: dial tcp: lookup webarchive.parliament.uk on [::1]:53: read udp [::1]:64334->[::1]:53: read: connection refused
ERROR: 2015/12/20 18:29:39.644200 main.go:248: swa => Network error: Get https://swap.stanford.edu/timemap/link/http://matkelly.com/wail: dial tcp: lookup swap.stanford.edu on [::1]:53: read udp [::1]:64565->[::1]:53: read: connection refused
ERROR: 2015/12/20 18:29:39.644322 main.go:248: uknationalarchives => Network error: Get http://webarchive.nationalarchives.gov.uk/timemap/http://matkelly.com/wail: dial tcp: lookup webarchive.nationalarchives.gov.uk on [::1]:53: read udp [::1]:61051->[::1]:53: read: connection refused
ERROR: 2015/12/20 18:29:39.644413 main.go:248: is => Network error: Get http://wayback.vefsafn.is/wayback/timemap/link/http://matkelly.com/wail: dial tcp: lookup wayback.vefsafn.is on [::1]:53: read udp [::1]:61052->[::1]:53: read: connection refused
ERROR: 2015/12/20 18:29:39.644500 main.go:248: archiveit => Network error: Get http://wayback.archive-it.org/all/timemap/link/http://matkelly.com/wail: dial tcp: lookup wayback.archive-it.org on [::1]:53: read udp [::1]:64562->[::1]:53: read: connection refused

A better solution, which would also prevent unnecessary queries to unreachable archives (i.e., fail fast and hard), is to report that the network connection is likely at fault whenever no archives are reachable. It is very unlikely that every archive would be unreachable in a real-world scenario for any other reason.

I can replicate this with the current binary of MemGator 1.0-rc3 (as reported by -v) on OS X 10.11.2.

machawk1 commented 8 years ago

Replicable with MemGator 1.0-rc3 on OS X 10.11.2; see https://github.com/oduwsdl/memgator/issues/53

ibnesayeed commented 8 years ago

It will probably report 404. The connection to each individual archive is independent, which means that in order to report a network issue on the aggregator machine, the accumulator needs to count the network failures and, if the count equals the number of archives requested, issue a special response.
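
A minimal sketch of that counting idea, assuming a channel of per-archive outcomes. The `result` type, `accumulate`, and `errAllArchivesUnreachable` are hypothetical names for illustration; MemGator's actual accumulator may be structured quite differently.

```go
package main

import (
	"errors"
	"fmt"
)

// result is a hypothetical per-archive outcome; err is nil on success and
// is assumed to hold network-level failures only.
type result struct {
	archiveID string
	err       error
}

// errAllArchivesUnreachable is the "special response" for a total outage.
var errAllArchivesUnreachable = errors.New(
	"no archive was reachable; the local network connection is likely down")

// accumulate drains the results channel, counts failures, and returns the
// special error only when every requested archive failed.
func accumulate(results <-chan result, requested int) error {
	failures := 0
	for r := range results {
		if r.err != nil {
			failures++
		}
	}
	if requested > 0 && failures == requested {
		return errAllArchivesUnreachable
	}
	return nil
}

func main() {
	archives := []string{"ia", "loc", "archiveit"}
	results := make(chan result, len(archives))
	for _, id := range archives {
		// Simulate every fetch failing, as with the outside network turned off.
		results <- result{archiveID: id, err: errors.New("dial tcp: lookup failed")}
	}
	close(results)

	if err := accumulate(results, len(archives)); err != nil {
		fmt.Println("ERROR:", err)
	}
}
```

A partial failure (some archives up, some down) returns nil here, so the per-archive errors would still be reported individually.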

I am not sure how often something like this would occur in a real-world scenario. In your specific case, MemGator is being queried locally and the outside network is turned off. In most cases, though, the MemGator host itself would be accessed from an external network, so the request would fail before ever reaching MemGator.

ikreymer commented 8 years ago

IMO this would just be handled as a specific case of #43. If there is no network, all archives will fail and will no longer be tried after some threshold. If the network comes back, they would be tried again after some period of time. I think it's just a specific case of 'archive failing consistently', but for all archives.
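
The threshold-then-retry behaviour described above could look roughly like the sketch below. This is not the design from #43: the `breaker` type, its fields, and the policy (consecutive failures, fixed cooldown) are assumptions made for illustration, and the struct is not safe for concurrent use without a mutex.

```go
package main

import (
	"fmt"
	"time"
)

// breaker (hypothetical) tracks consecutive failures for one archive.
type breaker struct {
	threshold int           // consecutive failures before the archive is skipped
	cooldown  time.Duration // how long to skip before trying again
	failures  int
	lastFail  time.Time
}

// allow reports whether the archive should be queried at time now.
func (b *breaker) allow(now time.Time) bool {
	if b.failures < b.threshold {
		return true
	}
	// Over the threshold: only retry once the cooldown has elapsed.
	return now.Sub(b.lastFail) >= b.cooldown
}

// record updates the state after a request; a success resets the count.
func (b *breaker) record(ok bool, now time.Time) {
	if ok {
		b.failures = 0
		return
	}
	b.failures++
	b.lastFail = now
}

func main() {
	b := &breaker{threshold: 3, cooldown: time.Minute}
	now := time.Now()
	for i := 0; i < 3; i++ {
		b.record(false, now)
	}
	fmt.Println(b.allow(now))                      // false: threshold reached
	fmt.Println(b.allow(now.Add(2 * time.Minute))) // true: cooldown elapsed
}
```

With one such breaker per archive, a full network outage simply trips all of them, which matches the "specific case of #43" framing.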

machawk1 commented 8 years ago

:+1: @ikreymer, special case of #43. Per the same ticket, if an archive that is highly likely to be up (e.g., IA) is down, and/or a secondary archive is also down (i.e., zero/null mementos for a popular URI), retry or fail depending on the execution mode.

machawk1 commented 8 years ago

Still getting the same level of verbosity with the latest source:

$ memgator -v
MemGator 1.0-rc4
$ memgator -a archives.json http://matkelly.com/wail
ERROR: 2016/05/02 10:28:49.191647 main.go:259: loc => Network error: Get http://webarchive.loc.gov/all/timemap/link/http://matkelly.com/wail: dial tcp: lookup webarchive.loc.gov: no such host
ERROR: 2016/05/02 10:28:49.191749 main.go:259: blarchive => Network error: Get http://www.webarchive.org.uk/wayback/archive/timemap/link/http://matkelly.com/wail: dial tcp: lookup www.webarchive.org.uk: no such host
ERROR: 2016/05/02 10:28:49.191828 main.go:259: ia => Network error: Get http://web.archive.org/web/timemap/link/http://matkelly.com/wail: dial tcp: lookup web.archive.org: no such host
ERROR: 2016/05/02 10:28:49.191859 main.go:259: is => Network error: Get http://wayback.vefsafn.is/wayback/timemap/link/http://matkelly.com/wail: dial tcp: lookup wayback.vefsafn.is: no such host
ERROR: 2016/05/02 10:28:49.191885 main.go:259: ukparliament => Network error: Get http://webarchive.parliament.uk/timemap/http://matkelly.com/wail: dial tcp: lookup webarchive.parliament.uk: no such host
ERROR: 2016/05/02 10:28:49.191907 main.go:259: proni => Network error: Get http://webarchive.proni.gov.uk/timemap/http://matkelly.com/wail: dial tcp: lookup webarchive.proni.gov.uk: no such host
ERROR: 2016/05/02 10:28:49.600347 main.go:259: archive.is => Network error: Get http://archive.today/timemap/http://matkelly.com/wail: dial tcp: lookup archive.today: no such host
ERROR: 2016/05/02 10:28:49.600381 main.go:259: uknationalarchives => Network error: Get http://webarchive.nationalarchives.gov.uk/timemap/http://matkelly.com/wail: dial tcp: lookup webarchive.nationalarchives.gov.uk: no such host
ERROR: 2016/05/02 10:28:49.600396 main.go:259: swa => Network error: Get https://swap.stanford.edu/timemap/link/http://matkelly.com/wail: dial tcp: lookup swap.stanford.edu: no such host
ERROR: 2016/05/02 10:28:49.600408 main.go:259: archiveit => Network error: Get http://wayback.archive-it.org/all/timemap/link/http://matkelly.com/wail: dial tcp: lookup wayback.archive-it.org: no such host

It would be nice to indicate that there is a larger, overarching problem with the network connection. The first clue should be that the archives.json file that memgator pulls from the web is inaccessible, which is mitigated here by specifying a local copy. When this local config is not referenced:

$ memgator http://matkelly.com/wail
Error reading list of archives (http://oduwsdl.github.io/memgator/archives.json): Get http://oduwsdl.github.io/memgator/archives.json: dial tcp: lookup oduwsdl.github.io: no such host

ibnesayeed commented 8 years ago

As noted by @ikreymer above, the new adaptive system will take care of it when enabled. In my opinion, these errors are reporting the right thing; their repetition is not an issue, since successes would be logged just as repeatedly. Introducing more specific errors would require more special-case checks and would also complicate log visualizers (if we make one).

In my opinion, the archive list file should preferably be local in production environments, so checking whether it can be fetched and declaring a complete network outage on that basis is not wise. Additionally, that file is loaded at startup, not with each request. Dynamic archive lists might add another level of complexity. Also, network connectivity can go up or down independently of the MemGator process.

Having said that, I am open to implementing this if there are clear benefits and enough use cases to outweigh the complexity it would bring to the system.