ibnesayeed closed this issue 8 years ago.
Another idea would be to short-circuit once a threshold, derived from your probabilities of the URI being in each archive, is reached.
For that, there is already a top-K archives configuration in place. K could be made dynamic, but I will have to think about the heuristics for that.
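A minimal sketch of how a dynamic K might work. It assumes each archive has an estimated probability of holding the URI and that these probabilities are independent; neither assumption comes from the thread. The Archive type, the select_archives helper, and the archive names and numbers are all hypothetical. Archives are taken in descending probability order until the chance that at least one of them holds the URI, 1 - prod(1 - p_i), reaches the threshold. An optional cap keeps the static top-K behaviour as an upper bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Archive:
    """Hypothetical archive id plus the estimated probability that it holds a URI."""
    name: str
    prob: float


def select_archives(archives, threshold, max_k=None):
    """Pick archives in descending probability order until the chance that at
    least one picked archive holds the URI reaches `threshold`.

    Assumes holdings are independent, so P(at least one hit) = 1 - prod(1 - p_i).
    `max_k` caps the result like a static top-K setting; None means no cap.
    """
    ranked = sorted(archives, key=lambda a: a.prob, reverse=True)
    picked = []
    miss_all = 1.0  # probability that every picked archive misses the URI
    for archive in ranked:
        if max_k is not None and len(picked) >= max_k:
            break
        picked.append(archive)
        miss_all *= 1.0 - archive.prob
        if 1.0 - miss_all >= threshold:
            break  # short-circuit: confident enough that at least one will hit
    return picked


# Made-up probabilities, for illustration only.
archives = [
    Archive("ia", 0.9),
    Archive("archive-it", 0.3),
    Archive("ukwa", 0.5),
    Archive("loc", 0.1),
]
print([a.name for a in select_archives(archives, threshold=0.94, max_k=3)])
```

With these made-up numbers the sketch prints ['ia', 'ukwa']: the two most likely archives already give roughly a 95% chance of at least one hit, so the lower-probability archives are skipped.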
Just saw this now; I made a similar comment in #42 :) +1. Some sort of timeout/health system is definitely needed so that slower archives can be included without affecting overall performance.
This feature is now implemented. This means a single culprit will no longer be able to drag down every healthy archive when longer timeouts are used (which are necessary for heavily archived resources). Yay! :+1:
The current implementation introduces two new flags.
The -F, --tolerance flag sets the number of consecutive failures of an archive that triggers hibernation of that archive. When the value is set to -1 (the default), automatic hibernation is disabled.
The -d, --dormant flag sets how long an archive remains dormant after its hibernation begins before it becomes active again (default: 15 minutes).
When an archive becomes active again after being dormant, or when it responds successfully after a few failures but before reaching the failure tolerance threshold, its failure count is reset to zero.
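The hibernation rules described above (a consecutive-failure tolerance, a dormant period, and a reset on recovery) can be sketched as a small state tracker. This is an illustration under stated assumptions, not the project's code: the ArchiveHealth class and its method names are hypothetical, time is measured in seconds from an injectable clock, and treating a tolerance of 0 as "disabled" is an assumption (the thread only says -1 disables it).

```python
import time


class ArchiveHealth:
    """Hypothetical sketch of per-archive failure tracking with hibernation.

    `tolerance` mirrors the described -F/--tolerance flag and `dormant_seconds`
    mirrors -d/--dormant (default 15 minutes). A non-positive tolerance disables
    hibernation here; -1 is the described default, 0 is an assumption.
    """

    def __init__(self, tolerance=-1, dormant_seconds=15 * 60, clock=time.monotonic):
        self.tolerance = tolerance
        self.dormant_seconds = dormant_seconds
        self.clock = clock      # injectable so the behaviour can be tested
        self.failures = {}      # archive id -> consecutive failure count
        self.sleep_until = {}   # archive id -> time when hibernation ends

    def is_active(self, archive):
        until = self.sleep_until.get(archive)
        if until is None:
            return True
        if self.clock() >= until:
            # Dormant period is over: wake up with a clean failure count.
            del self.sleep_until[archive]
            self.failures[archive] = 0
            return True
        return False

    def record_success(self, archive):
        # A good response before reaching the tolerance resets the count.
        self.failures[archive] = 0

    def record_failure(self, archive):
        count = self.failures.get(archive, 0) + 1
        self.failures[archive] = count
        if self.tolerance > 0 and count >= self.tolerance:
            self.sleep_until[archive] = self.clock() + self.dormant_seconds


# Usage with a fake clock (hypothetical archive id).
now = [0.0]
health = ArchiveHealth(tolerance=3, dormant_seconds=900, clock=lambda: now[0])
for _ in range(3):
    health.record_failure("slow-archive")
print(health.is_active("slow-archive"))  # False: hibernating
now[0] = 901.0
print(health.is_active("slow-archive"))  # True: dormant period elapsed
```

Injecting the clock keeps the example deterministic. A real service would also need a lock if several threads update the counts concurrently, and the same tracker could be keyed by endpoint URI instead of archive id if per-endpoint hibernation turns out to be needed.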
Currently, the decision is made per archive to keep reporting simple, but there may be cases where only one of an archive's TimeMap or TimeGate endpoints is misbehaving while the other is healthy. If such cases turn out to be common enough, we can change the implementation to hibernate individual endpoint URIs rather than whole archives.
/cc @N0taN3rd regarding his work on setting up a Memento test corpus: the case where an archive is down should be included as a test pattern.
One way to create test cases for this would be to support a custom header or response-delay parameter in the URL of the mock test service.
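A minimal sketch of such a mock service using only the Python standard library. The delay and status query parameter names, the 60-second cap, and the plan_response, MockArchiveHandler, and serve names are all hypothetical choices for illustration; an actual Memento test corpus may use a different scheme. A 503 status with a long delay approximates a slow or unhealthy archive, while stopping the server entirely covers the "archive is down" case.

```python
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlsplit

MAX_DELAY = 60.0  # cap so a typo in a test URL cannot hang the test suite


def plan_response(path):
    """Read the hypothetical `delay` (seconds) and `status` query parameters."""
    query = parse_qs(urlsplit(path).query)
    delay = min(float(query.get("delay", ["0"])[0]), MAX_DELAY)
    status = int(query.get("status", ["200"])[0])
    return max(delay, 0.0), status


class MockArchiveHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        delay, status = plan_response(self.path)
        time.sleep(delay)  # simulate a slow upstream archive
        body = b"mock response\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the mock service; blocks until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), MockArchiveHandler).serve_forever()
```

Calling serve(8000) and then requesting http://127.0.0.1:8000/timemap?delay=5&status=503 would return a 503 after about five seconds, which is enough to exercise the tolerance and dormant settings without depending on a real archive misbehaving.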
An alternate approach to improve the user experience described in #42 would be to automatically detect when an upstream service is failing and pause it for a time period T if it hits K timeouts in consecutive requests.
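A minimal sketch of this alternate approach, assuming upstream requests are made with Python's urllib. The is_timeout and TimeoutPauser names are hypothetical; k and pause_seconds stand for K and T. Only timeouts extend the streak; a success or a non-timeout error breaks it, since the timeouts must occur in consecutive requests.

```python
import socket
import time
import urllib.error


def is_timeout(exc):
    """Classify an exception from an upstream request as a timeout."""
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, Exception):
        exc = exc.reason  # urllib wraps connect-time timeouts in URLError
    return isinstance(exc, (TimeoutError, socket.timeout))


class TimeoutPauser:
    """Pause an upstream for `pause_seconds` (T) after `k` (K) consecutive timeouts."""

    def __init__(self, k, pause_seconds, clock=time.monotonic):
        self.k = k
        self.pause_seconds = pause_seconds
        self.clock = clock
        self.streak = 0                     # consecutive timeouts so far
        self.paused_until = float("-inf")   # not paused initially

    def allowed(self):
        return self.clock() >= self.paused_until

    def record(self, exc=None):
        """Record one request outcome; pass None for a successful request."""
        if exc is not None and is_timeout(exc):
            self.streak += 1
            if self.streak >= self.k:
                self.paused_until = self.clock() + self.pause_seconds
                self.streak = 0
        else:
            self.streak = 0  # success or non-timeout error breaks the streak
```

Unlike a general failure count, this variant only counts timeouts, so an upstream that answers quickly with errors is never paused; whether that distinction is desirable is a design choice the thread leaves open.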