mitodl / micromasters

Portal for learners and course teams to access MITx Micromasters® programs
https://mm.mit.edu
BSD 3-Clause "New" or "Revised" License

discovery: implement circuit breaker pattern for calls to edX #455

Closed giocalitri closed 8 years ago

giocalitri commented 8 years ago

I believe we need to implement not only the circuit breaker pattern but also timeouts and retries in the calls to edX. All of this should be implemented in the edx-api-client library, which should raise proper exceptions to be handled inside micromasters.

Part 1: Circuit breakers

Implementing the circuit breaker pattern is pretty straightforward using pybreaker (see docs):

    import pybreaker
    api_breaker = pybreaker.CircuitBreaker(fail_max=5, reset_timeout=60)

Then we can decorate the functions that make the actual calls without changing anything else:

    @api_breaker
    def get_student_certificates(self, username, course_ids=None):
        ... 

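While the failure count stays below fail_max (5 here), a failed call raises its usual error. Once the circuit breaker opens, calls raise a CircuitBreakerError instead, and so does a failed trial call after reset_timeout has elapsed. We should also mask the different exceptions that need to be raised, so that micromasters only has to handle the ones exposed by the client.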
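To make the roles of fail_max and reset_timeout concrete, here is a stdlib-only toy version of the state machine. It is not pybreaker's code: `ToyBreaker` and `CircuitOpenError` are hypothetical names, the clock is injectable only so the demo does not have to sleep, and the choice that the failure reaching fail_max already surfaces as the breaker error is an assumption to check against pybreaker's actual behavior.

```python
import time


class CircuitOpenError(Exception):
    """Hypothetical stand-in for pybreaker.CircuitBreakerError."""


class ToyBreaker:
    """Minimal closed / open / half-open state machine (illustration only)."""

    def __init__(self, fail_max=5, reset_timeout=60, clock=time.monotonic):
        self.fail_max = fail_max
        self.reset_timeout = reset_timeout
        self.clock = clock        # injectable so the demo needs no sleep
        self.fail_counter = 0
        self.opened_at = None     # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        trial = False
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast without touching the remote service.
                raise CircuitOpenError("circuit is open")
            trial = True          # half-open: let one trial call through
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            self.fail_counter += 1
            if trial or self.fail_counter >= self.fail_max:
                # Threshold reached or trial failed: (re)open the circuit.
                self.opened_at = self.clock()
                raise CircuitOpenError("circuit opened") from exc
            raise                 # below the threshold: the usual error
        # A success closes the circuit and resets the counter.
        self.fail_counter = 0
        self.opened_at = None
        return result


def flaky():
    """Simulates an edX endpoint that is down."""
    raise ConnectionError("edX is down")


now = [0.0]                       # fake clock, in seconds
breaker = ToyBreaker(fail_max=5, reset_timeout=60, clock=lambda: now[0])
outcomes = []
for _ in range(7):
    try:
        breaker.call(flaky)
    except CircuitOpenError:
        outcomes.append("breaker")
    except ConnectionError:
        outcomes.append("usual")
# outcomes is now four "usual" entries followed by three "breaker" entries
```

Once the fake clock moves past reset_timeout, the next call is let through as a trial, and a success closes the circuit again.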

The hard part of implementing this pattern is figuring out the right values for fail_max and reset_timeout. We should probably monitor the calls to the different edX API endpoints first and then decide what is acceptable for us and our users.

Something I have not decided yet is whether it is better to create one circuit breaker per edX endpoint or a single global one. Both solutions have pros and cons.

Part 2: Timeouts

Introducing timeouts in the edx-api-client is, again, pretty straightforward: we use the requests library, which lets us specify a timeout for each call (see docs).

In our case an example of a possible implementation might be:

        resp = self.requester.get(
            urljoin(
                self.base_url,
                '/api/certificates/v0/certificates/{username}/courses/{course_key}/'.format(
                    username=username,
                    course_key=course_id
                )
            ),
            timeout=3  # seconds 
        )

Again, the hard part here is figuring out how long a wait is acceptable for us and our users.
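A possible way to surface timeouts as a client-level exception is sketched below. `EdxApiTimeout` and `get_with_timeout` are hypothetical names, not part of edx-api-client, and the timeout values are placeholders. The sketch relies on two things requests does provide: `timeout` may be a `(connect, read)` tuple, and both connect and read timeouts derive from `requests.exceptions.Timeout`.

```python
import requests


class EdxApiTimeout(Exception):
    """Hypothetical exception the client could expose to micromasters."""


def get_with_timeout(requester, url, timeout=(3.05, 10)):
    """GET `url`, translating requests timeouts into EdxApiTimeout.

    `requester` is anything with a requests-like `get` (e.g. a Session).
    `timeout` is (connect, read) in seconds; the defaults are placeholders.
    """
    try:
        return requester.get(url, timeout=timeout)
    except requests.exceptions.Timeout as exc:
        # Covers both ConnectTimeout and ReadTimeout.
        raise EdxApiTimeout("edX did not answer in time: {}".format(url)) from exc
```

The caller in micromasters would then only need to catch `EdxApiTimeout`, without knowing that requests is used underneath.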

Part 3: Retries

Retries in the requests library are not really difficult to implement, given that we already use sessions in the edx-api-client (see docs):

>>> import requests
>>> s = requests.Session()
>>> a = requests.adapters.HTTPAdapter(max_retries=3)
>>> s.mount('http://', a)

If we want to limit the retries to specific HTTP errors, it is slightly more complicated (see this example).
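One way to do this, as a sketch rather than a final design, is to pass a urllib3 `Retry` object to the adapter instead of an integer. `build_session` is a hypothetical helper, and the retry count, status codes and backoff factor are placeholder values. Note that connection-level failures are still retried as well, and that by default only idempotent methods are retried on a bad status.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(retries=3, statuses=(502, 503, 504)):
    """Return a Session whose adapter also retries on the given statuses."""
    retry = Retry(
        total=retries,
        status_forcelist=statuses,  # retry when the response has one of these
        backoff_factor=0.5,         # wait progressively longer between attempts
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Mount on both schemes, since the API base URL may well be https.
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    return session
```

A session built this way can be passed around like any other `requests.Session`.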

For the retries, it might be a bit complicated to figure out how to combine this parameter with the failure limit that opens the circuit breaker, given that all the failed retry requests will count toward that limit and a single call might open the circuit breaker.
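How the two parameters interact depends on where the breaker sits relative to the retries. The arithmetic below is only an illustration with hypothetical helper names. It compares a breaker that counts every attempt with one that wraps the whole retried call. The second case is what I would expect when the retries happen inside the adapter and the breaker decorates the client method, but that should be verified.

```python
def failures_seen(calls, max_retries, breaker_wraps_each_attempt):
    """Count the failures a breaker would record while edX is fully down."""
    attempts_per_call = 1 + max_retries   # the first try plus the retries
    if breaker_wraps_each_attempt:
        return calls * attempts_per_call
    # Retries happen inside the decorated function, so the breaker
    # sees a single exception per logical call.
    return calls


def calls_until_open(fail_max, max_retries, breaker_wraps_each_attempt):
    """Smallest number of logical calls that reaches fail_max failures."""
    calls = 1
    while failures_seen(calls, max_retries, breaker_wraps_each_attempt) < fail_max:
        calls += 1
    return calls


# With fail_max=5 and max_retries=3:
per_attempt = calls_until_open(5, 3, breaker_wraps_each_attempt=True)   # 2 calls
per_call = calls_until_open(5, 3, breaker_wraps_each_attempt=False)     # 5 calls
```

With max_retries=4 and per-attempt counting, a single logical call already reaches fail_max=5, which is the case described above.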