hackforla / HomeUniteUs

We're working with community non-profits who have a Host Home or empty bedrooms initiative to develop a workflow management tool to make the process scalable (across all providers), reduce institutional bias, and effectively capture data.
https://homeunite.us/
GNU General Public License v2.0
39 stars 21 forks source link

API Error Design in Production vs Development Environments #828

Open johnwroge opened 1 week ago

johnwroge commented 1 week ago

This is a current action item in Porting Flask to FastAPI #789.

Overview

Currently, API handles errors for production and development environments in the same way. This issue is meant to be a point of discussion to track how the team wants to keep a consistent API response for production and development environments while also bringing up logging as a potential solution to centralize development errors.

Research

The production API should be limiting information sent to the client for security purposes. Sending specific server errors and response codes to the client can open the system to vulnerabilities by exposing detailed information to bad actors. At the same time, using specific error codes with detailed messages for exceptions/errors allows for developers to maintain the existing system more efficiently. While the current MVP system is not completely functional, agreeing on how these errors can be propagated to the client will help mitigate some of the complexity as the backend continues to scale and more developers are added to the team.

One approach, is to distinguish different environments using environment variables while maintaining a consistent error response for the production environment.

Example Error Response for Production:

raise HTTPException(
    status_code=500,
    detail={
        "code": "INTERNAL_SERVER_ERROR",
        "message": "An unexpected error occurred. Please try again later."
    }

This is a basic example of how this would look in the controllers that return specific error messages.


except Exception as e:
    if ENVIRONMENT == "production":
        # Optionally log the error with detailed information on the server side
        logger.error("An error occurred: %s", e, exc_info=True)

        # Send a generic error message to the frontend
        raise HTTPException(
            status_code=500,
            detail={
                "code": "INTERNAL_SERVER_ERROR",
                "message": "An unexpected error occurred. Please try again later."
            }
        )
    else:
        # Optionally log detailed error info for debugging in development
        logger.debug("An error occurred: %s", e, exc_info=True)

        # Send the actual error details to the frontend for easier debugging 
        raise HTTPException(
            status_code=500,
            detail={
                "code": type(e).__name__,
                "message": str(e)
            }
        )

Logging (Optional)

Including a basic logging setup will help centralize server side errors for developers as well. These logs can be visible in development and also accessible through EC2 console. Since the MVP of the application is not complete and the number of users will be limited this may not be necessary at the moment, but would be a nice to have in the future.

Logging Resources

Python Logging How To Python Logging API Logging levels

Action Items

Optional Logging Items

paulespinosa commented 1 week ago

@johnwroge This is great! Thanks for documenting how we can approach errors.

For clarification, is the research done here and our decisions focused only on server errors ("HTTP Status 5xx") or will it also cover "domain"/client errors.

For example:

Does FastAPI have a global exception handler that can check for server errors and perform the same environment logic in the example code snippet? Or, would we have to include this code in each request handler?

johnwroge commented 1 week ago

Thanks @paulespinosa! Yes, those are all really good points and I should have been more explicit in those cases.

We would be able to use FastAPI's built in exception handler. We can register this globally for all the routes in the main.py file after the other routers.

In main.py

from global_error_module import global_exception_handler

app = FastAPI(lifespan=lifespan)
app.include_router(api_router, prefix="/api")
app.include_router(health_router, prefix="/api/health", tags=["health"])

app.add_exception_handler(Exception, global_exception_handler)

Exception Handler

I think integrating a global exception handler that handles both client and server errors would be ideal because it would centralize how errors are handled. We could use guard clauses to check the environment in this section and then depending on the status code respond with limited information in production or the regular response in development. This message information when the exception is raised could be passed to the exception handler in the endpoints so we don't need to write out the logic above. I'll post an example below.


# Function to get the current environment
def get_environment():
    return os.getenv("ENVIRONMENT", "development")

def configure_logging():
    environment = get_environment()
    log_level = logging.DEBUG if environment == "development" else logging.ERROR
    logging.basicConfig(
        level=log_level,
        format="%(asctime)s [%(levelname)s] %(message)s",
        handlers=[
            logging.StreamHandler()
        ]
    )

configure_logging()

@app.exception_handler(Exception)
async def global_exception_handler(request: Request, exc: Exception):

    environment = get_environment()
    status_code = 500
    error_response = {
        "error": "Internal Server Error"
    }
# we should send a more generic message so it doesn't say anything is wrong. 
# We are unable to respond to your request at this time

    # Handle HTTPException differently based on environment
   if isinstance(exc, HTTPException):
        status_code = exc.status_code
        if environment == "production":
            error_response = {
                "error": "An error occurred. Please try again later."
            }
            logging.error(f"HTTPException occurred: {exc.detail}")
        else:
            error_response = {
                "error": exc.detail
            }
            logging.debug(f"HTTPException occurred: {exc.detail}")
    else:
        # Customize the response based on the environment
        if environment == "production":
            if status_code == 401:
                error_response = {
                    "error": "You are not authorized to perform this action."
                }
            elif status_code == 403:
                error_response = {
                    "error": "You are not allowed to access this resource."
                }
            elif status_code == 404:
                error_response = {
                    "error": "The requested resource was not found."
                }
            elif status_code >= 500:
                error_response = {
                    "error": "An internal server error occurred. Please try again later."
                }
            # Return generic error responses for all other cases in production
            elif 400 <= status_code < 500:
                status_code = 400
                error_response = {
                    "error": "Bad request. Please check your input and try again."
                }
             else # do something else here in case nothing else is invoked. 

        else:
            # Return more detailed error responses for development/staging
            if hasattr(exc, "detail"):
                error_response = {
                    "error": str(exc.detail)
                }
            else:
                error_response = {
                    "error": str(exc)
                }
           logging.debug(f"Exception occurred: {exc}")

    return JSONResponse(status_code=status_code, content=error_response)

There are a few cons in this approach, one being the global exception handler may need to be updated for every case which means this function could become very large and more difficult to manage for error specific handling. The other downside is if we were to implement logs in this file, it could result in latency for the application every time an exception is raised.

Do you think it would be better if we only use the global error handler in certain cases and handle the rest inside of the individual functions? All the other projects I've worked on used a global error handler in all cases, but there may be a better approach.

lola3736 commented 1 week ago

@johnwroge please add the applicable points label for this issue, let me know if you have any questions

lola3736 commented 1 week ago

@johnwroge thanks for adding the points label. Confirming that you are currently working on this issue and if so, can move to "in progress". Do you need to have this issue reviewed and approved by Dev lead?

johnwroge commented 1 week ago

Thanks @lola3736. I moved this to in progress. I wrote this for documentation purposes and I will update the points once we know more detailed action items and how much time it will take to implement.