usdigitalresponse / cpf-reporter

CPF Reporter application maintained by the USDR Grants program.
Apache License 2.0
0 stars 4 forks source link

Agency Report Submission - Pydantic Validation - Prototype code cleanup #145

Open as1729 opened 7 months ago

as1729 commented 7 months ago

Why is this issue important?

See User Story here https://github.com/usdigitalresponse/cpf-reporter/issues/73

This is an issue in the sequence of tickets to ensure the CPF reporter can accept a user-submitted upload file and receive feedback on any errors that are present in the submitted file data.

Current State

There exists pydantic schema and prototype code that can load an excel file from a local machine and generate any errors.

This is the starting point: https://github.com/usdigitalresponse/cpf-reporter/pull/140/files#diff-b4bb27ac77a4277d0f1dd4db4e158e07418c6dff86bee251c51c1891e2c21ec9

Expected State

Take this existing schema and code and translate it into:

  1. Easier to test modules/functions
  2. Corresponding unit tests

Implementation Plan

Here is the link to the code that needs to be refactored and tested: https://github.com/usdigitalresponse/cpf-reporter/pull/140/files#diff-b4bb27ac77a4277d0f1dd4db4e158e07418c6dff86bee251c51c1891e2c21ec9

"""
The below code can be used as a starting point for the refactor. The tests for this can be added in `python/src/tests/lib/workbook_validation_test.py`
"""

from io import BytesIO

# Accepts a workbook from openpyxl
def validate_logic_sheet(logic_sheet: Any):
    errors = []
    try:
        # Cell B1 contains the version
        LogicSheetVersion(**{ "version": logic_sheet['B1'].value })
    except ValidationError as e:
        errors.append(f'Logic Sheet: Invalid {e}')
    return errors

def validate_cover_sheet(cover_sheet) -> Tuple[Optional[List[str]], Optional[Any]]:
    project_schema = None
    cover_header = get_headers(cover_sheet, 'A1:B1')
    cover_row = cover_sheet[2]
    row_dict = map_values_to_headers(cover_header, cover_row)
    try:
        CoverSheetRow(**row_dict)
    except ValidationError as e:
        errors.append(f"Cover Sheet: Invalid {e}")
        return (errors, None)

        # This does not need to be a silent failure. This would be a critical error.
    project_schema = SCHEMA_BY_PROJECT[row_dict['Project Use Code']]
    return (None, project_schema)

def validate_project_sheet(project_sheet) -> List[str]:
    errors = []
    project_headers = get_headers(project_sheet, 'C3:DS3')
    current_row = 12
    for project_row in project_sheet.iter_rows(min_row=13, min_col= 3, max_col=123, values_only=True):
        current_row += 1
        if is_empty_row(project_row):
            continue
        # print(project_row)
        row_dict = map_values_to_headers(project_headers, project_row)
        # print(row_dict['Capital_Asset_Ownership_Type__c'])
        try:
            project_schema(**row_dict)
        except ValidationError as e:
            errors.append(f"Project Sheet: Row Num {current_row} Error: {e}")
     return errors

def validate_subrecipient_sheet(subrecipient_sheet) -> List[str]:
    errors = []
    subrecipient_headers = get_headers(subrecipient_sheet, 'C3:O3')
    current_row = 12
    for subrecipient_row in subrecipient_sheet.iter_rows(min_row=13, min_col=3, max_col=16, values_only=True):
        current_row += 1
        if is_empty_row(project_row):
            continue
        row_dict = map_values_to_headers(subrecipient_headers, subrecipient_row)
        try:
            SubrecipientRow(**row_dict)
        except ValidationError as e:
            errors.append(f"Subrecipients Sheet: Row Num {current_row} Error: {e}")
    return errors

def validate(workbook: bytes):
    """
    1. Load workbook in read-only mode
    See: https://openpyxl.readthedocs.io/en/stable/optimized.html
    If there is any trouble loading files from memory please see here: https://stackoverflow.com/questions/20635778/using-openpyxl-to-read-file-from-memory
    """
    wb = load_workbook(filename=BytesIO(workbook), read_only=True)

    """
    2. Validate logic sheet to make sure the sheet has an appropriate version
    """
    errors = validate_logic_sheet(wb['Logic'])
    if errors.length > 0:
         return errors

    """
    3. Validate cover sheet and project selection. Pick the appropriate validator for the next step.
    """
    errors, project_schema = validate_cover_sheet(wb['Cover'])
    if errors.length > 0:
          return errors

    """
    4. Ensure all project rows are validated with the schema
    """
    project_errors = validate_project_sheet(wb['Project'])

    """
    5. Ensure all subrecipient rows are validated with the schema
    """
    subrecipient_errors = validate_subrecipient_sheet(wb['Subrecipients'])

    # Close the workbook after reading
    wb.close()

    return project_errors + subrecipient_errors

Relevant Code Snippets

No response

jakekreider commented 7 months ago

@as1729 Will see about getting a test file so I can write tests, but is this what you had in mind for refactoring? 5f5a1b789018d32c80324dc9432620c7f23fb1f8 Should be mostly the same from what you have above, just making sure I'm doing what's expected here.

as1729 commented 7 months ago

yup code looks well-organized. When I added the typings I wasn't sure if it was correct so I would double-check those pieces as well. I also didn't add return-types for some of these functions which would be good to add.