usdigitalresponse / cpf-reporter

CPF Reporter application maintained by the USDR Grants program.
Apache License 2.0
0 stars 3 forks source link

Subrecipient Validation - Ensure all projects have one valid subrecipient #294

Open as1729 opened 3 months ago

as1729 commented 3 months ago

Current State

Users are able to submit workbooks with projects that do not map to a subrecipient.

Expected State

When user adds a project, there must always be a valid subrecipient that has the same UEI/TIN information. They must receive an error if no subrecipient is found with the UEI/TIN data found in the projects tab.

Definition of Done

If the subrecipient is entered in the project sheet, it must be entered in the subrecipient sheet (i.e. UEI & TIN on project record must be an exact match to UEI & TIN entered for a subrecipient record on the Subrecipient sheet)

Succesful validation:

Unsuccessful validation:

image

TEST FILES Pass & Fail test files can be found here CPF Input Template v20240524_UEITIN_PASS.xlsm has one project that has a matching UEI & TIN in the subrecipient tab CPF Input Template v20240524_UEITIN_FAIL.xlsm has three projects that fail for different reasons

Implementation Plan

  1. Modify existing validation functions to return the projects and subrecipients like so:
    
    type Subrecipients = List[SubrecipientRow]

This function now returns the "valid" subrecipients

def validate_subrecipient_sheet(subrecipient_sheet: Worksheet) -> Tuple(Errors, Subrecipients): errors = [] subrecipients = [] subrecipient_headers = get_headers(subrecipient_sheet, "C3:O3") current_row = INITIAL_STARTING_ROW for subrecipient_row in subrecipient_sheet.iter_rows( min_row=13, min_col=3, max_col=16, values_only=True ): current_row += 1 if is_empty_row(subrecipient_row): continue row_dict = map_values_to_headers(subrecipient_headers, subrecipient_row) try: subrecipients.append(SubrecipientRow(**row_dict)) except ValidationError as e: errors += get_workbook_errors_for_row( SubrecipientRow, e, current_row, SUBRECIPIENTS_SHEET ) return (errors, subrecipients)


```py
type Projects = List[Union[Project1ARow, Project1BRow, Project1CRow]]

# This function now returns any valid projects
def validate_project_sheet(project_sheet: Worksheet, project_schema: Type[Union[Project1ARow, Project1BRow, Project1CRow]]) -> Tuple[Errors, Projects]:
    errors = []
    projects = []
    project_headers = get_headers(project_sheet, "C3:DS3")
    current_row = INITIAL_STARTING_ROW
    sheet_has_data = False
    for project_row in project_sheet.iter_rows(
        min_row=13, min_col=3, max_col=123, values_only=True
    ):
        current_row += 1
        if is_empty_row(project_row):
            continue
        sheet_has_data = True
        row_dict = map_values_to_headers(project_headers, project_row)
        try:
            # Note: we will need to set row number here on the project in order to display a meaningful error message.
            #           this will require updating `BaseProjectRow` in `schemas/` to have this new optional attribute.
            row_dict["Row Number"] = current_row

            project = project_schema(**row_dict)
            projects.append(project_schema(**row_dict))
        except ValidationError as e:
            errors += get_workbook_errors_for_row(
                project_schema, e, current_row, PROJECT_SHEET
            )

    if not sheet_has_data:
        errors += [WorkbookError(
            message="Upload doesn’t include any project records.",
            row=INITIAL_STARTING_ROW,
            col=0,
            tab=PROJECT_SHEET,
            field_name="",
            severity=ErrorLevel.ERR.name,
        )]

    return (errors, projects)
class BaseProjectRow(BaseModel):
    model_config = ConfigDict(coerce_numbers_to_str=True, loc_by_alias=False)

    row_num: conint(ge=1) = Field(default=1, serialization_alias="Row Number", json_schema_extra={"column":"NONE"})
    ...
  1. Add a new function to validate projects and subrecipients together:

    def validate_projects_subrecipients(projects: Projects, subrecipients: Subrecipients) -> Errors:
    errors = []
    subrecipients_by_uei_tin = {}
    for subrecipient in subrecipients:
        subrecipients_by_uei_tin[(subrecipient.EIN__c, subrecipient.Unique_Entity_Identifier__c)] = subrecipient
    
    for project in projects:
        if subrecipients_by_uei_tin.get((project.Subrecipient_TIN__c, project.Subrecipient_UEI__c)) is None:
            col_name_tin = project.__class__.model_fields["Subrecipient_TIN__c"].json_schema_extra['column']
            col_name_uei = project.__class__.model_fields["Subrecipient_UEI__c"].json_schema_extra['column']
            errors.append(
                WorkbookError(
                    message="You must submit a subrecipient record with the same UEI & TIN numbers entered for this project",
                    row=project.row_num, # this currently does not exist you will need to find a way to get this.
                    col=f"{col_name_tin}, {col_name_uei}",
                    tab="projects",
                    field_name="Subrecipient_TIN__c and Subrecipient_UEI__c",
                    severity=ErrorLevel.ERR.name,
                )
            )
    return errors
  2. Use the returned information to do the additional validation here: https://github.com/usdigitalresponse/cpf-reporter/blob/main/python/src/lib/workbook_validator.py#L218:

    def validate_workbook(workbook: Workbook) -> Tuple[Errors, Optional[str]]:
    ...
    
    """
    3. Ensure all project rows are validated with the schema
    """
    if project_schema:
        project_errors, projects = validate_project_sheet(workbook[PROJECT_SHEET], project_schema)
        errors += project_errors
    
    """
    4. Ensure all subrecipient rows are validated with the schema
    """
    subrecipient_errors, subrecipients = validate_subrecipient_sheet(workbook[SUBRECIPIENTS_SHEET])
    errors += subrecipient_errors
    
    """
    5. Ensure all projects are mapped to a valid subrecipient
    """
    errors += validate_projects_subrecipients(projects, subrecipients)
    
    return (errors, project_use_code)
as1729 commented 3 months ago

@ClaireValdivia are you able to provide here what should go in each of the cells of this table when we identify a project that does not have a valid subrecipient? Since this is a cross-sheet validation I wasn't sure what row/col/sheet-name to use. If you're able to add-in the exact error verbiage then this ticket should be ready for work.

Image

ClaireValdivia commented 3 months ago

@as1729 just updated the definition of done section, lmk if you have any questions! referencing two columns is not our typical behavior but wondering if this feels feasible as I'm not sure of a better way to reference the error.

as1729 commented 3 months ago

That looks good to me 👍 thank you!

vshia commented 3 months ago

Is there a difference between EIN and TIN? Is EIN just legacy (and if so, should we change it to TIN)?

ClaireValdivia commented 3 months ago

@vshia EIN and TIN are referring to the same field, but unfortunately Treasury has named the field "EIN" in some places, so we need to be consistent with their naming conventions.

For context, we don't want to get out of sync with naming of fields that are provided by Treasury and are included in the report we send to Treasury. The reason we were able to change a field name recently (Project Use Code > EC Code in the input template Cover sheet) is because that is a field name in our input template only and not provided by Treasury.

ClaireValdivia commented 3 months ago

@vshia seeing a couple issues here for this test upload. For this file, there are three project records

Image

vshia commented 2 months ago

https://github.com/usdigitalresponse/cpf-reporter/pull/314 will address this

ClaireValdivia commented 2 months ago

This is looking good to me in staging!