sed-group / sed-backend

SED Lab back-end API
0 stars 3 forks source link

CSV simulation DB call optimization #79

Closed johnmartins closed 1 year ago

johnmartins commented 1 year ago

https://github.com/sed-group/sed-backend/blob/3c7f1acddaf5c49319204153702d5dcf2e74cb1f/sedbackend/apps/cvs/simulation/implementation.py#L147-L148

@obennet @EppChops the run_sim_monte_carlo function executes code that has (at least) triple-nested for-loops with DB calls inside of them. The function called in these lines of codes contains one additional hidden for-loop:

https://github.com/sed-group/sed-backend/blob/3c7f1acddaf5c49319204153702d5dcf2e74cb1f/sedbackend/apps/cvs/simulation/storage.py#L200-L213

This needs to be resolved, as it completely tanks DB performance. In general, avoid any DB calls inside for-loops. This is usually possible by building larger SQL statements. If you are aware of any other such places in the code, then please prioritize fixing it.

johnmartins commented 1 year ago

Another place which seems to be causing a lot of traffic:

https://github.com/sed-group/sed-backend/blame/86467d9e8e8e5f06549cfbf3c4fef90bc3a3f634/sedbackend/apps/cvs/link_design_lifecycle/storage.py#L97-L128

populate_formula is run within a for loop, but the funciton itself contains two selector db-calls. Breaking these out into one or two calls that collect everything at once would significantly reduce traffic.

def get_all_formulas(db_connection: PooledMySQLConnection, project_id: int, vcs_id: int,
                     design_group_id: int) -> List[models.FormulaRowGet]:
    logger.debug(f'Fetching all formulas with vcs_id={vcs_id}')

    get_design_group(db_connection, project_id, design_group_id)  # Check if design group exists and matches project
    get_cvs_project(project_id)
    get_vcs(project_id, vcs_id)

    select_statement = MySQLStatementBuilder(db_connection)
    res = select_statement.select(CVS_FORMULAS_TABLE, CVS_FORMULAS_COLUMNS) \
        .inner_join('cvs_vcs_rows', 'vcs_row = cvs_vcs_rows.id') \
        .where('vcs = %s and design_group = %s', [vcs_id, design_group_id]) \
        .execute(fetch_type=FetchType.FETCH_ALL, dictionary=True)

    if res is None:
        raise exceptions.VCSNotFoundException

    return [populate_formula(r) for r in res]

def populate_formula(db_result) -> models.FormulaRowGet:
    return models.FormulaRowGet(
        vcs_row_id=db_result['vcs_row'],
        design_group_id=db_result['design_group'],
        time=db_result['time'],
        time_unit=db_result['time_unit'],
        cost=db_result['cost'],
        revenue=db_result['revenue'],
        rate=db_result['rate'],
        market_inputs=market_impl.get_all_formula_market_inputs(db_result['vcs_row']),
        value_drivers=design_impl.get_all_formula_value_drivers(db_result['vcs_row'])
    )