oss-aspen / 8Knot

Dash app in development to serve open source community visualizations using GitHub data from Augur. Hosted app: https://eightknot.osci.io
MIT License
47 stars 59 forks source link

dependency pie chart error handling #640

Closed cdolfi closed 4 months ago

cdolfi commented 5 months ago

To fix for the error listed I decided to go ahead and work on all potential edge cases. If you think of a better way to do this lmk

JamesKunstle commented 4 months ago

@cdolfi wait what did the error mean, and why does this fix it? I'm kinda lost without doing a deep dive on the problem

cdolfi commented 4 months ago

@JamesKunstle So pretty much the error you noted happened bc there is no packages that are out of date so the set intervals used before break. I could (and started with) solving for the direct case where all of the packages are up to date, but I realized there was multiple different cases where this would break. Pretty much any scenario where there are no packages more than a year out of date. Lmk if its still confusing, bc its a little abstract/complicated

JamesKunstle commented 4 months ago

@cdolfi Ah okay, I'll look again with this understanding

JamesKunstle commented 4 months ago

@cdolfi I think the cleanest way to do this is actually in SQL to avoid Pandas-related bugs. lmk what you think

with last_collections_cte as (
    -- cte: For each repo, calculate the most recent collection date.
    select
        rdl.repo_id,
        max(rdl.data_collection_date) as latest_collection_date
    from
        augur_data.repo_deps_libyear rdl
    group by rdl.repo_id
)
select 
    *,
    case -- categorize the dependency out-of-date-edness;
            WHEN rdl.libyear >= 1.0 THEN '12 months or more' -- 1.0 is one year
            WHEN rdl.libyear < 1.0 and rdl.libyear >= 0.5 THEN '<12 months, >=6 months' -- 0.5 = 1/2 year, 6mos
            when rdl.libyear < 0.5 and rdl.libyear >= 0.0833 then '<6 months, >=1 month' -- 0.0833rep = 1/12 year, 1mos
            when rdl.libyear < 0.0833 and rdl.libyear > 0 then '<1 month'
            when rdl.libyear = 0 then 'Up to date'
            else 'Unclear version history'
    END as dep_age
from augur_data.repo_deps_libyear rdl,
    last_collections_cte lc
where 
    rdl.repo_id in (36113) -- augur repo_id
    and rdl.libyear >= 0
    and lc.repo_id = rdl.repo_id 
    and lc.latest_collection_date = rdl.data_collection_date -- only use most recent collected data per repo