opensource-observer / oso

Measuring the impact of open source software
https://opensource.observer
Apache License 2.0
71 stars, 16 forks

“Libin diagram” Contribution Flows #837

Open ryscheng-mobile opened 9 months ago

ryscheng-mobile commented 9 months ago

What is it?

The ability for a project’s ecosystem to understand, in detail, how new users are entering and exiting their open source community / user dependency graph.

ryscheng-mobile commented 9 months ago

[Three screenshots attached: "Screenshot 2024-02-09 10 20 48", "Screenshot 2024-02-09 10 20 35", "Screenshot 2024-02-09 10 20 21"]

ryscheng-mobile commented 9 months ago

Suggested steps:

ryscheng commented 2 months ago

Question for @ravenac95

@ccerv1 and I were just talking about this one, and I think we need some help with the metrics rolling window factory to support it. I think there are actually 3 rolling windows at play here:

  1. The classification rolling window (e.g. a developer needs to have events in 10 of 30 days to be considered full-time)
  2. The counting rolling window (e.g. we want to know how many active developers there were in the last 6 months)
  3. The comparison rolling window (e.g. across the last two 6-month periods: how many users went from part-time to full-time, or part-time to churned, etc.)

I think right now we only assume a single rolling window, is that correct?
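To make the first two windows concrete, here is a minimal sketch in plain Python, not the OSO metrics factory. The function names, the `inactive`/`part-time`/`full-time` labels, and the rule that a developer is counted under the highest label they reach on any day of the counting window are all assumptions for illustration; only the 10-of-30-days and 6-month figures come from the examples above.

```python
from datetime import date, timedelta

def classify(event_days, as_of, window_days=30, fulltime_min_days=10):
    """Classification window: label one developer as of a single day.

    event_days is a set of distinct dates on which the developer had events.
    """
    start = as_of - timedelta(days=window_days - 1)
    active_days = sum(1 for d in event_days if start <= d <= as_of)
    if active_days >= fulltime_min_days:
        return "full-time"
    return "part-time" if active_days > 0 else "inactive"

def count_by_label(events, end, count_days=180):
    """Counting window: re-run the classification for every day in the
    counting window and count each developer once, under the highest
    label reached (an assumed rule, other rules are possible)."""
    rank = {"inactive": 0, "part-time": 1, "full-time": 2}
    counts = {label: 0 for label in rank}
    for days in events.values():
        best = "inactive"
        for offset in range(count_days):
            label = classify(days, end - timedelta(days=offset))
            if rank[label] > rank[best]:
                best = label
        counts[best] += 1
    return counts

# Made-up sample data: alice has 12 active days in a row, bob has 2,
# carol was last active well before the counting window.
events = {
    "alice": {date(2024, 1, d) for d in range(1, 13)},
    "bob": {date(2024, 2, 1), date(2024, 2, 2)},
    "carol": {date(2022, 5, 1)},
}
counts = count_by_label(events, end=date(2024, 3, 31))
```

Note that the classification window nests inside the counting window, so the earliest days of the counting window look back up to 29 days before its start. That nesting is what makes a single rolling-window parameter insufficient.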

ravenac95 commented 2 months ago

ohhhh ya interesting, we do currently assume 1, but ya I'll need to think how we can combine things so we can depend on some of these other rolling windows. This seems to be rolling window queries on rolling windows.
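One way to picture the "rolling window on rolling windows" part is the comparison step: once each period has produced a label per developer, the flow counts are a join of two period results. The sketch below is an illustration only. The `transitions` helper and the `new` and `churned` labels for developers missing from one side are assumptions, not existing OSO code.

```python
from collections import Counter

def transitions(previous, current):
    """Count (label_before, label_after) pairs across two adjacent periods.

    previous/current map developer id -> label for that period. A developer
    missing from `previous` is treated as "new"; one missing from `current`
    is treated as "churned" (both labels are assumptions).
    """
    flows = Counter()
    for dev in previous.keys() | current.keys():
        before = previous.get(dev, "new")
        after = current.get(dev, "churned")
        flows[(before, after)] += 1
    return flows

# Made-up labels for two consecutive 6-month periods.
previous = {"alice": "part-time", "bob": "part-time", "carol": "full-time"}
current = {"alice": "full-time", "carol": "full-time", "dave": "part-time"}
flows = transitions(previous, current)
```

Each key of `flows` corresponds to one band in the contribution-flow diagram, for example `("part-time", "full-time")` or `("part-time", "churned")`.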

ravenac95 commented 2 months ago

This changes how I was thinking of things because I was trying to constrain the collection/project automatic creation a bit. Let me think on this!

ravenac95 commented 2 months ago

Actually, what I was thinking in terms of changes was to do something like this:

timeseries_metrics(
    model_prefix="timeseries",
    metric_queries={
        # This will automatically generate star counts for the given rollup periods.
        # A rollup is just a simple addition of the aggregation. So basically we
        # calculate the daily rollup every day by getting the count of the day,
        # then the weekly every week by getting the count of the week, and the
        # monthly every month by getting the count of the month.
        # This will also create models along the dimensions (entity_types) of
        # project/collection, so the resulting models will be named as follows:
        # `metrics.timeseries_stars_to_{entity_type}_{rollup}`
        "stars": MetricQueryDef(
            ref="stars.sql",
            rollups=["daily", "weekly", "monthly"],
            entity_types=["artifact", "project", "collection"], # This is the default value
        ),
        # This defines something with a rolling option that allows you to look back 
        # to some arbitrary window. So you specify the window and specify the unit. 
        # The unit and the window are used to pass in variables to the query. So it's 
        # up to the query to actually query the correct window. 
        # The resultant models are named as such
        # `metrics.timeseries_active_days_to_{entity_type}_over_{window}_{unit}`
        "active_days": MetricQueryDef(
            ref="active_days.sql",
            rolling={
                "windows": [30, 60, 90],
                "unit": "day",
                "cron": "0 0 1 */6 *",  # This determines how often this is calculated
            },
        ),
    },
    default_dialect="clickhouse",
)

I think this setup should give us the flexibility to do the window of windows without having to build much additional craziness?
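As a rough check on how many models this proposal would generate, the naming patterns from the comments can be expanded by hand. The sketch below is not the factory itself. `model_names` is a hypothetical helper, and it assumes that `active_days` falls back to the same default `entity_types` as `stars`.

```python
from itertools import product

# Default taken from the comment in the proposal above.
DEFAULT_ENTITY_TYPES = ["artifact", "project", "collection"]

def model_names(prefix, name, rollups=None, rolling=None, entity_types=None):
    """Expand one metric definition into the model names it implies."""
    entity_types = entity_types or DEFAULT_ENTITY_TYPES
    names = []
    # Rollup models: metrics.{prefix}_{name}_to_{entity_type}_{rollup}
    for entity_type, rollup in product(entity_types, rollups or []):
        names.append(f"metrics.{prefix}_{name}_to_{entity_type}_{rollup}")
    # Rolling models: metrics.{prefix}_{name}_to_{entity_type}_over_{window}_{unit}
    if rolling:
        unit = rolling["unit"]
        for entity_type, window in product(entity_types, rolling["windows"]):
            names.append(
                f"metrics.{prefix}_{name}_to_{entity_type}_over_{window}_{unit}"
            )
    return names

stars = model_names("timeseries", "stars", rollups=["daily", "weekly", "monthly"])
active_days = model_names(
    "timeseries", "active_days", rolling={"windows": [30, 60, 90], "unit": "day"}
)
```

Under these assumptions each of the two definitions expands to nine models (3 entity types x 3 rollups or windows). Read as a standard five-field cron expression, `0 0 1 */6 *` fires at midnight on the first day of every sixth month, so January and July.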