mozilla / bugbug

Platform for Machine Learning projects on Software Engineering
Mozilla Public License 2.0

MEI: Have a visualization of the time series of maintenance effectiveness values, aggregated by different teams/components #3717

Open jensstutte opened 1 year ago

jensstutte commented 1 year ago

Once we have a time series, we could construct a nice graph visualizing the trend for some of those values.

It should be possible to aggregate/filter as usual in the bugbug UI.

Updated requirements for the graph:

For a given team (or other component selection) like "DOM LWS" and a given time period graphduration (say a year as example) we want to:

for week in graphduration
    for delta in [week -1w, week - 1m, week - 3m]
        calculate ME, BDTime, WBDTime, Incoming, Closed with team, week, delta

Note that we want to calculate all those values ad hoc on the latest data we have in Bugzilla, even if they seem to be historical (bugs might move between components, change severity, etc.).

This gives us a series of 3 value quintuples for each week over the graphduration. We can directly plot (with different colors for week, month and 3 months delta) the values for ME, BDTime and WBDTime.

For the Incoming and Closed values we should choose only one delta and scale the value up to 12 months. I'd probably try to do so for the 1 month values (not only for the ease of calculation but also for the medium dynamic I expect). This yields for each week the percentage of bugs that would have been incoming/closed for an entire year (if done at the same rate) compared to our defect backlog. It gives an immediate feeling of "how big is my technical debt backlog" in relation to active work and incoming bugs.
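The loop above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `compute_metrics` is a hypothetical callback standing in for whatever actually computes the five values, `fake_metrics` is a placeholder so the sketch runs without Bugzilla access, and 30 and 90 days are used as approximations of "1 month" and "3 months".

```python
from datetime import date, timedelta

# Look-back windows from the pseudocode. 30 and 90 days approximate
# "1 month" and "3 months" (an assumption; calendar months would also work).
DELTAS = {
    "1w": timedelta(days=7),
    "1m": timedelta(days=30),
    "3m": timedelta(days=90),
}


def weekly_dates(start, end):
    """Yield one date per week from start up to and including end."""
    current = start
    while current <= end:
        yield current
        current += timedelta(weeks=1)


def build_series(team, start, end, compute_metrics):
    """Return one record per (week, delta) pair.

    compute_metrics(team, from_date, to_date) is a hypothetical callback that
    returns a dict with the keys ME, BDTime, WBDTime, Incoming and Closed.
    """
    records = []
    for week in weekly_dates(start, end):
        for label, delta in DELTAS.items():
            metrics = compute_metrics(team, week - delta, week)
            records.append({"week": week, "delta": label, **metrics})
    return records


def fake_metrics(team, from_date, to_date):
    """Placeholder values so the sketch runs without Bugzilla access."""
    days = (to_date - from_date).days
    return {"ME": 100.0, "BDTime": 1.0, "WBDTime": 1.0,
            "Incoming": days, "Closed": days}


records = build_series("DOM LWS", date(2023, 1, 1), date(2023, 12, 31), fake_metrics)
print(len(records), records[0])
```

Keeping the records in this "long" form (one row per week and delta) makes it easy to select a single delta later when plotting.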

gothwalritu commented 1 year ago

@jensstutte and @suhaibmujahid, I am new to this and have started looking into this issue. So far, I have figured out that the component model training outputs a component model (pickle file?). Is there documentation on how to read this file? Or, if this is not the right direction, which file would you recommend that has all the necessary features (creation date, completion date, bug ID and associated components) for creating this time series of maintenance effectiveness values?

marco-c commented 1 year ago

@gothwalritu this issue is pretty complex, I'd suggest picking something else.

gothwalritu commented 1 year ago

@marco-c, I have some experience with time series analysis, and I would like to give it a shot.

marco-c commented 1 year ago

@gothwalritu unfortunately there are quite a few other things to do, which require context, in order to retrieve the data to analyze, and then the actual work for this issue can start.

gothwalritu commented 1 year ago

@marco-c, @suhaibmujahid I have been trying to understand the problem and the associated data. In the Element chat room, @jpangas mentioned a few links to go through first to develop an understanding of this task. I did so, and so far I understand that there are different products, components, and teams.

          component_name            team_name  product_name
    0     Fennec                    Mozilla    Untriaged Bugs
    1     Firefox                   Mozilla    Untriaged Bugs
    2     General                   Mozilla    Untriaged Bugs
    3     Compiler                  Mozilla    Rhino Graveyard
    4     Core                      Mozilla    Rhino Graveyard
    ...   ...                       ...        ...
    2308  FlightDeck                Mozilla    Mozilla Labs Graveyard
    2309  Personas Plus             Mozilla    Mozilla Labs Graveyard
    2310  Test Pilot                Mozilla    Mozilla Labs Graveyard
    2311  Test Pilot Data Requests  Mozilla    Mozilla Labs Graveyard
    2312  Test Pilot Studies        Mozilla    Mozilla Labs Graveyard

Bugs are categorized under these.

The objective is to create a visualization of the time series of maintenance effectiveness indicator (MEI) values. So, first I will try to work with one team, maybe "Mozilla". The function to calculate the MEI is defined in "bugzilla.py", so I can use it.

    # Import the function calculate_maintenance_effectiveness_indicator and any other necessary libraries or modules

    # Define the team
    team = 'Mozilla'

    # Define the date range
    start_year = 2000
    end_year = 2022

    # Create an empty dictionary to hold the MEI values for each year
    mei_values = {}

I tried plotting the graph as:

    # Add title and labels
    plt.title('Maintenance Effectiveness Indicator (MEI) Over Years for team: mozilla')
    plt.xlabel('Year')
    plt.ylabel('MEI Value')

image

I am also planning to do the time series analysis, maybe using ARIMA, for the same team. Let me know if I am going in the right direction. Otherwise, I will drop this task and move on to another one.

jensstutte commented 1 year ago

This looks promising. I will have to update the requirements a bit to make clearer what we actually want to see here, based on which calculations (cadence and time delta), and how to put that together; please expect that to happen by Friday. Note that ultimately we will want to integrate this into bugbug's UI, but for now we are happy to have a standalone PoC. Note also that it is probably more representative for testing to use a different team than "Mozilla"; you might want to try "DOM LWS", for example.

gothwalritu commented 1 year ago

Thank you so much! I will try with DOM LWS.

jensstutte commented 1 year ago

So for a given team (or other component selection) like "DOM LWS" and a given time period graphduration (say a year as example) we want to:

for week in graphduration
    for delta in [week -1w, week - 1m, week - 3m]
        calculate ME, BDTime, WBDTime, Incoming, Closed with team, week, delta

Note that we want to calculate all those values ad hoc on the latest data we have in Bugzilla, even if they seem to be historical (bugs might move between components, change severity, etc.).

This gives us a series of 3 value quintuples for each week over the graphduration. We can directly plot (with different colors for week, month and 3 months delta) the values for ME, BDTime and WBDTime.

For the Incoming and Closed values we should choose only one delta and scale the value up to 12 months. I'd probably try to do so for the 1 month values (not only for the ease of calculation but also for the medium dynamic I expect). This yields for each week the percentage of bugs that would have been incoming/closed for an entire year (if done at the same rate) compared to our defect backlog. It gives an immediate feeling of "how big is my technical debt backlog" in relation to active work and incoming bugs.
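As a small worked example of the scaling step, assuming that Incoming and Closed are already expressed as a percentage of the defect backlog for the chosen window (the function name and the numbers below are hypothetical):

```python
def scale_to_year(value_pct, periods_per_year):
    """Extrapolate a per-period percentage to a full year at the same rate.

    Use periods_per_year=12 for 1-month values and 52 for weekly values.
    """
    return value_pct * periods_per_year


# Hypothetical 1-month values, as a percentage of the defect backlog.
monthly_incoming_pct = 4.0
monthly_closed_pct = 5.5

yearly_incoming_pct = scale_to_year(monthly_incoming_pct, 12)
yearly_closed_pct = scale_to_year(monthly_closed_pct, 12)
print(yearly_incoming_pct, yearly_closed_pct)
```

With these numbers, bugs amounting to 48% of the backlog would arrive in a year and 66% would be closed, if the monthly rate stayed constant.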

jensstutte commented 1 year ago

Infinite values for BDTime/WBDTime can appear and may just be ignored for now, but sooner or later we'll need a good way to show them in order to make clear "here is a problem".
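One possible way to ignore infinite values for now while still remembering where they occurred is sketched below. The helper name is hypothetical. The idea is to replace infinities with NaN, which line plots in libraries such as matplotlib typically render as a gap, and to keep the affected positions so they can be highlighted later.

```python
import math


def mask_infinite(values):
    """Replace infinite values with NaN and return their positions.

    Returns (masked_values, flagged_indices). The flagged indices can later
    be used to draw a marker that says "here is a problem".
    """
    masked = []
    flagged = []
    for index, value in enumerate(values):
        if math.isinf(value):
            masked.append(math.nan)
            flagged.append(index)
        else:
            masked.append(value)
    return masked, flagged


# Hypothetical WBDTime series with one infinite burn-down time.
masked, flagged = mask_infinite([2.0, math.inf, 3.5])
print(masked, flagged)
```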

gothwalritu commented 1 year ago

@jensstutte, thank you for the explanation. It really helped me a lot to dive deeper into this. Here are the steps I will be following:

  1. Instead of looping over years, I will loop over weeks within the specified year range. For each week, I will calculate the MEI for the 1-week, 1-month, and 3-month deltas.

  2. For each week, I will store the MEI, BDTime, WBDTime, Incoming, and Closed values for the three different deltas.

  3. I will create separate plots for each metric (ME, BDTime, WBDTime) with different colors or markers for each delta.

  4. For Incoming and Closed values, I should choose one delta and scale the values up to a 12-month period.

Let me know if that is good enough for the start.

jensstutte commented 1 year ago

> Let me know if that is good enough for the start.

That sounds right. We can probably have BDTime and WBDTime on the same chart from the beginning, as they have the same scale and should have a comparable order of magnitude (and if not we would want to directly know). I think in the end I'd like to combine the burn down chart with the Incoming/Closed data as second vertical scale as supporting information. But let's first see how dense the information gets on single charts.

gothwalritu commented 1 year ago

OK. So far, I am able to produce these results with:

Team: 'DOM LWS'
Start year = 2021
End year = 2022

    week        delta         MEI     BDTime   WBDTime   Incoming     Closed
    2021-01-01      7  371.428571   4.276712  2.022783   0.446429   0.892857
    2021-01-01     30  160.576923   8.331258  2.614481   4.618117   5.595027
    2021-01-01     90  144.632768   9.480397  3.127449  13.811189  16.346154
    2021-01-08      7  144.117647   3.563927  2.562192   1.248885   1.784121
    2021-01-08     30  154.629630   6.545988  2.791734   4.340124   5.580159
    2021-01-08     90  140.909091  10.182648  3.431507  13.835377  16.199650

I plotted the graph for MEI value

image

I am working on the burn down chart now and will update you as soon as I am finished with it. In the meantime, I would greatly appreciate your feedback. Thanks so much!

gothwalritu commented 12 months ago

Here is the graph for BDTime and WBDTime on the same chart:

image

gothwalritu commented 12 months ago

This chart shows the progression of the incoming and closed values:

image

gothwalritu commented 12 months ago

And here is the combined chart with two vertical axes:

image

There is too much info in one chart.

jensstutte commented 11 months ago

Sorry for not coming back earlier here (actually I was convinced I had already answered some time ago but I do not see the answer, so maybe I did not hit the send button or such?)!

This all looks very promising. I think in general we do not need to go back such a long period in time; 12 months should be enough. And we mostly want to look at the 3 months delta value in practice, so to keep things more readable we could remove the 1-month-delta curve everywhere. The weekly values then show the peaks and the 3-months-value shows the average we want to move to our target.

  1. MEI chart - I think we want to limit the scale to 500% maximum (and indicate somehow graphically if we overshoot). A horizontal 100% line emphasized more than the other grid lines would also be nice.
  2. (W)BD Time chart - Here we can fully concentrate on the 3-months-delta only. The weekly noise is not interesting, I'd say (we have that in the MEI). And again we should limit the scale to something like 15 years (and indicate somehow graphically if we overshoot).
  3. Incoming and Closed - Here we want probably only the weekly value to spot unusual peaks. But we want to scale the values up to 52 weeks, such that the value tells us "in relation to our entire defects backlog - how many bugs do we touch in a year (as percentage)". Actually we might want to do this normalization already in the calculating script, maybe.
  4. Combined chart - yes, that looks overcrowded. It might become better with the above adjustments (removing some curves), maybe? We could try to have it for 3-months-delta values only.
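The scale limit with an overshoot indicator from point 1 could look roughly like the sketch below. All names are hypothetical, the input values are illustrative, and matplotlib is assumed to be the plotting library (the earlier snippet in this thread uses `plt`). The clipping helper is plain Python; only `plot_mei` needs matplotlib.

```python
ME_CAP = 500.0  # upper limit of the MEI axis, in percent


def clip_with_overshoot(values, cap=ME_CAP):
    """Clip values to cap and report which indices exceeded it."""
    clipped = [min(value, cap) for value in values]
    overshoot = [i for i, value in enumerate(values) if value > cap]
    return clipped, overshoot


def plot_mei(weeks, values, path="mei.png"):
    """Draw the clipped MEI curve with an emphasized 100% line."""
    # Imported lazily so clip_with_overshoot stays usable without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display required
    import matplotlib.pyplot as plt

    clipped, overshoot = clip_with_overshoot(values)
    fig, ax = plt.subplots()
    ax.plot(weeks, clipped, label="ME (3-month delta)")
    # Triangles at the cap mark weeks whose true value is above the limit.
    ax.scatter([weeks[i] for i in overshoot], [ME_CAP] * len(overshoot),
               marker="^", color="red", label="above 500%")
    ax.axhline(100, color="black", linewidth=2)  # emphasized 100% line
    ax.set_ylim(0, ME_CAP * 1.05)  # no wasted space below 0
    ax.set_xlabel("Week")
    ax.set_ylabel("ME (%)")
    ax.grid(True)
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


# Illustrative values only; the last one exceeds the cap.
clipped, overshoot = clip_with_overshoot([371.4, 144.1, 620.0])
print(clipped, overshoot)
```

Calling `plot_mei(["2023-01-01", "2023-01-08", "2023-01-15"], [371.4, 144.1, 620.0])` would write the chart to `mei.png`. The same clipping helper could serve the (W)BD Time chart with a cap of 15 years.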

Thank you!

gothwalritu commented 11 months ago

No problem at all! I appreciate your feedback and clarifications. It sounds like we're on the right track. I'll make the adjustments as per your suggestions: focusing on 12 months, removing the 1-month-delta curve, and emphasizing the 3-months-delta curve in the MEI and (W)BD Time charts. We'll also limit the scales and add graphical indicators as needed. I'll work on these refinements and will keep you updated on the progress. Just a heads up, I'll be on vacation for the next couple of weeks, so there might be a brief delay in my responses during that time. Thank you for your understanding!

gothwalritu commented 10 months ago

I modified the MEI chart accordingly and here is the result:
image

@jensstutte, @marco-c Let me know your feedback on this. I am working on the other two charts and will update you once I am ready. Thank you.

jensstutte commented 10 months ago

That looks very good, thank you! The only minor thing might be to not waste so much vertical space for values going below 0, which is not possible by definition. And to use a more recent time interval, like all 2023 / last 12 months. Looking forward to the other updates!

gothwalritu commented 10 months ago

@jensstutte, I am glad that I am able to produce these results with your guidance. In this chart I removed the vertical space below 0 in MEI and have used the 12 months of 2023. Here is the updated chart for MEI over weeks for team "DOM LWS":

image

gothwalritu commented 10 months ago

@jensstutte, here is the chart for BDT and WBDT over 12 months of 2023 with 3-Month Delta for team DOM LWS, with the scale limited to 15 years:

image

I am working on other charts as well. I will update the progress accordingly. Please let me know your comments on the previous charts. Thank you!

jensstutte commented 10 months ago

> @jensstutte, I am glad that I am able to produce these results with your guidance. In this chart I removed the vertical space below 0 in MEI and have used the 12 months of 2023. Here is the updated chart for MEI over weeks for team "DOM LWS".

I think that nails it, thanks!

jensstutte commented 10 months ago

> @jensstutte, here is the chart for BDT and WBDT over 12 months of 2023 with 3-Month Delta for team DOM LWS, with the scale limited to 15 years:

That looks fine, too. It is interesting to see how higher severity bugs can move the needle fast for WBDTime but less fast for BDTime. And one can clearly see the correlation of MEI going close to 100% and WBDTime spiking up, which is what I expected to see.
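The link between MEI approaching 100% and the burn-down time spiking can be illustrated with a deliberately simplified model. The two formulas below are assumptions made for this sketch (unweighted, and not necessarily the exact calculation in bugzilla.py): effectiveness is closed divided by incoming, and burn-down time is the backlog divided by the net closing rate.

```python
import math


def maintenance_effectiveness(closed, incoming):
    """Closed bugs as a percentage of incoming bugs (simplified, unweighted)."""
    return 100.0 * closed / incoming


def burn_down_years(backlog, closed_per_year, incoming_per_year):
    """Years needed to clear the backlog at the current net closing rate."""
    net = closed_per_year - incoming_per_year
    # With no net progress the backlog never shrinks: infinite burn-down time.
    return backlog / net if net > 0 else math.inf


backlog, incoming = 200, 100  # hypothetical numbers
for closed in (200, 150, 110, 101, 100):
    me = maintenance_effectiveness(closed, incoming)
    years = burn_down_years(backlog, closed, incoming)
    print(f"ME {me:5.1f}%  burn-down {years} years")
```

In this model, going from 150% to 101% effectiveness moves the burn-down time from 4 to 200 years, and exactly 100% gives an infinite value, which would also explain why infinite BDTime/WBDTime values can show up.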

> I am working on other charts as well. I will update the progress accordingly. Please let me know your comments on the previous charts. Thank you!

Thank you!

gothwalritu commented 10 months ago

@jensstutte, yes, it's fascinating to observe how higher severity bugs impact WBDTime differently than BDTime. The correlation between MEI approaching 100% and the spike in WBDTime indicates a strong connection between maintenance effectiveness and the time to resolve bugs.

Here is the chart for Incoming and Closed. I first calculated the scaled values for Incoming and Closed by dividing them by 52 (the number of weeks in a year) and then multiplying by 100 to represent them as percentages. These scaled values indicate how many bugs are being processed in a week relative to the entire defects backlog for the year.

Then I created a line plot for Scaled Incoming and Scaled Closed, adding labels, a title, a legend, and grid lines for better visualization. Here is the chart:

image

Let me know your comments on it.

marco-c commented 10 months ago

@gothwalritu just wanted to say great work, sorry I initially tried to ask you to work on something else :)

gothwalritu commented 10 months ago

@marco-c, thank you so much for your kind words! I believe we do everything with our best intentions, so please don't apologize :)

gothwalritu commented 10 months ago

@jensstutte, I think I made a mistake in the last chart. Let me work on this and I will get back to you ASAP.

gothwalritu commented 10 months ago

@jensstutte, here is the chart for Incoming and Closed for the weekly values. I first calculated the scaled values for Incoming and Closed by dividing them by 52 (the number of weeks in a year) and then multiplying by 100 to represent them as percentages. These scaled values indicate how many bugs are being processed in a week relative to the entire defects backlog for the year.

Then I created a line plot for Scaled Incoming and Scaled Closed, adding labels, a title, a legend, and grid lines for better visualization. Here is the chart:

image

The previous chart contained data points for both the 7-day and the 90-day delta, which is why it showed a zigzag pattern. Kindly let me know if this chart is fine and I will create the combined chart for the 3-month-delta values. Thanks!
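The zigzag can be reproduced with the Incoming values from the results table earlier in this thread: a single line drawn through rows of mixed deltas jumps between the small 7-day values and the large 90-day values. A minimal sketch of the fix, with a hypothetical helper name, is to select one delta before plotting:

```python
# (week, delta in days, Incoming) rows from the results table in this thread.
rows = [
    ("2021-01-01", 7, 0.446429),
    ("2021-01-01", 90, 13.811189),
    ("2021-01-08", 7, 1.248885),
    ("2021-01-08", 90, 13.835377),
]


def select_delta(rows, delta_days):
    """Keep only the (week, value) points computed with the given delta."""
    return [(week, value) for week, delta, value in rows if delta == delta_days]


# Plotting every row as one line alternates between the two deltas.
mixed = [value for _, _, value in rows]
# Plotting a single delta gives one point per week.
weekly = select_delta(rows, 7)
print(mixed)
print(weekly)
```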

jensstutte commented 9 months ago

Thank you, that looks much better!

gothwalritu commented 9 months ago

@jensstutte, I appreciate your comments.

I plotted the scaled Incoming and scaled Closed values for the 90-day (3-month) delta. Here is the graph:

image

And here is the combined chart of scaled Incoming/Closed and BDTime/WBDTime over 12 months (2023) for the 3-month delta:

image

Let me know your feedback on this. Thank you so much!

gothwalritu commented 8 months ago

Hello @jensstutte, do I need to do anything else in this issue? Let me know.

jensstutte commented 8 months ago

Hi @gothwalritu! In order to better judge how we can integrate your code into our systems, would you mind sharing it via a PR here? The ultimate goal would be to add some of these diagrams to our bugbug UI, but for now a separate script will do.

Thank you!