Include relevant non-Fenix bugs to the training set of the Fenix model

mozilla / bugbug

Platform for Machine Learning projects on Software Engineering

Mozilla Public License 2.0

Include relevant non-Fenix bugs to the training set of the Fenix model #4354

Open suhaibmujahid opened 3 months ago

suhaibmujahid commented 3 months ago

Suggested by @marco-c:

For the Fenix model, we should try adding GeckoView as a potential target too, since some of the Fenix bugs end up in GeckoView.

We could make this dynamic instead of hard coding it. For example, when we create the training dataset, we could include bugs that were in Fenix and then got moved to other products (if the component passes a certain threshold).
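A minimal sketch of the dynamic approach, assuming we have already extracted the destination (product, component) of every bug that started in Fenix and was later moved out. The function name, the input shape, and both threshold values are hypothetical and not part of bugbug:

```python
from collections import Counter


def select_extra_targets(moves, min_count=2, min_share=0.1):
    """Pick non-Fenix destinations that are common enough to become targets.

    `moves` is an iterable of (product, component) pairs, one per bug that
    was originally in Fenix and then moved elsewhere (hypothetical input).
    `min_count` and `min_share` are illustrative thresholds, not tuned values.
    """
    counts = Counter(moves)
    total = sum(counts.values())
    # Keep a destination only if it passes both the absolute and the
    # relative threshold; an empty input simply yields an empty list.
    return sorted(
        target
        for target, n in counts.items()
        if n >= min_count and n / total >= min_share
    )


# Illustrative data only: three bugs moved to GeckoView, two one-off moves.
example_moves = (
    [("GeckoView", "General")] * 3
    + [("Core", "Layout"), ("Toolkit", "Themes")]
)
extra_targets = select_extra_targets(example_moves)
```

With the example data, only GeckoView::General passes the thresholds; the one-off destinations are ignored, which is the point of using a threshold rather than including every destination.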

marco-c commented 3 months ago

For GeckoView in particular, we should probably just hardcode it (as GeckoView could be considered in the end a subproduct of Fenix). Filed https://github.com/mozilla/bugbug/issues/4355 for this.

marco-c commented 3 months ago

We should definitely include bugs that were put in Fenix by our bot but then moved out of Fenix as "negative" examples in the Fenix component model training set.

benjaminmah commented 3 months ago

> We should definitely include as "negative" examples in the Fenix component model training set, bugs that were put in Fenix by our bot but moved out of Fenix.

Would this only include the bugs that were moved to Fenix with the newer version of BugBot (with the Fenix model, ~1 month ago) or with any version of BugBot?

benjaminmah commented 3 months ago

After some investigation, there are two possible cases that we can consider:

1. originally a non-Fenix product -> Fenix::<component> by BugBot -> a non-Fenix product by a Bugzilla user
2. originally Fenix::General -> Fenix::<component> by BugBot -> Fenix::General by a Bugzilla user

The first case is usually caused by the general component model incorrectly classifying a bug as Fenix::General (something the Fenix component model cannot do, as it is trained on components other than General).

The second case is caused by the Fenix component model, which moves a bug out of Fenix::General into a specific component whenever its confidence is high enough.

It would make sense to focus on the second case, as these misclassifications are all caused by the Fenix component model, while the first case is mostly (if not entirely) caused by the general component model misclassifying a bug as Fenix::General.

WDYT? @marco-c

marco-c commented 3 months ago

I agree the first case is not so interesting for the Fenix model, only for the general component model (and those bugs should already be included in its training set).

There could also be this case, which we should consider for the Fenix model:

originally a non-Fenix product -> Fenix::<component> by BugBot -> Fenix::General by a Bugzilla user
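To make the move patterns discussed in this thread concrete, here is a rough sketch that labels a bug from three (product, component) pairs: where it started, where BugBot moved it, and where a Bugzilla user moved it afterwards. The function names, the label strings, and the input shape are all hypothetical; extracting these pairs from the real Bugzilla history is not shown.

```python
FENIX = "Fenix"
GENERAL = "General"


def classify_move_pattern(original, bot_target, user_target):
    """Label one bug's move pattern, or return None if it matches none.

    Each argument is a (product, component) pair (hypothetical shape).
    """
    bot_product, bot_component = bot_target
    # Only consider bugs that BugBot placed in a specific Fenix component.
    if bot_product != FENIX or bot_component == GENERAL:
        return None

    started_outside_fenix = original[0] != FENIX
    started_in_fenix_general = original == (FENIX, GENERAL)
    reverted_to_general = user_target == (FENIX, GENERAL)

    if started_outside_fenix and user_target[0] != FENIX:
        return "out_of_fenix"      # non-Fenix -> Fenix::<component> -> non-Fenix
    if started_in_fenix_general and reverted_to_general:
        return "back_to_general"   # Fenix::General -> Fenix::<component> -> Fenix::General
    if started_outside_fenix and reverted_to_general:
        return "into_general"      # non-Fenix -> Fenix::<component> -> Fenix::General
    return None


def is_relevant_for_fenix_model(pattern):
    """Patterns this thread suggests considering for the Fenix model."""
    return pattern in {"back_to_general", "into_general"}


# Illustrative values only; "Toolbar" and "Layout" are just example components.
pattern = classify_move_pattern(
    ("Core", "Layout"), (FENIX, "Toolbar"), (FENIX, GENERAL)
)
```

In this sketch, bugs labeled `out_of_fenix` are left to the general component model, in line with the discussion above, while the other two patterns would be candidates for "negative" examples of the component BugBot picked.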