mozilla / webcompat-team-okrs

These are quarterly team-level OKR projects for the WebCompat Tools team
Mozilla Public License 2.0

Add a section with possible duplicates to the webcompat.com reporting form #248

Open ksy36 opened 2 years ago

ksy36 commented 2 years ago

We regularly receive duplicate reports, and some sites have been reported many times (for example imgur.com). To discourage users from filing duplicates, and therefore save time on triage, we could add a "possible duplicates" section to the form.

We could place it after the "Web address", "Issue", "Details", and "Testing" sections and before the "Description" section.

There is a similar section in Bugzilla:

[Screenshot: Bugzilla's "possible duplicates" list, 2022-01-05]

Do you think that could be useful? @softvision-oana-arbuzov @softvision-raul-bucata @karlcow

softvision-oana-arbuzov commented 2 years ago

I think that would be nice to have.

I would say the right location is after the URL is typed into the "Web address" field. Once the field is filled in, a suggestion list with duplicate/related issues could be shown, similar to Bugzilla.
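A minimal sketch of what that lookup could look like, assuming the form queries the GitHub Search API against the webcompat/web-bugs repository (where reports filed through the form end up) and relies on the existing convention of starting each issue title with the reported domain. The function name and the `limit` parameter are purely illustrative, not an existing webcompat.com API.

```python
# Hypothetical helper: fetch a few existing reports for the same domain
# so the form can show them as "possible duplicates" after the
# "Web address" field is confirmed.
from urllib.parse import urlsplit

import requests

SEARCH_URL = "https://api.github.com/search/issues"


def possible_duplicates(reported_url: str, limit: int = 5) -> list[dict]:
    """Return a short list of existing webcompat reports for the same domain."""
    domain = urlsplit(reported_url).hostname or reported_url
    query = f'repo:webcompat/web-bugs in:title "{domain}"'
    response = requests.get(
        SEARCH_URL,
        params={"q": query, "sort": "updated", "per_page": limit},
        headers={"Accept": "application/vnd.github+json"},
        timeout=10,
    )
    response.raise_for_status()
    items = response.json().get("items", [])
    return [
        {"title": item["title"], "url": item["html_url"], "state": item["state"]}
        for item in items
    ]


if __name__ == "__main__":
    for issue in possible_duplicates("https://nha.chotot.com/some/listing"):
        print(f'[{issue["state"]}] {issue["title"]} -> {issue["url"]}')
```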

karlcow commented 2 years ago

With the work of @denschub on having a backup of all the data, it is probably easier to do now than it was in the past.

What would be the criteria for deciding that a report is a duplicate? ML, or something else?

URLs on their own are an issue, because they do not work for sites like facebook.com, google.com, etc. Discussed in

ML and the backup DB are game changers in all of these discussions, and so is your experience doing ML work. That's super cool.

softvision-raul-bucata commented 2 years ago

I agree with Oana; this would be a nice feature to have.

ksy36 commented 2 years ago

Thanks for your input everyone!

> I would say the right location is after the URL is typed into the "Web address" field. Once the field is filled in, a suggestion list with duplicate/related issues could be shown, similar to Bugzilla.

This is a good idea! We also need to think about what to do with domains that have a lot of issues. Perhaps we could take a two-phase approach (a rough sketch in code follows the list below):

1) Once a reporter presses "Confirm" for the URL, we search for existing issues with that domain. There are two cases:

a) A domain with only a few issues (for example nha.chotot.com):

[Screenshot: webcompat.com issue search results for nha.chotot.com, 2022-01-12]

This subdomain has 5 issues: 1 is fixed, 1 is open, and 3 are duplicates. Out of those duplicates, only one had its title changed on triage, so we can assume that it is the original issue for our purposes (and it's closed as a duplicate of a Bugzilla issue). So we'd show only 2 issues to the user as "possible duplicates" (one still open, the other closed as a duplicate). This assumption is probably not going to hold in all cases :) But the changed title is pretty important, I think, as it gives most of the context and means that the issue was in diagnosis at some point.

b) A domain with a lot of issues (for example imgur.com):

[Screenshot: webcompat.com issue search results for imgur.com, 2022-01-12]

This domain has 41 reports; it's not useful to show them all, and it's impossible to determine potential duplicates from the domain name alone. In this case, we need more context from the user: the type of issue, steps to reproduce, etc. So if the search finds more issues than a certain threshold (7-10 maybe?), we show nothing to the user after they confirm their URL. Instead, we could build a model with bugbug and try to predict a duplicate after all the content has been entered. If a duplicate is found, maybe we can display it before or after the "screenshot" step.
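To make the two cases above concrete, here is a rough sketch of the decision logic, assuming we already have the existing reports for the domain (e.g. from the lookup sketched earlier). The `THRESHOLD` of 8, the `title_was_changed` flag (which would have to come from the issue's "renamed" events), and the TF-IDF similarity fallback are all assumptions for illustration; the real content-based step would presumably be a trained bugbug model rather than this stand-in.

```python
# Sketch of the two-phase idea, not an actual implementation.
from dataclasses import dataclass

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

THRESHOLD = 8  # above this, the domain has "a lot" of issues (7-10 range)


@dataclass
class Report:
    title: str
    body: str
    state: str  # "open" or "closed"
    closed_as_duplicate: bool = False
    title_was_changed: bool = False  # i.e. retitled during triage


def phase_one(existing: list[Report]) -> list[Report] | None:
    """Case (a): show a short, filtered list right after the URL is confirmed.
    Returns None when the domain has too many reports to filter by domain alone."""
    if len(existing) > THRESHOLD:
        return None  # case (b): defer to the content-based step below
    return [
        report
        for report in existing
        if report.state == "open"
        or (report.closed_as_duplicate and report.title_was_changed)
    ]


def phase_two(new_description: str, existing: list[Report], top_n: int = 3) -> list[Report]:
    """Case (b): once the full description is entered, rank existing reports
    by text similarity. A real implementation would likely use a bugbug model
    instead of raw TF-IDF cosine similarity."""
    if not existing:
        return []
    corpus = [f"{r.title} {r.body}" for r in existing]
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform(corpus + [new_description])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    ranked = sorted(zip(scores, existing), key=lambda pair: pair[0], reverse=True)
    return [report for score, report in ranked[:top_n] if score > 0.2]
```

The 0.2 similarity cutoff and the threshold value are arbitrary placeholders; in practice they would need to be tuned against the existing report corpus.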

This is a rough idea, and it's quite likely I'm missing something :) I will look through the issues Karl posted to get more context, as it appears significant research and work have already gone into this.