mozilla / participation-metrics-org

Participation metrics planning repository

Data Extracted from Reps API give a wrong information #75

Closed mermi closed 7 years ago

mermi commented 7 years ago

Regarding issue #74, @MichaelKohler noticed that the chart about Rust I mentioned [https://analytics.mozilla.community:443/goto/67a0100c14f618d912ef92c8dd477dfa] shows that Daniele Scasciafratte did one activity about Rust in the last 6 months. I have another chart showing the evolution of Mozilla activities per date [https://analytics.mozilla.community:443/goto/266e251164ce5240b0fe5ee30f2070df]; it shows that on 25th August Daniele organized a Dive Into Rust event. But when we looked at his reports on the Reps Portal [https://reps.mozilla.org/reports/rep/Mte90/?sort_key=report_date_desc&query=&page=4], on that date Daniele didn't do any activity about Rust or attend one.

So now we are afraid that the data extracted from the Reps API is wrong, or that something is going wrong during the extraction.

MichaelKohler commented 7 years ago

@mermi you linked the wrong visualization, it should be this one: https://analytics.mozilla.community/edit/app/kibana#/visualize/edit/Categorical-of-User-who-organized-a-rust-Event?_g=(filters:!(),refreshInterval:(display:Off,pause:!f,value:0),time:(from:now-6M,mode:quick,to:now))&_a=(filters:!(),linked:!f,query:(query_string:(analyze_wildcard:!t,query:%27*%27)),uiState:(),vis:(aggs:!((id:%273%27,params:(field:user,order:desc,orderBy:%272%27,size:50),schema:segment,type:terms),(id:%272%27,params:(field:activity),schema:metric,type:cardinality),(id:%274%27,params:(filters:!((input:(query:(query_string:(analyze_wildcard:!t,query:%27Dive%20Into%20Rust%27))),label:%27%27))),schema:group,type:filters),(id:%275%27,params:(filters:!((input:(query:(query_string:(analyze_wildcard:!t,query:%27Organized%20an%20Event%27))),label:%27%27)),row:!f),schema:split,type:filters)),listeners:(),params:(addLegend:!t,addTimeMarker:!f,addTooltip:!t,defaultYExtents:!f,mode:stacked,scale:linear,setYExtents:!f,shareYAxis:!t,times:!(),yAxis:()),title:%27Categorical%20of%20User%20who%20organized%20a%20rust%20Event%27,type:histogram))

Reports from Daniele on Aug 25th: https://reps.mozilla.org/api/remo/v1/activities/40164/ https://reps.mozilla.org/api/remo/v1/activities/40110/ https://reps.mozilla.org/api/remo/v1/activities/40109/

None of these have anything to do with "Dive Into Rust".

What we need to know:

MichaelKohler commented 7 years ago

Could it be that the filter is a "contains any of the following strings" filter? Then Daniele's "Rust planning" would hit that requirement, since it contains "Rust". If so, how can we search for the full string, and the full string only?

canasdiaz commented 7 years ago

Thanks for your feedback @MichaelKohler, we are having a look. We'll update you tomorrow.

jgbarah commented 7 years ago

Side comment: when you need to post links to the Kibana-based dashboard, you can use the shortener. Just click on the second icon, starting from the right, when you unfold the sharing panel. That will shorten the link before actually sharing it, making it much easier to handle. For example, the link above by @MichaelKohler would be:

https://analytics.mozilla.community:443/goto/7ae4ce73f4195946951b98d6ac52a269

jgbarah commented 7 years ago

I'm going to address this in several comments. To start with, let's see how we can learn which entries in the index correspond to the entry for Daniele Scasciafratte that you find in your visualization.

Let's start with your visualization, and let's select the bar corresponding to Daniele Scasciafratte just by clicking on it. When clicking on the bar, you'll notice three grey filters being proposed below the top menu. Just accept all three so they become green, and you'll get this visualization (a single bar with a value of "1"), see below:

selection_102

In it, I made the filters sticky as well, by clicking, for each of them, in the pin. Once they are sticky, they will come with me when I move to other Kibana menus.

Now, I move to the "Discover" menu (select "Discover" in the top menu). Once there, select the "remo2-activites" index (in the top left, you have the currently selected index: you can click on it to unfold the list of available indexes, and select "remo2-activities").

Now, I get this search, with only one result for Sept. 10, which I can unfold (click on the little triangle on the left of the entry) to see all its details:

selection_103

You can also click on the blue link appearing on the right, just below the chart and the summary of the item, starting with "Link to /remo2-activites". That points to the raw data in the index.

If you look either at the details for that item in the Discover panel, or at the raw item in the index, you'll notice, maybe with surprise, that "Rust" is present, but neither "Dive" nor "into" is in any of the fields. What happened? Well, here is where the syntax of Elasticsearch queries kicked in. In short, we got what we asked for. More details in the next comment.

jgbarah commented 7 years ago

OK, so we were at the search panel with the filters for "Organized an event", "Daniele Scasciafratte" and "Dive into Rust" as sticky filters. Why does the only item shown not have "Dive into Rust"? The short answer is: because in the query we're really asking for "Dive" OR "into" OR "Rust". I'm not completely sure, but I guess the trouble is in the quotes. In your queries, you're not using them (both in "Dive into Rust" and "Organized an Event"):

selection_105

You can see that both queries are basically the same as using OR, if you substitute your current filters in the buckets for "Dive OR into OR Rust" and "Organized OR an OR Event" (without the quotes). The bars on the chart are the same.
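To make the difference concrete, here is a minimal sketch in plain Python that imitates the two behaviours described above. It does not talk to Elasticsearch at all: `tokenize`, `matches_any_term` and `matches_phrase` are hypothetical helpers, and apart from "Rust planning" (the report mentioned earlier in this thread) the sample report names are made up for illustration.

```python
import re

def tokenize(text):
    """Lowercase and split into words (a rough stand-in for an analyzer)."""
    return re.findall(r"\w+", text.lower())

def matches_any_term(query, text):
    """Unquoted query: true if ANY query word appears in the text (OR)."""
    words = set(tokenize(text))
    return any(term in words for term in tokenize(query))

def matches_phrase(query, text):
    """Quoted query: true only if the words appear adjacent and in order."""
    q, t = tokenize(query), tokenize(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

# Hypothetical sample data; only "Rust planning" comes from this thread.
reports = ["Rust planning", "Dive Into Rust workshop", "Firefox bug triage"]

or_hits = [r for r in reports if matches_any_term("Dive Into Rust", r)]
phrase_hits = [r for r in reports if matches_phrase("Dive Into Rust", r)]

print(or_hits)      # ['Rust planning', 'Dive Into Rust workshop']
print(phrase_hits)  # ['Dive Into Rust workshop']
```

Without quotes, "Rust planning" is a hit because it shares the single word "Rust" with the query. With phrase matching it drops out, which is the effect of adding the quotes in the filter.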

But if you use quotes, the chart is rather different.

selection_106

Unfortunately, I see that when you use the quotes, what seems to be a bug in Kibana is triggered, and when you hover over the bar that you now have in the chart, you get

selection_107

There are some other troubles, because the field "activity_description", which is not analyzed (that is, searches in it are literal), is also present in the index as "analyzed_description", which is analyzed (that is, the field is converted to a list of words, and queries are applied to it). This is something we use for other reasons, and we could change if needed.
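As a rough illustration of the analyzed versus not analyzed distinction, the following plain Python sketch mimics both kinds of field for a single document. It is not how Elasticsearch stores anything internally: `analyze`, `literal_match` and `analyzed_match` are hypothetical helpers, and using "Rust planning" as the description value is an assumption made only for the example.

```python
import re

def analyze(text):
    """Turn a string into a list of lowercase words, like an analyzed field."""
    return re.findall(r"\w+", text.lower())

# Assumed sample document: the same text kept in both forms.
doc = {"activity_description": "Rust planning"}
doc["analyzed_description"] = analyze(doc["activity_description"])

def literal_match(value, doc):
    """Not analyzed field: the whole stored string must match exactly."""
    return doc["activity_description"] == value

def analyzed_match(word, doc):
    """Analyzed field: a single word is enough to match."""
    return word.lower() in doc["analyzed_description"]

print(literal_match("Rust", doc))           # False
print(literal_match("Rust planning", doc))  # True
print(analyzed_match("Rust", doc))          # True
```

So a search for "Rust" finds nothing in the literal field but does match the analyzed copy, which is one reason the same text can behave differently depending on which field a filter ends up hitting.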

jgbarah commented 7 years ago

Now, let me show you how this can be done another way. Let's assume you want to know the list of people organizing activities related to Rust. For that, we can create a table, and bucketize it with a filter ("rust"), terms (author_name), and terms (activity). Once you have the table, you click in it on the kind of activity you're interested in: "Organized an event". As a result, you have this table with the list of people organizing events, and the number of events they organized.

selection_108

I've checked some of the cases, and the results seem to be what they should be. But if you give it a try, please let me know your impressions.
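For anyone who prefers to see the table as a request, this is a sketch of a search body that roughly corresponds to it, written as a Python dict using standard Elasticsearch query DSL (a `query_string` query plus nested `terms` aggregations). The field names `author_name` and `activity` come from the description above. The function name, the aggregation names `people` and `activities`, the exact stored value "Organized an event", and the assumption that `activity` is not analyzed (so that a `term` filter matches it literally) are all mine; Kibana may well generate something different.

```python
import json

def rust_organizers_request(word="rust", activity=None, max_people=50):
    """Build a search body: filter by word, then bucket by person and activity."""
    bool_query = {"must": [{"query_string": {"query": word}}]}
    if activity is not None:
        # Mimics clicking on one kind of activity in the table.
        bool_query["filter"] = [{"term": {"activity": activity}}]
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"bool": bool_query},
        "aggs": {
            "people": {
                "terms": {"field": "author_name", "size": max_people},
                "aggs": {"activities": {"terms": {"field": "activity"}}},
            }
        },
    }

body = rust_organizers_request(activity="Organized an event")
print(json.dumps(body, indent=2))
```

The printed JSON could be pasted into any tool that sends a search request to the activities index; the index name is deliberately left out here.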

jgbarah commented 7 years ago

To do it with a bar chart, I've designed this visualization, which shows people doing certain activities mentioning Rust. Notice how Rust is specified with a filter (by editing the filter you can use some other word instead of Rust), and how, for each person, all the activities in which they participated with different roles are shown.

selection_109

If you want to have only those who were organizers, you can just click on "Organized an event" in the legend to filter those activities, getting this visualization (notice the two green filters at the top).

selection_110

This visualization could be included in a dashboard, and by using the same filters, the whole dashboard, including the visualization, will show only "Rust" and "organized". But the filters could be deactivated if that's convenient.

Well, welcome to the world of querying with Kibana and Elasticsearch. My impression is that it is very powerful, but very complex as well. For this kind of query, I hope things will be simpler when we have categories implemented (see #66).

MichaelKohler commented 7 years ago

Wow, this is an amazing explanation! Thanks a lot!

Some quick notes:

@mermi I've played around with it and it looks good to me. However, we need to take care of the fact that we always have "Organized an Event" as activity and "MozActivate" as initiative for events. There is no way we can filter those at a finer granularity, since the more granular ones ("MozActivate - XXX") are activities, not initiatives. So for non-autogenerated reports we can use these very well, but not for the event-generated ones. This is not a problem with Kibana, but with how we organize the data on the Reps Portal side. This is also why "Dive into Rust" (with the quotes) does not yield a lot of entries, since not a lot of people create manual reports with that activity.

Edit: of course #66, as pointed out by Jesus above, would help here.

MichaelKohler commented 7 years ago

As this is not a bug, I think we can close this issue. We would open up a new one if we should find anything else.

hmitsch commented 7 years ago

Closing issue, as mentioned by @MichaelKohler.