thegooddata / webapp

TheGoodData web application
http://www.thegooddata.org
GNU General Public License v3.0

Count the "pieces of data contributed" in non English queries as well #106

Closed marcosmenendez closed 9 years ago

marcosmenendez commented 9 years ago

Up to now we have only counted queries made in English. This may lead users who search in other languages to think that the service is not working and to uninstall the extension.

We should therefore also count queries made in other languages if they don't contain sensitive terms. For the time being, let's use the same dictionary of sensitive terms.

Do not share any of those non-English queries with Chango. Share only the ones made in English, as determined by the search URL (the hl variable) or the browser language.
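One way to decide the query language from the search URL, falling back to the browser language, is sketched below. The function name and the normalisation to a two-letter code are assumptions for illustration, not the extension's actual logic.

```javascript
// Hypothetical helper: not taken from the extension's code.
// Prefers the "hl" parameter of the search URL, then the browser language.
function detectQueryLanguage(searchUrl, browserLanguage) {
  const hl = new URL(searchUrl).searchParams.get("hl");
  const raw = hl || browserLanguage || "";
  // Reduce tags such as "en-US" or "pt_BR" to their primary part ("en", "pt").
  return raw.toLowerCase().split(/[-_]/)[0];
}

// Only queries detected as English would be shared with Chango.
function isEnglishQuery(searchUrl, browserLanguage) {
  return detectQueryLanguage(searchUrl, browserLanguage) === "en";
}
```

For example, `isEnglishQuery("https://www.google.com/search?q=test&hl=es", "en-US")` is false because the hl variable takes precedence over the browser language.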

I would also review the calculation of this number. I know a user with the extension who says that he has contributed 1900 pieces this month, while the total queries traded have been 8600 according to good-data. One of them is wrong.

atrandafir commented 9 years ago

This issue is more related to the extension than to the webapp, only the last paragraph in the description relates to the webapp.

@bodiapz let me try to explain this task so that we understand it better and you can work on it.

This is what is happening now in the Extension:

  1. A user searches for something in Google
  2. The extension detects the search and checks whether the user's language is supported. Right now we only track queries in English, since it is the only language supported in our database (we have a list of "bad words" for that language, so we can filter sensitive queries).
  3. If the language is supported and there were no bad words:
  4. It will load a script from Chango
  5. It will send the search query to our API to store it in the database

This is the link to the function in the extension code that does the steps above: https://github.com/thegooddata/extension/blob/master/chrome/scripts/background.js#L504
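The five steps above can be sketched roughly as follows. This is not the code from background.js: the function and constant names and the placeholder word list are hypothetical, and the Chango and API calls are replaced by entries in a returned list of actions.

```javascript
// Hypothetical sketch of the current flow; names are NOT taken from background.js.
const SUPPORTED_LANGUAGES = new Set(["en"]); // per the description: English only
const BAD_WORDS = { en: ["badword"] };       // placeholder sensitive-term list

function hasBadWord(query, words) {
  const tokens = query.toLowerCase().split(/\s+/).filter(Boolean);
  return words.some((word) => tokens.includes(word));
}

// Returns the actions the extension would take for one detected search.
function handleSearchCurrent(query, lang) {
  if (!SUPPORTED_LANGUAGES.has(lang)) return [];     // step 2: unsupported language
  if (hasBadWord(query, BAD_WORDS[lang])) return []; // step 3: sensitive query
  return ["loadChangoScript", "sendQueryToApi"];     // steps 4 and 5
}
```

With this flow a Spanish query produces no actions at all, which is why it is never counted.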

So basically what @marcosmenendez is asking for is to change this and adapt it to:
No matter what language the user has, keep checking for bad words (in that language if a list is available; if not, in English) and finally, if there were no bad words, load the Chango script only if the language is supported.

But I guess that, based on these requirements, we could also add new fields to the languages table and use them to make this work. The fields would be something like "trade" and "has_badwords", and then we would mark all the languages as supported. Then, if the language has has_badwords=1 it will load the list of bad words for that language, and if not, the English bad words; and if it has trade=1 it will load the Chango script.

Of course, some extra modifications may be required to check whether a language has bad words and to load them, and if not, to load the English ones by default.
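A minimal sketch of the proposed behaviour, assuming the two new fields exist. Only the field names "trade" and "has_badwords" come from the proposal; the table contents, word list, and function names are made up for illustration.

```javascript
// Hypothetical rows of the languages table with the two proposed fields.
const languages = {
  en: { trade: 1, has_badwords: 1 },
  es: { trade: 0, has_badwords: 0 },
};
const badWords = { en: ["badword"] }; // placeholder list; English is the default

// Use the language's own list when it has one, otherwise fall back to English.
function badWordsFor(lang) {
  const row = languages[lang];
  const hasOwnList = row && row.has_badwords === 1 && badWords[lang];
  return hasOwnList ? badWords[lang] : badWords.en;
}

// Decide, for one query, whether to count it and whether to load the Chango script.
function handleSearch(query, lang) {
  const tokens = query.toLowerCase().split(/\s+/).filter(Boolean);
  const sensitive = badWordsFor(lang).some((word) => tokens.includes(word));
  if (sensitive) return { count: false, loadChango: false };
  const row = languages[lang];
  return { count: true, loadChango: Boolean(row && row.trade === 1) };
}
```

Here a clean Spanish query is counted but not traded, while a clean English query is both counted and traded.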

As for the last paragraph related to the webapp, you should check on "Your data page" where it says "You have contributed 0 pieces of data this month." to see if that is properly calculated. I guess it is. Maybe the "bug", if there is one, could be on the extension side, I mean maybe there are repeated queries sent to the database, or something like that.

As for working with the extension locally and debugging it:

  1. Go to Chrome Extensions tab
  2. First disable the original TGD extension if you have it installed (by checkbox)
  3. Select “Load unpacked extension”
  4. Choose tgd-extension\chrome directory

Now you are working with the local extension but with the production webapp API.

In order to work with your own local webapp's API, go to the Extensions tab, where you'll find a "Configuration" link next to The Good Data extension. If you click it you'll see a page with a single option, "Environment". Choose "dev2" there, save, and disable and re-enable the extension. That will switch it to work with the www.tgd.local API.

You can see in the extension's config file what it does for every development environment: https://github.com/thegooddata/extension/blob/master/chrome/scripts/config.js

There you can also see the debug options. By default they are all turned off, but in dev2 some are enabled. You can enable more in dev2, or you could create a new environment just for yourself with its own settings.

Finally, to see the debug output, again in the Extensions tab you have something like "Inspect views" with a link next to it that will open a popup similar to Chrome debug toolbar where you can see network requests, javascript console logs/errors, and so on.

Just keep in mind: some data we load from the API, such as the supported languages or the bad words list, is cached locally in Chrome's localStorage, so if you change something in the API related to languages, the extension will only reload it once every hour or so. You can work around this with the jsCache tool, either by temporarily setting a shorter expiration time or by deleting an item from the cache by its key to force new data to load.
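To illustrate the expiration behaviour described here, below is a generic time-to-live cache. It is not the jsCache API and it keeps items in memory instead of localStorage; the class, method, and key names are assumptions made for this sketch.

```javascript
// Generic TTL cache sketch; NOT the jsCache API used by the extension.
class TtlCache {
  // "now" is injectable so the expiration logic can be exercised without waiting.
  constructor(now = () => Date.now()) {
    this.now = now;
    this.items = new Map();
  }

  set(key, value, ttlMs) {
    this.items.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  // Returns the cached value, or undefined if it is missing or expired.
  get(key) {
    const item = this.items.get(key);
    if (!item) return undefined;
    if (this.now() >= item.expiresAt) {
      this.items.delete(key);
      return undefined;
    }
    return item.value;
  }

  // Deleting a key forces the next read to fetch fresh data.
  remove(key) {
    return this.items.delete(key);
  }
}
```

While debugging, either a short `ttlMs` or an explicit `remove("languages")` (a hypothetical key) has the same effect: the next `get` misses and the caller has to go back to the API.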

marcosmenendez commented 9 years ago

Hi @bodiapz, could you explain whether or not there was an error in the calculation of the total and individual pieces of data contributed, as explained in the last paragraph of the issue?

Some reasons for that discrepancy could be:

  1. That in the individual number presented in the extension we are summing up all queries ever made, not just last month's
  2. That we are only counting queries from users who have given permission to store their data. We should count non-sensitive queries from everyone

bodiapz commented 9 years ago

Hi, it was summing all queries, not just last month's. I have changed it to sum only the last month. But I have a question: should it sum all queries for the current month only, or for the last month (I mean the current date minus 1 month)?

marcosmenendez commented 9 years ago

It should sum the last 30 days of non-sensitive queries, regardless of your language and whether we are storing them or not
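A rolling 30-day count, as opposed to a calendar-month count, could look like this minimal sketch. The record shape (`createdAt`, `sensitive`) is an assumption for illustration and not the webapp's actual schema.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Counts non-sensitive queries made in the 30 days up to and including "now".
// Each query is assumed to look like { createdAt: Date, sensitive: boolean }.
function countLast30Days(queries, now) {
  const end = now.getTime();
  const start = end - 30 * DAY_MS;
  return queries.filter((q) => {
    const time = q.createdAt.getTime();
    return !q.sensitive && time >= start && time <= end;
  }).length;
}
```

Unlike a "current month" filter, this window does not reset to zero on the first day of each month.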

bodiapz commented 9 years ago

Now it sums queries from the table tbl_queries (for the last 30 days). I modified the sum to cover all languages and committed. What do you mean by "storing them or not"?

marcosmenendez commented 9 years ago

Queries can be stored or not based on user preferences set up in the extension (have a look at the extension to see that button)

Please also confirm that the queries counted are the ones that contain no sensitive terms

bodiapz commented 9 years ago

So we should sum all queries per user for every day and store this count (independent of user preferences). That way we can calculate the count of queries for the last 30 days.
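The per-day counting idea could be sketched as below. It keeps only a number per user and day, never the query text, and holds it in memory for simplicity; the class name, key format, and use of UTC days are assumptions, not the implementation that was committed.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Formats a date as "YYYY-MM-DD" (UTC) to use as the day bucket.
function dayKey(date) {
  return date.toISOString().slice(0, 10);
}

class DailyQueryCounter {
  constructor() {
    this.counts = new Map(); // "userId|YYYY-MM-DD" -> number of queries
  }

  // Called once per non-sensitive query, whatever the storage preference.
  record(userId, date) {
    const key = `${userId}|${dayKey(date)}`;
    this.counts.set(key, (this.counts.get(key) || 0) + 1);
  }

  // Sums the buckets for "today" and the preceding days - 1 days.
  totalForLastDays(userId, today, days = 30) {
    let total = 0;
    for (let i = 0; i < days; i++) {
      const day = new Date(today.getTime() - i * MS_PER_DAY);
      total += this.counts.get(`${userId}|${dayKey(day)}`) || 0;
    }
    return total;
  }
}
```

Reading the total then touches at most 30 small rows per user, which keeps the calculation cheap even if it only runs once a day.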

marcosmenendez commented 9 years ago

No idea about how to implement it. Use whichever approach is least costly in terms of resources. It does not need to be updated on the fly; daily is OK

atrandafir commented 9 years ago

Hey, just to clarify things. Whether queries are stored or not, based on the user's preference, is handled on the extension side. So it has nothing to do with counting in the webapp, because in the webapp we only have the queries that the user allowed us to store.

Also yes, queries counted are those without sensitive terms, but again, that's because the extension won't send to the server a query containing sensitive terms so we have none of them in the database.

Feedback for @bodiapz on this issue:

The modification to the extension was correct, but the missing part was to "check the query against the blacklist in the user's language and, if there is no blacklist for that language, load the English one".

With the current implementation I found it easiest to solve this by making the change in the webapp API action that checks the query against the blacklist, as you can see in my previous commit.

marcosmenendez commented 9 years ago

@atrandafir does it mean that we are only counting those that we store? We should be counting all the non-sensitive ones that we detect in the extension

atrandafir commented 9 years ago

That's how it works now with the latest changes. We store all the non-sensitive queries.

marcosmenendez commented 9 years ago

@atrandafir I'm not asking about which queries we store, but about which ones we count. The second number should not depend on user's decision to store or not the queries. Let's talk about this offline

atrandafir commented 9 years ago

OK, closing this for now; we will open a new issue for anything related.

agonbar commented 9 years ago

[image] My username is adrian.gonzalez.barbosa@gmail.com and the counter has never shown anything other than 0. The browser is in Spanish, and I use Google to search for data almost every day.

marcosmenendez commented 9 years ago

@atrandafir can you take a look at it? It seems that it is still not counting the queries done in Spanish as requested

atrandafir commented 9 years ago

I'm not sure this version of the extension has been deployed to the Chrome store, so I will check that first.

atrandafir commented 9 years ago

Indeed, the changes in the extension have not been released for Google Chrome yet, and that's why non-English queries are not being processed for now.