maiera / gde-app


Stack Overflow API Integration #211

Closed patt0 closed 9 years ago

patt0 commented 9 years ago

We need to add the SO user_id to the Accounts entity in the backend and to the Accounts master spreadsheet that is used to update the datastore. We should also identify the SO tags that are relevant to the various product groups and teams.

This assumes that an aggregate monthly score is sufficient. Using the SO top-user-answers-in-tags API method (https://api.stackexchange.com/docs/top-user-answers-in-tags) and the date range query operator, we should be able to automatically create a monthly (or other interval?) record for each GDE.

The activity record will not have an associated G+ activity post, and could be entitled "Monthly SO Activity Record - April 2015"
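To make the monthly idea concrete, here is a minimal Python 3 sketch of how a record title and a month-long date range could be derived. The function names are hypothetical and not taken from the gde-app code; the only things borrowed from the proposal are the title format and the idea of a date range expressed as Unix timestamps.

```python
import calendar
from datetime import datetime, timezone


def month_window(year, month):
    """Return (fromdate, todate) as UTC Unix timestamps covering one calendar month."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    last_day = calendar.monthrange(year, month)[1]
    end = datetime(year, month, last_day, 23, 59, 59, tzinfo=timezone.utc)
    return int(start.timestamp()), int(end.timestamp())


def record_title(year, month):
    """Build a title in the style suggested above (hypothetical helper)."""
    return "Monthly SO Activity Record - {} {}".format(
        calendar.month_name[month], year)


if __name__ == "__main__":
    print(record_title(2015, 4))
    print(month_window(2015, 4))
```

Keeping the window computation separate from the title means the interval could later change (weekly, quarterly) without touching how records are named.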

Perhaps we can define the usage of the impacts for SO as follows:

What we now call Social Impact could hold the number of questions answered / accepted.

What we call Total Impact could hold the number of views on the questions, with an understanding that this will grow and that impact is a rolling window.
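As a rough illustration of that mapping, the sketch below aggregates a list of answers into the two numbers. It assumes each answer is a dict carrying `question_id` and `is_accepted`, and that view counts come from a separate `question_id -> view_count` lookup (these mirror field names on Stack Exchange API answer and question objects). The function name and the returned keys are hypothetical, and since "answered / accepted" is ambiguous, both counts are returned.

```python
def so_impacts(answers, question_views):
    """Aggregate answers for one period into impact numbers.

    answers: list of dicts with 'question_id' and 'is_accepted'.
    question_views: dict mapping question_id -> view_count.
    """
    answered = len(answers)
    accepted = sum(1 for a in answers if a.get("is_accepted"))
    # Count each question's views once, even if it was answered twice.
    question_ids = {a["question_id"] for a in answers}
    views = sum(question_views.get(qid, 0) for qid in question_ids)
    return {
        "answered": answered,      # candidate for Social Impact
        "accepted": accepted,      # alternative candidate for Social Impact
        "total_impact": views,     # views on the answered questions
    }


if __name__ == "__main__":
    sample = [
        {"question_id": 1, "is_accepted": True},
        {"question_id": 2, "is_accepted": False},
    ]
    print(so_impacts(sample, {1: 120, 2: 30}))
```

Because view counts keep growing, re-running this for an old period would give a larger Total Impact, which matches the "rolling window" caveat.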

It's probable that as we move out of G+ into harvesting, we may need to refactor the way we name and collect the metrics we want, killing the one-size-fits-all social metric associated with Google+ posts.

SmokyBob commented 9 years ago

+1 on Everything.

I was thinking that we might want to add an "Option" page to enable GDEs to update the SO user_id and, in the future, other fields as more sources get harvested. This way we can avoid having to manage all these ids in the master_list and leave the GDE the ability to choose which plug-ins to use. What do you think?

P.S. Do you think it's a good idea to ask for feedback on how to integrate SO in the GDE community?

patt0 commented 9 years ago

Yes, an options page will be useful, and the field can act as a marker for which accounts to process during data extraction.

We will request feedback when we are a little further down the line. Our previous attempts have not been very fruitful, so I think we are better off doing this in consultation with the team at Google who want to get some measurements.

patt0 commented 9 years ago

@SmokyBob @Scarygami

Finally got this going. I changed the approach: get the answers for a period, find the tags from the associated questions, and create an activity record. The harvest interval is weekly, with the possibility to harvest retroactively from a particular date. While this is going to duplicate some tasks added manually, it's probably worth asking individual GDEs to delete those. (This is done on Firefly to get an idea of what it looks like on the front end and in the raw data extraction.)
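For readers following along, the weekly, retroactive harvest can be sketched roughly as below. This is not the code from the commit linked in this comment; it is a Python 3 illustration that assumes the Stack Exchange API `/users/{ids}/answers` method with its `fromdate` / `todate` parameters, and the helper names (`weekly_windows`, `to_epoch`, `answers_url`) are hypothetical.

```python
from datetime import date, datetime, timedelta, timezone
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.2"


def weekly_windows(start, today):
    """Yield (first_day, last_day) for every complete 7-day window since start."""
    current = start
    while current + timedelta(days=7) <= today:
        yield current, current + timedelta(days=6)
        current += timedelta(days=7)


def to_epoch(day, end_of_day=False):
    """Convert a date to a UTC Unix timestamp (start or end of that day)."""
    moment = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    if end_of_day:
        moment += timedelta(days=1, seconds=-1)
    return int(moment.timestamp())


def answers_url(so_user_id, first_day, last_day):
    """Build the request URL for one user's answers within one window."""
    params = {
        "site": "stackoverflow",
        "fromdate": to_epoch(first_day),
        "todate": to_epoch(last_day, end_of_day=True),
        "order": "asc",
        "sort": "creation",
        "pagesize": 100,
    }
    return "{}/users/{}/answers?{}".format(API_ROOT, so_user_id, urlencode(params))


if __name__ == "__main__":
    for first, last in weekly_windows(date(2015, 1, 1), date(2015, 1, 16)):
        print(answers_url(12345, first, last))  # 12345 is a placeholder id
```

Only complete windows are yielded, so a run in the middle of a week would leave that week for the next scheduled harvest.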

https://github.com/patt0/gde-app/commit/647653e40d6e4b1016ebb3a41b02e15979e45d3e

Pushed the code to OMEGA in order to update the Product Group tags so harvesting can happen for those that have supplied their SO id. We can get the GDEs to fill in our spreadsheet and push that number higher when we launch. Once the PGs were updated, I copied the data from OMEGA to FIREFLY and ran the harvest from 1st January 2015.

Check out your harvest tasks https://10-dot-gdetracking.appspot.com/#/

Raw export test available (242 activity records created from 75 GDEs with SO ids) https://docs.google.com/spreadsheets/d/1p1goP2PKCjbd7XvqCKvDwGKpeeTfNON-SKILarc1oL0/edit#gid=729453407

SmokyBob commented 9 years ago

Everything looks good to me

LindaLawton commented 9 years ago

More tags:

Google-api
Google-Analytics-api
Google-drive-sdk
Google-docs-api
Google-visualization
Google-oauth
Google-api-dotnet-client
google-api-php-client
google-calendar
google-drive-realtime-api
google-maps-api-3
google-maps-api-2
google-spreadsheet-api
gmail
gmail-imap
google-glass
google-mirror-api
youtube-api
google-search
google-addwords
google-gdk
google-compute-engine
google-apps-scripts

You could just search for "google" in the tag list: http://stackoverflow.com/tags?tab=name

LindaLawton commented 9 years ago

Looks like you are only checking from Jan 1, and something is up with it. Even looking only at the tags you appear to be grabbing, I have answered 3 Google+ (#googleplus), 3 Apps Script (#googleappsscript) and an Android (#android) question. They aren't listed in your sheet. Nor are the older questions that get new +1's or accepts.

patt0 commented 9 years ago

@LindaLawton at this time we are only harvesting the questions for the ProductGroup of the particular GDE, which explains why the routine may not have picked up these answers. We will need to see with Program Management how they want to deal with impact measurement given that GDEs can be polyvalent. In this harvest I did indeed start at Jan 1 2015. I need to check with Marie what she wants to do with data from previous periods, where we may have a duplicate entry issue.

I will be launching the feature over the weekend with a FAQ and a Survey and will solicit feedback. That is a good point to raise.

LindaLawton commented 9 years ago

So I am locked into a single product group, being Google-Analytics? At the very least please add Google-Analytics-api. I don't even think 5% of my answers are analytics related. Guess I am not a very productive GDE.

patt0 commented 9 years ago

I'll take it up for discussion with Ola and Gang in my next meeting.

LindaLawton commented 9 years ago

I don't think it really matters. This is just for Google to track, right? It does make sense that the Analytics team would only be interested in what I do that is analytics related. Anything I do in the other tags probably isn't valid info for them.

That being said looks good :)

Scarygami commented 9 years ago

One thing I noticed looking at my data on staging: for historic data it would be good to have the post_date set to a date (first or last) of the month the activity happened in, instead of the date the job ran. At the moment my April looks like I have been really active :)

patt0 commented 9 years ago

Yes that makes sense, I will make the record take the date of the last day of the period being harvested.


LindaLawton commented 9 years ago

Are you recording all the tags or just the first tag? I seem to be very Android active for a non-Android person.

What happens with a question tagged #android #google-analytics? What happens if it's the other way around?

Scarygami commented 9 years ago

Best to wait for @patt0 to answer, but looking at his source it will be counted for both product groups. So if you have a question like in your example you will have one SO activity in #android and one in #google-analytics no matter in what order they appear in the question.
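To illustrate why tag order should not matter, here is a small sketch of matching a question's tags against product groups with set intersection. The `PRODUCT_GROUP_TAGS` mapping and the function name are hypothetical examples, not the actual configuration or code in gde-app.

```python
# Hypothetical mapping of product group -> Stack Overflow tags.
PRODUCT_GROUP_TAGS = {
    "Android": {"android"},
    "Google Analytics": {"google-analytics", "google-analytics-api"},
}


def product_groups_for(question_tags, group_tags=PRODUCT_GROUP_TAGS):
    """Return every product group that shares at least one tag with the question."""
    tags = {tag.lower() for tag in question_tags}
    # Set intersection ignores the order in which tags appear on the question.
    return sorted(group for group, wanted in group_tags.items() if tags & wanted)


if __name__ == "__main__":
    print(product_groups_for(["android", "google-analytics"]))
    print(product_groups_for(["google-analytics", "android"]))
```

Both calls give the same two product groups, so one answer would produce one SO activity in each group.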

Scarygami commented 9 years ago

@patt0 could you create a PR for your pending changes (even if you still have some additional changes planned before merging), just so it's easier to find the way there :)

patt0 commented 9 years ago

Yeah, it's a little strange. I'm running some tests; as you said, it should create a record for both identified tags in their respective product groups.

I did a sanity check against this and got 11 answers for the period Jan 2014 to March 2015

http://stackoverflow.com/search?q=user:1841839+[android]

So it got the Android tags OK, but it did not get Analytics in some cases.

Been running a test on a month for Linda only and it does seem to work ... I am cleaning up the database and running it again ... then we can have a look.

I will also do a push of my fork and a PR if I don't find anything in the next 30 minutes.

Thanks both.


patt0 commented 9 years ago

Found the issue. It was a classic case of eventual consistency: when I moved to multiple product groups, I was still using the same URL for the link across several activity records (AR). In a tight loop, the existence query might not find the record that had just been written, so a second one would be created. After a while, once the indexes had caught up, the query would find a record and treat it as the existing one, even though it might belong to another PG. So I changed the query to use the title, which contains the date interval and the product group.

I have run the harvest from Jan 2014 on stage for Gerwin and Linda so we can do some sanity checks. Looks pretty good to me now.
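The fix described here queries by title. A related way to sidestep this kind of problem, shown below purely as an illustration and not as what the commit does, is to derive a deterministic key from the user, product group and date interval and look records up by that key, since key lookups in App Engine Datastore are strongly consistent while non-ancestor queries were eventually consistent. The sketch uses a plain in-memory dict as a stand-in for the datastore; `record_key` and `RecordStore` are hypothetical names.

```python
from datetime import date


def record_key(so_user_id, product_group, first_day, last_day):
    """Build a deterministic identifier for one user, product group and interval."""
    return "{}|{}|{}|{}".format(
        so_user_id, product_group, first_day.isoformat(), last_day.isoformat())


class RecordStore:
    """In-memory stand-in for a key-value datastore (illustration only)."""

    def __init__(self):
        self._records = {}

    def upsert(self, key, data):
        """Create or overwrite the record; return True if it was newly created."""
        created = key not in self._records
        self._records[key] = data
        return created


if __name__ == "__main__":
    store = RecordStore()
    week = (date(2015, 1, 1), date(2015, 1, 7))
    android = record_key(12345, "Android", *week)      # 12345 is a placeholder id
    analytics = record_key(12345, "Google Analytics", *week)
    print(store.upsert(android, {"answers": 2}))    # new record
    print(store.upsert(android, {"answers": 3}))    # same key, updated in place
    print(store.upsert(analytics, {"answers": 1}))  # different PG, new record
```

Because the product group is part of the key, records for two product groups in the same interval can never be mistaken for each other, and re-running a harvest overwrites rather than duplicates.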

https://docs.google.com/spreadsheets/d/1p1goP2PKCjbd7XvqCKvDwGKpeeTfNON-SKILarc1oL0/edit#gid=729453407
