newslynx / opportunities

GA recipes exceed timeout #173

Closed by mhkeller 8 years ago

mhkeller commented 8 years ago

Given how slow GA can be, perhaps we should increase the timeout for this, unless there's something else going on. What do you think, @abelsonlive?

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/newslynx/merlynne.py", line 123, in run
    data = sous_chef.cook()
  File "/opt/newslynx/newslynx-core/newslynx/sc/__init__.py", line 126, in cook
    raise SousChefExecError(format_exc())
SousChefExecError: Traceback (most recent call last):
  File "/opt/newslynx/newslynx-core/newslynx/sc/__init__.py", line 122, in cook
    data = self.load(data)  # load the data.
  File "/root/.newslynx/sous-chefs/newslynx-sc-google-analytics/newslynx_sc_google_analytics/__init__.py", line 509, in load
    d = list(data)
  File "/opt/newslynx/newslynx-core/newslynx/sc/__init__.py", line 80, in serialize
    for item in data:
  File "/root/.newslynx/sous-chefs/newslynx-sc-google-analytics/newslynx_sc_google_analytics/__init__.py", line 159, in run
    for row in self.format(data, prof):
  File "/root/.newslynx/sous-chefs/newslynx-sc-google-analytics/newslynx_sc_google_analytics/__init__.py", line 491, in format
    for row in self.pre_format(data, prof):
  File "/root/.newslynx/sous-chefs/newslynx-sc-google-analytics/newslynx_sc_google_analytics/__init__.py", line 481, in pre_format
    for row in data:
  File "/root/.newslynx/sous-chefs/newslynx-sc-google-analytics/newslynx_sc_google_analytics/__init__.py", line 454, in fetch
    r = q.execute()
  File "/usr/local/lib/python2.7/dist-packages/googleanalytics/query.py", line 518, in execute
    raise err
JobTimeoutException: Job exceeded maximum timeout value (1800 seconds)

mhkeller commented 8 years ago

This particular recipe is grabbing data from 70 articles btw.

abelsonlive commented 8 years ago

It could have just been an aberration, too. No harm in increasing the timeout.
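
For anyone reading the traceback above: the timeout surfaces inside `q.execute()` because a job-level timeout interrupts whatever call happens to be running when the limit is reached. Below is a minimal stdlib sketch of that general mechanism, assuming a Unix host and the main thread. `JobTimeout`, `run_with_timeout`, and `slow_fetch` are hypothetical names for illustration only; this is not the NewsLynx or worker-library code.

```python
import signal
import time


class JobTimeout(Exception):
    """Hypothetical stand-in for a worker's timeout exception."""


def run_with_timeout(func, seconds):
    """Run func() and raise JobTimeout if it runs longer than `seconds`."""

    def on_alarm(signum, frame):
        # Raised in whatever frame is executing when the timer fires,
        # which is why a timeout can appear deep inside a library call.
        raise JobTimeout(
            "Job exceeded maximum timeout value (%s seconds)" % seconds
        )

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return func()
    finally:
        # Always cancel the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


def slow_fetch():
    """Hypothetical slow call standing in for a long GA query."""
    time.sleep(0.5)
    return "rows"
```

With a limit shorter than the call, `run_with_timeout(slow_fetch, 0.05)` raises `JobTimeout`; with a generous limit it returns normally. Raising the limit only helps if the query eventually finishes, so it is worth checking how long a 70-article recipe actually takes.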

mhkeller commented 8 years ago

Update: I increased the timeout to six hours: https://github.com/newslynx/newslynx-sc-google-analytics/commit/84ea79ad2178f30eea11cc6b6677d30ffa8a617d

It's pulling in the summary metrics correctly, but the timeseries data is coming back as 0 for both GA and share counts. See https://github.com/newslynx/issue-tracker/issues/536 for more info.

abelsonlive commented 8 years ago

The hidden issue suggests that share counts are not actually coming back as zero. It looks like you're probably pulling that data using sparse=true. The counts will be zero for periods where they weren't collected, since the API fills in these values when sparse=true.
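
A rough sketch of the fill behavior described above, in case it helps: periods with no collected value show up as 0 once the series is filled in. The function name, the hourly step, and the sample data are all made up for illustration; this is not the API's actual implementation.

```python
from datetime import datetime, timedelta


def fill_missing_with_zero(observed, start, end, step=timedelta(hours=1)):
    """Return one (timestamp, value) pair per step between start and end.

    `observed` maps timestamps to collected counts. Any period without
    a collected value is reported as 0.
    """
    filled = []
    ts = start
    while ts <= end:
        filled.append((ts, observed.get(ts, 0)))
        ts += step
    return filled


# Hypothetical share counts collected at 10:00 and 13:00 only.
observed = {
    datetime(2015, 10, 18, 10): 12,
    datetime(2015, 10, 18, 13): 15,
}
series = fill_missing_with_zero(
    observed,
    start=datetime(2015, 10, 18, 10),
    end=datetime(2015, 10, 18, 13),
)
values = [value for _, value in series]  # [12, 0, 0, 15]
```

The zeros at 11:00 and 12:00 mean "nothing was collected", not "the count dropped to zero", so a chart of the filled series can look empty even when the underlying counts are fine.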

mhkeller commented 8 years ago

Ah right. Hmm, let me see if this is a spotted tail issue then, since this is the visual. I'll close this one since it seems increasing the timeout has fixed this issue.

[Screenshot: 2015-10-18 at 9:48:36 pm]

mhkeller commented 8 years ago

FYI, resolved through https://github.com/newslynx/newslynx-app/commit/7b234a03361a44c4b0e6b8839efc6ca1f07edffb. A string changed in 1.2.0 and I didn't catch all of its appearances.