uscensusbureau / granicus


Total Digital Impressions #31

Closed. omerbensaadon closed this issue 5 years ago.

omerbensaadon commented 5 years ago

Full Requirements in comment below


Total Digital Impressions = Total Non-Unique Opens + Total Non-Unique Clicks

API Calls Used /api/v2/accounts/{account_id}/reports/bulletins

This API call returns an extremely detailed list of Bulletin Detail reports (give it a shot). It takes a LONG time to return a result, even if the days parameter is "1".

The only issue is that the results are paginated, meaning that if we have more than 20 bulletins for that time period, it will require at least 2 API calls.

Even worse, it doesn't look like there's a way to tell after the first call whether any results remain. The only way to access a new page is to set the page parameter to the page # you want; if it's left blank, it defaults to 1.

Perhaps there's a way to take the results from the Bulletin Summary Information call, check whether bulletins_sent is greater than 20, then set pages = ceil(bulletins_sent / 20) and plug that into a for loop, as sketched below.
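A rough sketch of that approach (the helper names, auth header, and function signatures here are placeholders, not the app's actual code; it assumes bulletins_sent has already been pulled from the Bulletin Summary Information call):

// Placeholder HTTP helper: real credentials/auth scheme TBD.
async function getJson(path) {
  const res = await fetch(`https://api.govdelivery.com${path}`, {
    headers: { Authorization: "Basic <credentials>" }, // placeholder auth
  });
  if (!res.ok) throw new Error(`${res.status} on ${path}`);
  return res.json();
}

const PAGE_SIZE = 20;

// Fetch every page of bulletin details for a given window of days.
async function fetchAllBulletins(accountId, days, bulletinsSent) {
  const pages = Math.ceil(bulletinsSent / PAGE_SIZE); // round up: 41 bulletins = 3 pages, not 2
  const bulletins = [];
  for (let page = 1; page <= pages; page++) {
    const data = await getJson(
      `/api/v2/accounts/${accountId}/reports/bulletins?days=${days}&page=${page}`
    );
    bulletins.push(...(data.bulletin_activity_details || []));
  }
  return bulletins;
}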

Data Needed We need to aggregate nonunique_opens_count + nonunique_clicks_count for each bulletin that is returned by this call.

TODO

omerbensaadon commented 5 years ago

Anthony wants metrics for all topics; a handful of topics simply will not do, as it does not reflect total digital impressions.

omerbensaadon commented 5 years ago

Old requirement, updated requirement below


Total Digital Impressions = Total Non-Unique Opens + Total Non-Unique Clicks

total_digital_impressions

API Calls Used /api/v2/accounts/{account_id}/reports/bulletins

Data Needed:

Note: This API returns bulletin-specific metrics (i.e. metrics for each individual bulletin) in batches of 20 (batches are paged through by changing the page parameter on the request).

Also, it turns out I was wrong on the phone: if there are more results left to see (i.e. incrementing the page parameter does not return an empty response), the response will have a next field, like so:

  "next": {
            "href": "/api/v2/accounts/10274/reports/bulletins?end_date=2017-02-01&page=2&start_date=2017-01-01"

We send over 100 bulletins over any given 3-week period. This API call may take a while to return.

This also requires the summation of nonunique_opens_count and nonunique_clicks_count, but feel free to create a new table and dump all of the data provided by the API response into it. (I'd prefer that the rows were bulletins and the columns were the data belonging to each.)

Alternatively, do the summing in JS and just add a "Total Digital Impressions" row to the existing table.

We could probably do more with the data in the first case, but I'll settle for the second if it's easier.
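For reference, the summing itself is a one-liner in JS. A sketch over the array of bulletin detail rows:

function totalDigitalImpressions(bulletins) {
  // Total Digital Impressions = Total Non-Unique Opens + Total Non-Unique Clicks
  return bulletins.reduce(
    (sum, b) => sum + b.nonunique_opens_count + b.nonunique_clicks_count,
    0
  );
}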

omerbensaadon commented 5 years ago

@loganpowell requirement written above, let me know if that's clear enough

loganpowell commented 5 years ago

I changed this to a large. The logic here is more complex:

  1. For each of the three time periods, call the /bulletins API.
  2. Use a recursive function that starts with each API call, counting the number of bulletins.
  3. Base case: if the count of bulletins within the bulletin_activity_details array is < 20, or the result is undefined, return the function's results.
  4. If the count = 20, collect the bulletins into an array and call the second page via the next link under the _links object. (Currently this link exists even if there are no more bulletins; Granicus has been asked to remove the link when there are no more results, but we're not waiting for that, which requires a less efficient, naive approach.)
  5. Repeat from step 2 (see the sketch below).
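A sketch of that recursion (again reusing the hypothetical getJson helper from earlier). Because the next link is present even on the last page, the stop condition has to be the bulletin count rather than the presence of the link:

async function collectBulletins(path, acc = []) {
  const page = await getJson(path);                     // step 2: call the API
  const details = page.bulletin_activity_details || [];
  acc.push(...details);                                 // step 4: collect into an array
  if (details.length < 20) return acc;                  // step 3: base case
  return collectBulletins(page._links.next.href, acc);  // step 5: repeat on the next page
}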

Currently, getting the three time periods for a single level of bulletins (i.e., without pagination) takes just over a minute. I can only estimate that this logic will double that time: your estimate of "over 100 bulletins over any given 3-week period", divided by 60 (the number of first-level results across the 3 time periods, i.e., 3 pages of 20), works out to roughly 2 pages per period.

loganpowell commented 5 years ago

I'm getting closer, but there's a big problem that's beyond my immediate control here:

https://devcenter.heroku.com/articles/request-timeout

Just one of the bulletins calls can take over a minute (> 60000 ms) to return a response. The problem is that Heroku automatically terminates any connection if no response is returned within 30 seconds (30000 ms), so if we want to do this, we have to explore other options, which we'll need to discuss. I will move on to other tasks, as this is currently a hard block.

omerbensaadon commented 5 years ago

Okay so I think we've got to do one of 2 things to make this work:

  1. Limit the number of topics we are asking for
  2. Limit the number of days we are pulling all at once

This seems to work okay:

https://api.govdelivery.com/api/v2/accounts/11723/reports/bulletins?days=7&topic%5B%5D=454831&topic%5B%5D=449122&topic%5B%5D=449126&topic%5B%5D=452433&topic%5B%5D=444983&topic%5B%5D=444992&topic%5B%5D=447782&topic%5B%5D=449124&topic%5B%5D=452958

Returning 7 days at a time and adjusting the end_date should work. So you'd start with just days = 7, then you'd do days = 7 && end_date = TODAY() - 7, and so on.

So, really, we'd only have to call it 3 times.
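A sketch of building those three request paths (accountId and topicsQuery stand in for the real values; topicsQuery is the &topic%5B%5D=... string from the URL above):

function isoDate(d) {
  return d.toISOString().slice(0, 10); // YYYY-MM-DD, as in the API's date params
}

function weeklyWindowPaths(accountId, topicsQuery) {
  const WEEK_MS = 7 * 24 * 60 * 60 * 1000;
  const paths = [];
  for (let back = 0; back < 3; back++) {
    // First window: just days=7. Later windows: walk end_date back a week at a time.
    const end = back === 0 ? "" : `&end_date=${isoDate(new Date(Date.now() - back * WEEK_MS))}`;
    paths.push(`/api/v2/accounts/${accountId}/reports/bulletins?days=7${end}${topicsQuery}`);
  }
  return paths;
}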

omerbensaadon commented 5 years ago

Is there any way you could put a try/catch around this block of code? If it caches the response after we ask for it, that could solve the problem as well, but I'm not sure whether the infrastructure allows for that.
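A minimal sketch of what that could look like, with a naive in-memory cache (again reusing the getJson placeholder; note the cache only pays off once a first request has actually completed within the timeout):

const cache = new Map();

async function cachedGetJson(path) {
  if (cache.has(path)) return cache.get(path); // serve the cached response if we have one
  try {
    const data = await getJson(path);
    cache.set(path, data);
    return data;
  } catch (err) {
    console.error(`bulletins call failed for ${path}:`, err.message);
    return null; // let the caller decide whether to retry
  }
}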

omerbensaadon commented 5 years ago

Updated Requirement


API Calls Used https://api.govdelivery.com/api/v2/accounts/11723/reports/bulletins

Parameters topic%5B%5D (URL-encoded topic[]; values shown below)

&topic%5B%5D=289016&topic%5B%5D=454831&topic%5B%5D=449122&topic%5B%5D=449126&topic%5B%5D=452433&topic%5B%5D=444983&topic%5B%5D=444992&topic%5B%5D=447782&topic%5B%5D=449124&topic%5B%5D=452958

start_date, end_date

The topic parameters above may change depending on the Product Owner's request. I don't see a need to finalize these topics before development begins, but if that is the case, let me know.

We need data for the past 21 days, but, as you know, we are dealing with very long load times when we try to retrieve this information a week at a time.

Instead, let's fetch it one day at a time.
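Same shape as the weekly scheme, just 21 requests with days=1 and end_date stepping back a day at a time. A sketch (same placeholder names as before):

function dailyPaths(accountId, topicsQuery) {
  const DAY_MS = 24 * 60 * 60 * 1000;
  return Array.from({ length: 21 }, (_, back) => {
    const end = new Date(Date.now() - back * DAY_MS).toISOString().slice(0, 10);
    return `/api/v2/accounts/${accountId}/reports/bulletins?days=1&end_date=${end}${topicsQuery}`;
  });
}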

Process for transferring data into Tableau Table

When you fetch the data, it returns a JSON payload like so:

"bulletin_activity_details": [
        {
            "created_at": "2019-04-30T20:15:14.000Z",
            "subject": "Data Visualization Across the Nation",
            "to_text": "Subscribers of America Counts: Stories Behind the Numbers, Census Academy, Census Updates, Census Updates for Business, Data Visualization Newsletter, Directors Blog, Facts for Features, News about Business, Industry and Government Statistics, News about Housing Statistics, News about People Statistics, Press Kits, Random Samplings blog, Research Matters Blog, Stats for Stories, Tip Sheets, or Training Opportunities, where email contains \"@\"",
            "delivery_status_name": "Delivered",
            "addresses_count": 255417,
            "success_count": 247276,
            "failed_count": 8141,
            "percent_success": "96.812663213490096587",
            "immediate_email_recipients": 254750,
            "emails_delivered": 247276,
            "emails_failed": 8141,
            "percent_emails_delivered": "96.812663213490096587",
            "opens_count": 32393,
            "percent_opened": "13.099936912599686181",
            "nonunique_opens_count": 47457,
            "links_count": 27,
            "click_rate": "1.252042252381953768",
            "clicks_count": 3096,
            "nonunique_clicks_count": 5067,
            "shared_views": 104,
            "sender_email": "anthony.j.calabrese@census.gov",
            "digest_email_recipients": 667,
            "wireless_recipients": 0,
            "wireless_delivered": 0,
            "wireless_failed_count": 0,
            "bulletin_visibility?": "Public",
            "publish_to_facebook": "No",
            "publish_to_twitter": "No",
            "publish_to_rss?": "Yes",
            "wireless_unique_clicks": 0,
            "wireless_nonunique_clicks": 0,
            "facebook_nonunique_clicks": 0,
            "twitter_nonunique_clicks": 0,
            "_links": {
                "self": {
                    "href": "/api/v2/accounts/11723/reports/bulletins/37860974"
                },
                "topics": {
                    "href": "/api/v2/bulletins/37860974/topics"
                }
            }
        },
...

We only need the fields up to and including "digest_email_recipients" and want to format it in Tableau with the parameters as columns, like so:

[Screenshot: example table with one row per bulletin and the payload fields as columns]

If we can insert all of the data this way with a time-stamp, it should be relatively straightforward to group by week in Tableau.

Is this possible? Let's assume the order of the variables within the JSON payload never changes.
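For what it's worth, JSON.parse in JS preserves the insertion order of string keys, so a sketch of the flattening can lean on that assumed-stable order directly:

const KEEP_THROUGH = "digest_email_recipients";

// Flatten one bulletin into a row: every field up to and including
// digest_email_recipients, plus a fetch timestamp so the rows can be
// grouped by week in Tableau.
function toRow(bulletin) {
  const row = { fetched_at: new Date().toISOString() };
  for (const key of Object.keys(bulletin)) {
    row[key] = bulletin[key];
    if (key === KEEP_THROUGH) break; // drop everything after this field
  }
  return row;
}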

omerbensaadon commented 5 years ago

@loganpowell updated requirement above. Let me know if this doesn't work and we can do a quick call to figure out a solution...

loganpowell commented 5 years ago

Are the topics we're fetching these URLs for the same as for #47?

omerbensaadon commented 5 years ago

@loganpowell Yes

loganpowell commented 5 years ago

This is done if the topics are the same as in #47.