scientist-softserv / palni-palci

Other
1 stars 0 forks source link

Resolve Sending Google Analytics Data for Tenant and Proprietor #910

Closed jillpe closed 10 months ago

jillpe commented 11 months ago

Summary

(Updated and revised 2023-12-08)

Goal: For each page to send data to Google Analytics; either the proprietor only or the proprietor and tenant (see Use Cases).

We want:

Use Cases

Use Cases/Acceptance Criteria ```gherkin Given I'm on a proprietor only page When I inspect the source Then I will see a google analytics section for the proprietor's analytics id ``` ```gherkin Given I'm on a tenant page And the tenant has configured their analytics ID When I inspect the source Then I will see a google analytics section for the proprietor's analytics id And I will see the google analytics section for the tenant's analytics id ``` ```gherkin Given I'm on a tenant page And the tenant has NOT configured their analytics ID When I inspect the source Then I will see a google analytics section for the proprietor's analytics id And I will not see the google analytics section for the tenant's analytics id ``` ```gherkin Given I'm on a tenant page And the tenant has configured their analytics ID And the tenant's analytics ID is the same as the proprietors When I inspect the source Then I will see only one google analytics section for that singular analytics id ``` *The above handles not needing to worry too much about the data migration to rollback changes to the `AccountSettings`.*

Tasks

A cautionary note: at present, the tenant's have Analytics ID along with secrets. It is highly likely that the Analytics ID and secrets are duplicated (and are in fact the Proprietor's information).

Breakdown

We want Hyrax::Analytics.config.analytics_id to always be the proprietor analytics id. In doing this, we'll ensure that future work regarding integration with Google Analytics version 4 will be the least disruptive. Further this better aligns with Hyrax assumptions.

At the proprietor-level, we need to be able to provide:

At the tenant-level we should only expose google_analytics_id.

We will almost certainly want to undo most of the work from PR #479; in particular the switching logic for amending the Hyrax.config and Hyrax::Analytics.config:

Acceptance Criteria

Testing Instructions

See above "Use Case/Acceptance Criteria" for testing instructions.

User Documentation

The proprietor analytics are configured on the server via Environment variables. To change these requires coordination with SoftServ.

To send data to Google Analytics requires a Google Analytics ID.

To fetch data from Google Analytics requires creating a service account with Google and adding the service account keys to the server.

See Hyku README’s Google Section for details regarding the service accounts.

With credentials set, Hyku will send analytics data to Google. And when the community completes Hyrax Google 4 Analytics, it will be able to pull Google analytics data into the Hyku dashboard for all tenants.

The tenant analytics are set via a tenant administrator in the Configuration > Settings > Account form. Set the “Google Analytics” field to the tenant provided Google Analytics ID (e.g. G-10CHARLONG).

jillpe commented 11 months ago

Is having it report to GA sufficient, or are you wanting the analytics summary to display in Hyrax? Only getting it to report to GA is the easier lift (about a 5). If you are wanting to be able to look at the analytics summary in Hyrax that will be a bigger lift (multiple sprints). I know we'd talked about setting aside additional analytics works for a separate sprint, so the bigger lift may be what we set aside

aahurford commented 11 months ago

We are ok with getting just the GA working for now; My understanding is that we are waiting for in-app work in Hyrax to be completed

aahurford commented 11 months ago

to clarify -- We have google analytics for the whole repo currently. We would additionally like our tenants to be able to create and manage their own per-tenant GA

jeremyf commented 11 months ago

@aahurford To clarify there is presently a G-3TWN24N1XN on the proprietor (e.g. https://hykucommons.org) and all of the tenants. It sounds like you would like each tenant to be able to add another analytics code that is specific to their organization (e.g. Goshen College has an analytics ID that they use for their website and want to use for their hykucommons.org tenant).

Am I understanding that correctly?

Also, if the tenant adds their GA code, it does not clobber/replace the proprietor GA code for that tenant.

aahurford commented 11 months ago

@jeremyf Yep! It's my understanding that British Library has something similar implemented. The way we are doing it currently makes it difficult to retrieve stats at the tenant level.

ShanaLMoore commented 11 months ago

cc @laritakr not sure if this will help but I captured this screenshot during yesterday's meeting. It may be relevant to this work?. I asked Rob why this project had its own hyrax branch and if we could/should switch them back:

Image

jeremyf commented 11 months ago

tl;dr The Hyrax branch is a bit of a red herring.

Given that the request is for a per tenant Google Analytics And Hyrax has no concept of tenants Then this is not a problem that different branches/versions of Hyrax would solve.

jeremyf commented 11 months ago

In discussion with @laritakr, there's https://github.com/scientist-softserv/palni-palci/pull/479 to consider. It appears to implement per tenant google analytics but at the cost of proprietor level analytics.

However, it does inform how we might go about doing this.

jeremyf commented 11 months ago

ATTN @aahurford, please read the following to see if it makes sense.

There are two functions for Google Analytics in Hyrax

  1. Sending analytic data to Google, via the google analytics ID. This data is then visible in https://analytics.google.com/analytics/web/
  2. Fetching analytic data from Google, vai the google analytics auth keys. When this is working, the analytics data sent to Google is visible in Hyrax (that is a pending bug fix upstream).

For Issue #910 we are looking to have functionality where:

  1. We send Google analytics data for both the proprietor and the tenant; that requires two separate google analytics IDs.
  2. Eventually, when the fetching function is working, we would like to provide only 1 set of credentials across the whole Hyku system (tenants and prorpietor); as those credentials are a bit more complicate to generate and manage. Something we don’t envison most tenants doing; or frankly needing to do. Especially given that step 1 is to be in place.

In our investigation, we’ve discovered that there is presently a per-tenant configuration of Google Analytics (but no longer a proprietor level configuration). That was brought about by PR#479 addressing Issue #408.

It appears that the Google Analytics ID of the tenants (as well as the google analytics auth keys) are copied to each of the tenants; that is each tenant has the same Google Analytics ID.

The proposed solution is:

  1. Restore proprietor based analytics that sends data to Google and likewise fetches.
  2. Add a tenant setting for tenant_google_analytic_id; that id would only be used for sending analytic data to Google.

The above proposal would then ensure that when GA4-fetch integration is completed, all tenants can fetch data from google (using the proprietor’s provide auth keys). And both proprietor and tenant would have their respective information shown in the Google Analytics. This would alleviate the need for each tenant to provide an OAUth configuration for fetching data.

When a tenant chose not to specify a google analytics ID, then they would not be able to go to https://analytics.google.com/analytics/web/ to see information about their tenant. However, once the fetch integration worked, they would piggyback retrieving information from Google to show in the Hyrax native analytics environment.

Note: This assumes that tenants are not able to opt out of the proprietor sending analytics to Google.

aahurford commented 11 months ago

I think this plan makes sense, but I wanted to clarify that we are currently gathering data for all tenants in the main account (GA-4 285666830) -- the problem is that there is no easy way to pull data for a single tenant.

We previously explored the option of adding sub accounts under the main one to achieve this but abandoned that plan due to lack of documentation for how exactly to do it. We are now pursuing this direction after hearing from Jenny at British Library that they have something similar set up.

So, it sounds like we are on the same page, but I want to make sure we are pursuing the least complicated path at achieving both overall and tenant level analytics, and however will be of best use to the community.

Best case scenario is that tenant admins can access their stats on their own -- but not entirely necessary if that makes the work terribly more complex. I also want to make sure that British Library's setup is reviewed to avoid reinventing the wheel.

jeremyf commented 11 months ago

@aahurford I'm asking @j-basford for some clarification. What I'm seeing is that they have a per tenant analytics configuration; they also send (through a different mechanism) data to Iris (at a proprietor level, but likely with tenant level segmentation).

Which would mean that:

  1. In Google Analytics, the proprietor cannot see traffic of any of the tenants
  2. They likely can see analytics in Iris for all tenants; as the configuration is on a proprietor basis

This would mean then deviate from future Hyrax Google Analytics 4 work; because we'd lose the ability for the proprietor to fetch analytics of all tenants (requiring each tenant to configure OAuth keys to fetch that information into Hyrax).

It is possible that Iris would be suitable, and with the COUNTER metrics work for Pitt, however we introduce a mix of analytics down the road that will require notable refactoring; Hyrax assumes that it's sending analytics to the same place that it will later fetch from.

So, I'm still thinking through this and will bring it to the team for further discussion/consideration.

jeremyf commented 11 months ago

From @j-basford:

Hi Jeremy, we have a GA tag that we can look at for everyone's tenant, which I think is in-built into our page setup. then we can also go in the superadmin view and add a tag that each tenant supplies, and they have to look at that themselves.

The BL uses a company called Crafted to do all our GA monitoring and they create us nice dashboards to monitor; they simply added the repositories to the list. I (and Graham and Nora) can only see the repository figures, but the BL web team can see them too, plus all the other BL websites that they monitor. My understanding is that our partner tenants have a similar set up and their web teams give them the tag to use and they monitor it in-house.

Happy to show you and Amanda what it looks like in our GA dashboard if that helps. We do also use IRUS, Graham is much more au fait with those figures than I am as he tested those and is requesting some tweaks on it with the IRUS-UK team right now

jeremyf commented 11 months ago

@aahurford to summarize, the pathway forward is to have two analytics codes on each page. One proprietor and one tenant. However, there are still issues to resolve (see Discussion)

Discussion

This resolves sending the data to Google. The next issue is getting the data out of Google; in particular with a focus on tenant level segmentation. There are two places and two "users" that the data:

The places are:

  1. Google Analytics
  2. Querying Google Analytics to fetch that data into Hyrax; a function that is not yet working.

The users are:

  1. Properietor admins (e.g. who provided the GA token for the proprietor)
  2. Tenant admins (e.g. who provided the GA token for the tenant)

With the above, I'm wondering if you can add hostName to Google Analytics dashboard. I found that while working on the Sushi counter metrics (here's the related code documentation regarding hostName dimension).

The Querying Google Analytics is a known issue for Hyrax; and would be part of Hyrax 5 (with a possible backport to Hyrax 4). Put another way, this is further down the horizon.

The Iris pathway that BL took, is a different consideration as well. Namely, it hooks into another service and sends that data. And we wouldn't be able to back fill that.

Since the proprietor's analytics is used on each tenant; that means the tenant does not have access (at this time) to Google Analytics for their tenant (unless you share the properietor analytics information; and provide a dimenion query with hostName).

Put another way, due to the configuration reality, the best way to ensure we have historic data prior to any changes (which we should make) is to continue with sending properietor analytics; but move that information back to the proprietor level then allow each tenant to add their own analytics. This would mean the historical data is available, but would only be visible to the tenant when one of three things becomes true:

  1. The proprietor Google analytics are shared with/reported to the tenant; this assumes that the hostName dimension can be added.
  2. The aforementioned issue in Hyrax is resolved
  3. We complete the Sushi metrics work to populate that information; the queries and approach for populating Sushi counter-metrics are different from the aforementioned Hyrax issue. a. The querying of data from GA for Sushi is done on a date range basis. b. The querying of data from GA for Hyrax analytics is done on a work basis.
aahurford commented 11 months ago

Thank you for your thoughtful examination of this, Jeremy!

ShanaLMoore commented 11 months ago

TODO: @jeremyf to do a writeup of tasks to accomplish this

laritakr commented 11 months ago

This does not pass QA. The tenant only showed their own ID, and not the proprietor's ID.

ShanaLMoore commented 11 months ago

QA RESULTS: Pass ✅

cc @ndroark ready for your re-review

Tested on: Staging

Use Cases

1.

Given I'm on a proprietor only page
When I inspect the source
Then I will see a google analytics section for the proprietor's analytics id

Image

2.

Given I'm on a tenant page
And the tenant has configured their analytics ID
When I inspect the source
Then I will see a google analytics section for the proprietor's analytics id
And I will see the google analytics section for the tenant's analytics id

Image

3.

Given I'm on a tenant page
And the tenant has NOT configured their analytics ID
When I inspect the source
Then I will see a google analytics section for the proprietor's analytics id
And I will not see the google analytics section for the tenant's analytics id

Image

4.

Given I'm on a tenant page
And the tenant has configured their analytics ID
And the tenant's analytics ID is the same as the proprietors
When I inspect the source
Then I will see only one google analytics section for that singular analytics id

Image