operate-first / operate-first-data-science-community

GitHub home of operate first data science community content
https://operate-first-data-science-community.catalog.meteor.zone/
GNU General Public License v3.0

Automate EDA notebook for meetup attendees data #84

Closed suppathak closed 1 year ago

suppathak commented 2 years ago

To create a self-updating EDA notebook, we will need a cron job that runs biweekly and refreshes the notebook with the new meetup attendee data.
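A minimal sketch of what such a job could look like, assuming the notebook is re-executed with `jupyter nbconvert`. Standard cron syntax has no direct "every other week" field, so one common workaround is to schedule the script weekly and let it skip alternate weeks. The notebook filename, the anchor date, and the script name in the crontab comment are hypothetical placeholders, not names from this repository.

```python
import subprocess
from datetime import date

# Hypothetical values, for illustration only.
ANCHOR = date(2022, 1, 3)                 # a Monday on which the first run happens
NOTEBOOK = "meetup_attendees_eda.ipynb"   # assumed notebook filename


def is_run_week(today, anchor=ANCHOR):
    """Return True in every second week, counted in 7-day blocks from the anchor."""
    weeks_elapsed = (today - anchor).days // 7
    return weeks_elapsed % 2 == 0


def build_command(notebook=NOTEBOOK):
    """Build the nbconvert call that re-executes the notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def main(today=None):
    """Entry point for the weekly cron invocation; does nothing in off weeks."""
    today = today or date.today()
    if is_run_week(today):
        subprocess.run(build_command(), check=True)


# Example crontab entry (weekly, Mondays at 06:00), calling main() from a
# hypothetical wrapper script:
#   0 6 * * 1  python refresh_eda.py
```

Counting weeks from a fixed anchor date, rather than checking whether the ISO week number is even, avoids two consecutive runs (or two consecutive skips) at a year boundary.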

suppathak commented 2 years ago

@aakankshaduggal The data obfuscation, which is applied to each biweekly influx of data, is done internally. The process is not shared with others, since keeping attendees' data secure is a high priority. Once we have the obfuscated data, it can be visualized through Google Analytics. Hence, after giving it some thought: does it make sense to explain the steps of the obfuscation process in a notebook instead of setting up a cron job? Lmk wyt?

aakankshaduggal commented 2 years ago

I think it is vital that we document this process well enough for new organizers to be able to re-create this pipeline to anonymize the attendees' data. Let's look at it this way: instead of creating a cron job for this, we should create proper documentation for recreating this process. That means starting from receiving the attendees' list and the corresponding hash values from the organizer, then anonymizing the data using the obfuscation notebook, and finally storing all this data and using it to create a dashboard for the community metrics. I can help you with this documentation by covering the latter part, the metrics and dashboard.
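To make the anonymization step concrete for the documentation, here is a minimal sketch of one possible approach: keyed hashing (HMAC-SHA256) of each attendee's email. This is an assumption for illustration only; the actual obfuscation notebook is internal and may work differently. The field names `email` and `meetup_date`, the function names, and the secret key handling are all hypothetical.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Map an identifier to a stable, keyed hash (hex string).

    The identifier is normalized first so that case and surrounding
    whitespace do not produce different pseudonyms for the same person.
    """
    normalized = identifier.strip().lower().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()


def anonymize_attendees(rows, secret_key):
    """Replace the (assumed) 'email' field with a pseudonymous 'attendee_id'.

    Only the fields needed for metrics are copied to the output, so any
    other personal fields in the input rows are dropped.
    """
    return [
        {
            "attendee_id": pseudonymize(row["email"], secret_key),
            "meetup_date": row["meetup_date"],
        }
        for row in rows
    ]
```

With a scheme like this, the same attendee gets the same `attendee_id` across meetups, so repeat attendance can still be counted, while the key stays with the organizers and is never stored alongside the data.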

Please let me know if you need any other assistance from my end.

Does this sound reasonable @MichaelClifford ?

sesheta commented 2 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

sesheta commented 2 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

sesheta commented 1 year ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close

sesheta commented 1 year ago

@sesheta: Closing this issue.

In response to [this](https://github.com/operate-first/operate-first-data-science-community/issues/84#issuecomment-1326889171):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
>
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.