zmbc closed this issue 4 months ago
@aflaxman is this an internal or 'real' request..? (or just a public documentation of a real, internal request! Which is awesome!)
It comes from inside the project, but it is a real request. Zeb is very motivated to figure out an appropriate way to use more than just the publicly available data in this workshop. I think this could be a good way to do it, but if this approach doesn't sound right to you, let's try to refine it.
Yes, this is real.
I have had some more ideas since I initially wrote this--I think what I would like to do is share the data via Google Drive, and most workshop participants would use this shared version directly via Google Colab. We would un-share it after the workshop.
We would still allow participants to download the data locally if they needed to use software besides Python, with instructions as above to delete afterwards.
Also, we could easily limit the data we share to be only the years and datasets that are necessary for the workshop, though I don't know how much this matters.
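For participants who do download the data locally, the "delete afterwards" step could be supported with a small helper. This is only a sketch using the Python standard library; the function name `delete_local_copy` and the directory name `workshop_data` are hypothetical, not something the workshop materials define.

```python
import shutil
import tempfile
from pathlib import Path


def delete_local_copy(data_dir) -> bool:
    """Remove a downloaded copy of the workshop data.

    Returns True if the directory no longer exists afterwards.
    """
    data_dir = Path(data_dir)
    if data_dir.exists():
        # Recursively delete the directory and everything inside it.
        shutil.rmtree(data_dir)
    return not data_dir.exists()


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than on real data.
    demo_dir = Path(tempfile.mkdtemp()) / "workshop_data"  # hypothetical name
    demo_dir.mkdir()
    (demo_dir / "example.parquet").write_bytes(b"placeholder")
    print(delete_local_copy(demo_dir))
```

Note that this only removes the named directory; copies made elsewhere (for example by other software) would still need to be deleted by hand.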
That makes sense to me; I think for transparency reasons we'd want to have public documentation (ideally in this thread) of who the workshop participants, and so the potential accessors, are.
@Ironholds I think one way to do it could be to put our "class roster" here on the day of the workshop. I'm not sure we'll know beforehand who will attend.
wfm!
Proposal approved, I'm closing this issue. :)
The workshop was a success! Here was our class roster (these people gave consent for it to be shared publicly):
Angeliki Evripidou, Youth Futures Foundation, Senior Analysis Officer
Carl Frederick, Institute for Research on Poverty, University of Wisconsin-Madison
David Grenier, Dir. Data Engineering, Rhode Island Longitudinal Data System (RILDS)
Xindi Hu, Principal Data Scientist, Mathematica
Amy Krefman, Northwestern University
Anders Alexandersson, Florida Cancer Registry
Charlotte Ma, ICES
Tara Whitten, Senior Analyst, Provincial Research Data Services, Alberta SPOR Support Unit
Fei Jiang, The Ohio State University
Nan Wang, ICES
Jeremy Foxcroft, PhD Candidate, University of Guelph
Todd Abraham, Asst. Director Data & Analytics at I2D2, Iowa State University
Jan Savinc, Research Fellow, Edinburgh Napier University & Scottish Centre for Administrative Data Research
Rod Middleton, Associate Professor Disease Registers, Swansea University
Yinshan Zhao, Sr Data Scientist, Popdata BC
Claire Tochel, Research Fellow, University of Edinburgh
Timothy Nielsen, Postdoc Researcher, University of Sydney
Rui Wang, Senior Data Scientist, Mathematica
Tom Prendergast, The Health Foundation
Tetyana Perchyk, Research Fellow, University of Surrey
Winnie Shen, ICES
Joseph Lam, PhD Student/Research Assistant, University College London, UK
Jose Nova, Assoc. Director, Data & Analytics Rutgers University - IPHD
Evelyn Lauren, PhD candidate, Boston University
Shih Hao Lee, Staff Data Scientist, Intuitive Surgical
Lili Wei, Researcher, University of Glasgow
Ayaz Hyder, Data and Integration Lead, Smart Columbus/Community Information Exchange; Associate Professor, College of Public Health, Ohio State University
Susan Burtner, Research Associate, Northwestern University
[celebrate] Abraham D Flaxman reacted by email to Zeb Burke-Conte's comment of Monday, September 30, 2024 (Issue #394).
What is the name of your project?
Workshop demonstrating linkage methods with large-ish data
What is the purpose of your project?
We are considering hosting a workshop in which we demonstrate linkage with medium-size (~1 million row) datasets using different software packages. The aim is to show participants (who will be record linkage practitioners, such as social science researchers) how to use software they may not have used before, and to compare the features of different tools. To do this in a workshop setting, we need data that is big enough and realistic but is not actually PII. We think the RI data could be a great fit for this.
The linkages we do in the workshop with the pseudopeople-simulated data won't be the focus -- the real goal is for practitioners to apply the lessons they learn messing around with this data to their actual research questions.
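To give a sense of the kind of linkage exercise described above, here is a minimal sketch of blocking plus fuzzy name matching. It uses only the Python standard library (`difflib`), not any of the linkage packages the workshop would cover, and the record fields (`id`, `first_name`, `last_name`, `zipcode`), the threshold, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def block_key(record):
    # Blocking: only compare records sharing a ZIP code and first initial,
    # which avoids scoring every possible pair of records.
    return (record["zipcode"], record["first_name"][:1].lower())


def similarity(a, b):
    # Fuzzy similarity of full names, between 0.0 and 1.0.
    name_a = f'{a["first_name"]} {a["last_name"]}'.lower()
    name_b = f'{b["first_name"]} {b["last_name"]}'.lower()
    return SequenceMatcher(None, name_a, name_b).ratio()


def link(left, right, threshold=0.85):
    """Return (left_id, right_id) pairs whose best in-block match
    meets the similarity threshold."""
    blocks = defaultdict(list)
    for rec in right:
        blocks[block_key(rec)].append(rec)

    links = []
    for rec in left:
        candidates = blocks.get(block_key(rec), [])
        if not candidates:
            continue
        best = max(candidates, key=lambda c: similarity(rec, c))
        if similarity(rec, best) >= threshold:
            links.append((rec["id"], best["id"]))
    return links


if __name__ == "__main__":
    # Tiny made-up records; real workshop data would be far larger.
    left = [
        {"id": "a1", "first_name": "Jon", "last_name": "Smith", "zipcode": "02903"},
        {"id": "a2", "first_name": "Maria", "last_name": "Garcia", "zipcode": "02906"},
    ]
    right = [
        {"id": "b1", "first_name": "John", "last_name": "Smith", "zipcode": "02903"},
        {"id": "b2", "first_name": "Mario", "last_name": "Lopez", "zipcode": "02906"},
        {"id": "b3", "first_name": "Jon", "last_name": "Smith", "zipcode": "98101"},
    ]
    print(link(left, right))
```

At around a million rows, the choice of blocking key matters far more than in this toy example, which is one reason realistic data of that size is useful for comparing tools.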
Who is involved in the project? Which of these people will have direct access to the pseudopeople input data?
I already have access to the pseudopeople input data, as a member of the pseudopeople team 😃
The major data access request here would be to give workshop participants at a conference (temporary) access to the RI data for use during the workshop. I'm thinking we would frame it like so:
What funding is the project under? What expectations with respect to open access and access to data come with that funding?
Cooperative Agreement with the US Census Bureau. I don't believe there are any open data access requirements that go along with the funding.
We commit to:
What data would you like to request?
Other data - more explanation
No response