ihmeuw / pseudopeople

pseudopeople is a Python package that generates realistic simulated data about a fictional United States population, designed for use in testing entity resolution (record linkage) methods or other data science algorithms at scale.
https://pseudopeople.readthedocs.io
BSD 3-Clause "New" or "Revised" License

DC ETEP #456

Open mv802 opened 6 days ago

mv802 commented 6 days ago

What is the name of your project?

Statistical/Secure Query Service

What is the purpose of your project?

We are developing methods to validate data, align schemas across agencies and geographies, conduct person-level data linkages, and produce synthetic tabular output about educational attainment, employment, and earnings.

Who is involved in the project? Which of these people will have direct access to the pseudopeople input data?

The project team within the Massive Data Institute will access the Pseudopeople input data. We will subset, recode, transform, and augment records to meet our simulation needs. Project team: Amy O’Hara, James Carey, Stephanie Straus, Maanasa Vatsavayi, Kangheng Liu, Steve Kent, Justin Liu, Victor Chen.

What funding is the project under? What expectations with respect to open access and access to data come with that funding?

Our project is funded by several grants from the Bill & Melinda Gates Foundation, the Eric & Wendy Schmidt Fund for Strategic Innovation, and the Walton Family Foundation. Each of these grants requires open publication of our project's final results. The Gates Foundation funding in particular requires open access to the underlying data, so long as that access is not ethically unsound or legally encumbered. We believe the principles behind the Pseudopeople data clearly articulate why publicly sharing the dataset may be ethically unsound. We can therefore fulfill our openness obligations under this funding by meeting the FAIR Data Principles, directing our audience to request their own access to the Pseudopeople data for reuse. Any reproduction of the Pseudopeople data in publications resulting from this project will be limited to a small subset used to illustrate our matching process.

We commit to:

What data would you like to request?

Other data - more explanation

No response

aflaxman commented 4 days ago

This seems sound to me. @Ironholds: do you want clarification about anything here?