pseudopeople is a Python package that generates realistic simulated data about a fictional United States population, designed for use in testing entity resolution (record linkage) methods or other data science algorithms at scale.
What is the name of your project?
Statistical/Secure Query Service
What is the purpose of your project?
We are developing methods to validate data, align schemas across agencies and geographies, conduct person-level data linkages, and produce synthetic tabular output about educational attainment, employment, and earnings.
Who is involved in the project? Which of these people will have direct access to the pseudopeople input data?
The project team within the Massive Data Institute will access the pseudopeople input data. We will subset, recode, transform, and augment records to meet our simulation needs. Project team: Amy O’Hara, James Carey, Stephanie Straus, Maanasa Vatsavayi, Kangheng Liu, Steve Kent, Justin Liu, Victor Chen.
What funding is the project under? What expectations with respect to open access and access to data come with that funding?
Our project is funded by several grants from the Bill & Melinda Gates Foundation, the Eric & Wendy Schmidt Fund for Strategic Innovation, and the Walton Family Foundation. Each of these grants requires open publication of our project's final results. The Gates Foundation funding in particular requires open access to the underlying data so long as that access is not ethically unsound or legally encumbered. We believe the principles of the pseudopeople data clearly articulate why public sharing of the dataset may be ethically unsound. We can fulfill our openness obligations under this funding in keeping with the FAIR Data Principles by directing our audience to request their own access to the pseudopeople data for reuse. Any reproduction of the pseudopeople data in publications resulting from this project will be limited to a small subset of the data, used for illustrative purposes to demonstrate our matching process.
We commit to:
[X] be responsive to further questions from interested parties
[X] deprecate and replace our version of the pseudopeople input data when a new version is released
What data would you like to request?
Other data - more explanation
No response