metadatacenter-attic / phs-gdc

PHS-GDC Prototype

Create an 'available data' table #25

Closed graybeal closed 3 years ago

graybeal commented 3 years ago

Is your feedback related to a problem? If so, please describe. We need a way for users to look up what data is available for which location types, and to link to it from the DCW.

Describe the idea you have or solution you'd like to see: Create a Google Sheet showing the status of the relationships between statistical variables and location types. As a first cut, just showing whether there are any variables accessible by a given location type is sufficient.

Describe alternatives you've considered: Expressing the relationship as a percentage (how many of the location instances have data values for a given statistical variable and location type, versus how many location instances there are) would be icing on the cake.
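To make the percentage idea concrete, here is a minimal Python sketch of the calculation for one statistical variable and one location type. It is only an illustration under assumptions: the function name `coverage_percent` and the sample ZIP code lists are hypothetical, and the sketch assumes the lists of location instances have already been obtained somehow.

```python
def coverage_percent(locations_with_data, all_locations):
    """Percentage of location instances that have a value for one
    statistical variable within one location type."""
    known = set(all_locations)
    if not known:
        return 0.0  # avoid division by zero when no locations are known
    # Only count locations that belong to the known set for this type.
    covered = set(locations_with_data) & known
    return 100.0 * len(covered) / len(known)


# Hypothetical sample: four ZIP code locations, three of which have data.
all_zip_codes = ["zip/90249", "zip/90250", "zip/90251", "zip/90252"]
zip_codes_with_data = ["zip/90249", "zip/90250", "zip/90252"]

print(coverage_percent(zip_codes_with_data, all_zip_codes))
```

With the made-up sample above, three of four locations have data, so the result is 75.0.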

Additional context: Don't get distracted by the perfect or long-term solution; having the basic filled-out table is plenty good enough for now. Pass along to Marcos any lessons that could apply to the long-term in-app solution, though. (See issue #16.)

johardi commented 3 years ago

As promised,

Statistical Variable Availability.xlsx

Let me know what I can improve before I export it to Google Sheets.

johardi commented 3 years ago

The method to produce the Excel spreadsheet:

  1. List all the DC's statistical variables as a new sheet called "All Statvars"
  2. Using the DC API, list all the DC's statistical variables for the location code: zip/90249 as a new sheet called "ZIP Code"
    curl --request GET \
    --url 'https://api.datacommons.org/place/stat-vars?dcids=zip%2F90249'
  3. Merge the two sheets using the Excel formulas INDEX and MATCH (see the spreadsheet to learn more about the formula) and store the output in the "All Statvars" sheet. The black circle "⬤" indicates a match was found.

Repeat Steps (2) and (3) for other location types. Below are the details:

Known limitation of the method: The spreadsheet may not show the complete availability information because it is based on an arbitrary sampling of the locations.
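The merge in Step (3) can also be sketched outside Excel. The following is a minimal Python illustration, not the method used to build the spreadsheet: it assumes the statistical variable lists have already been fetched (for example with the curl call in Step 2), and the function name `build_availability` and the sample data are hypothetical.

```python
def build_availability(all_statvars, statvars_by_location_type, mark="⬤"):
    """Return one row per statistical variable, with a mark under each
    location type whose sampled location reported that variable."""
    rows = []
    for statvar in all_statvars:
        row = {"statvar": statvar}
        for location_type, available in statvars_by_location_type.items():
            # Same idea as the INDEX/MATCH lookup: mark the cell on a match.
            row[location_type] = mark if statvar in available else ""
        rows.append(row)
    return rows


# Illustrative sample data only (an assumption, not real availability results).
all_statvars = ["Count_Person", "Median_Age_Person", "Count_Household"]
statvars_by_location_type = {
    "ZIP Code": {"Count_Person", "Median_Age_Person"},
    "County": {"Count_Person", "Count_Household"},
}

table = build_availability(all_statvars, statvars_by_location_type)
for row in table:
    print(row)
```

Because each location type is represented by a single sampled location, this sketch has the same limitation as the spreadsheet: an empty cell means the sampled location had no match, not that the variable is unavailable for every location of that type.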

johardi commented 3 years ago

The public spreadsheet is available at https://docs.google.com/spreadsheets/d/1s7jurDfn-c9iHyNQ6QnfnPGYV6j6bd75cF3fbvwf9Lc

graybeal commented 3 years ago

If adding other content tabs, after getting the response, edit it to:

Other stuff to do to complete the spreadsheet update:

Confirm that the new calculations and charts make sense, and that you really can get data from one of those locations (or better, from all the locations in a state) for a few of those statistical variables, by querying the actual Wizard.