ipums / ipumsr

Request, download, and read IPUMS data in R
https://tech.popdata.org/ipumsr/
Mozilla Public License 2.0

Is there a reference mapping the sample name code to the sample dataset name? #53

Closed: hessakh closed this issue 11 months ago

hessakh commented 11 months ago

Is there a reference mapping the sample name code to the sample dataset name? I don't see it in the vignette. Thank you!

robe2037 commented 11 months ago

Hi @hessakh, if you're referring to a mapping between the sample codes used by the IPUMS API and the more human-readable names of those samples, you can use get_sample_info(). Sample codes and names are available for the currently supported microdata collections (“usa”, “cps”, and “ipumsi”). See the Microdata API Requests vignette for an example.

The IPUMS API doesn’t yet support more extensive metadata access for microdata collections, but this functionality is slated to be added in the future.

hessakh commented 11 months ago

Thank you! Do you by any chance know what the 2021 ACS 5-year sample falls under?

robe2037 commented 11 months ago

For 2021 ACS 5-year microdata, you'll find the sample in IPUMS USA. Take a look at the IPUMS USA website for more details about the data it provides.

If you need additional help identifying data for your needs, check out the IPUMS Forum.

hessakh commented 11 months ago

Thank you, I know I can find it there. But I would like the sample code it is listed under so I can use it in the define_extract_usa() function. Is it not possible to access this information from the R library?

(Sent by email in reply to Finn Roberts's comment of 26 Sep 2023, 3:07 PM.)

robe2037 commented 11 months ago

As mentioned above, you can view a listing of all sample codes for IPUMS USA by using get_sample_info(). For instance:

```r
library(ipumsr)

samples <- get_sample_info("usa")

samples
#> # A tibble: 150 × 2
#>    name    description
#>    <chr>   <chr>
#>  1 us1850a 1850 1%
#>  2 us1850b 1850 100% sample (July 2015)
#>  3 us1850c 1850 100% sample (Revised December 2017)
#>  4 us1860a 1860 1%
#>  5 us1860b 1860 1% sample with black oversample
#>  6 us1860c 1860 100% sample (Jan 2019)
#>  7 us1870a 1870 1%
#>  8 us1870b 1870 1% sample with black oversample
#>  9 us1870c 1870 100% sample (Jan 2019)
#> 10 us1880a 1880 1%
#> # ℹ 140 more rows
#> # ℹ Use `print(n = ...)` to see more rows
```

The name column contains the code that you would use in define_extract_usa(). The description column gives a short description of the sample associated with the indicated code.
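Because the two columns line up row by row, this table is exactly the code-to-name mapping asked about at the top of this issue, and it can be turned into a lookup in either direction. Here is a minimal base R sketch. It assumes a hand-built data frame holding a few rows copied from the listing above, standing in for the real get_sample_info("usa") result (which needs an API key). The names `code_to_description` and `description_to_code` are illustrative only.

```r
# Stand-in for get_sample_info("usa"): a few rows copied from the listing above.
samples <- data.frame(
  name = c("us1850a", "us1860b", "us1870c"),
  description = c(
    "1850 1%",
    "1860 1% sample with black oversample",
    "1870 100% sample (Jan 2019)"
  ),
  stringsAsFactors = FALSE
)

# Named character vectors act as lookups in both directions.
code_to_description <- setNames(samples$description, samples$name)
description_to_code <- setNames(samples$name, samples$description)

code_to_description[["us1860b"]]                      # "1860 1% sample with black oversample"
description_to_code[["1870 100% sample (Jan 2019)"]]  # "us1870c"
```

With a configured API key, the same two setNames() calls work unchanged on the tibble returned by get_sample_info("usa").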

If you're looking for 2021 ACS data, look for a description that includes those keywords. You can do this manually or by filtering the table:

```r
samples[grepl("2021 ACS", samples$description), ]
#> # A tibble: 1 × 2
#>   name    description
#>   <chr>   <chr>
#> 1 us2021a 2021 ACS
```

So, you would use "us2021a" in the samples argument of define_extract_usa().
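The grepl() call above treats its pattern as a regular expression. That is harmless for "2021 ACS", but some descriptions contain characters such as "(" or "%". A slightly more defensive sketch, assuming plain substring matching is all you need, passes fixed = TRUE. `find_samples()` is a hypothetical helper (not part of ipumsr), and the hand-built data frame only stands in for the get_sample_info("usa") tibble.

```r
# Stand-in for get_sample_info("usa"): a few rows copied from this thread.
samples <- data.frame(
  name = c("us1850c", "us2020c", "us2021a", "us2021c"),
  description = c(
    "1850 100% sample (Revised December 2017)",
    "2016-2020, ACS 5-year",
    "2021 ACS",
    "2017-2021, ACS 5-year"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of ipumsr): keep rows whose description
# contains `pattern` as a literal substring.
find_samples <- function(samples, pattern) {
  # fixed = TRUE turns off regular expressions, so "(", "%" and "." in
  # the pattern are matched as ordinary characters.
  hits <- grepl(pattern, samples$description, fixed = TRUE)
  samples[hits, , drop = FALSE]
}

find_samples(samples, "2021 ACS")$name              # "us2021a"
find_samples(samples, "ACS 5-year")$name            # "us2020c" "us2021c"
find_samples(samples, "100% sample (Revised")$name  # "us1850c"
```

Note that "2021 ACS" does not match "2017-2021, ACS 5-year" because of the comma, so this search returns only the single-year sample.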

Note that this assumes you have set up your API key and are registered for IPUMS USA. For more about setting up your API key, see the corresponding section in the Introduction to the API.

hessakh commented 11 months ago

OK, I see. I used `print(samples[grepl("ACS 5-year", samples$description), ])` to find the 2021 ACS 5-year sample, which is different from the 2021 ACS sample, and got the following:

```
# A tibble: 13 × 2
   name    description
   <chr>   <chr>
 1 us2009e 2005-2009, ACS 5-year
 2 us2010e 2006-2010, ACS 5-year
 3 us2011e 2007-2011, ACS 5-year
 4 us2012e 2008-2012, ACS 5-year
 5 us2013e 2009-2013, ACS 5-year
 6 us2014c 2010-2014, ACS 5-year
 7 us2015c 2011-2015, ACS 5-year
 8 us2016c 2012-2016, ACS 5-year
 9 us2017c 2013-2017, ACS 5-year
10 us2018c 2014-2018, ACS 5-year
11 us2019c 2015-2019, ACS 5-year
12 us2020c 2016-2020, ACS 5-year
13 us2021c 2017-2021, ACS 5-year
```

So the code I was looking for is "us2021c", for the 2017-2021 ACS 5-year sample.
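The listing also shows that the 5-year code cannot simply be guessed from the year, because the letter suffix changes from "e" (through us2013e) to "c" (from us2014c on). Here is a minimal sketch that looks up the 5-year code by the last year of its period. It assumes descriptions keep the "start-end, ACS 5-year" form seen above. `acs5_code()` is a hypothetical helper (not part of ipumsr), and the data frame stands in for the get_sample_info("usa") tibble.

```r
# Stand-in for get_sample_info("usa"): rows copied from the listing above.
samples <- data.frame(
  name = c("us2013e", "us2014c", "us2021a", "us2020c", "us2021c"),
  description = c(
    "2009-2013, ACS 5-year",
    "2010-2014, ACS 5-year",
    "2021 ACS",
    "2016-2020, ACS 5-year",
    "2017-2021, ACS 5-year"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of ipumsr): return the code of the ACS
# 5-year sample whose period ends in `end_year`, assuming descriptions
# follow the "start-end, ACS 5-year" pattern shown in this thread.
acs5_code <- function(samples, end_year) {
  acs5 <- samples[grepl("ACS 5-year", samples$description, fixed = TRUE), ,
                  drop = FALSE]
  # Capture the second four-digit year, e.g. "2021" in "2017-2021, ACS 5-year".
  period_end <- as.integer(sub("^[0-9]{4}-([0-9]{4}),.*$", "\\1",
                               acs5$description))
  # which() drops any NA comparisons instead of returning NA codes.
  acs5$name[which(period_end == end_year)]
}

acs5_code(samples, 2021)  # "us2021c" (the single-year "2021 ACS" row is skipped)
acs5_code(samples, 2013)  # "us2013e"
```

If no 5-year sample ends in the requested year, the helper returns an empty character vector rather than an error.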