socialfoundations / folktables

Datasets derived from US census data
MIT License
240 stars 19 forks source link

Refactored the ACSDataSource #31

Closed RicardoJSandoval closed 10 months ago

RicardoJSandoval commented 2 years ago

In this pull request I'm introducing two main things: (1) a refactored ACSDataSource, (2) tests for the ACSDataSource.

Refactored ACSDataSource

The idea behind refactoring the ACSDataSource object is to encapsulate the functionalities and dependencies specific to the ACS datasets into one file & object. The refactored ACSDataSource now uses the utility functions I had previously created. With this, we now have a pattern to follow in case we want to extend folktables to support other datasets.

Tests for the ACSDataSource

I created some tests that check that the ACSDataSource (down)loads different datasets as expected. I created some artificial datasets that contain the same columns as the corresponding, original, datasets, but contain less rows and random values. This smaller, artificial, datasets reduce the overhead of running the tests. Furthermore, I've written this tests such that they don't make an HTTP request to the Census Bureaus' website every time they're run, but rather we capture the requests and return the artificial datasets. The only exception to this is the test test_definitions_download which I directly copied from a previous merge (this test does make an HTTP request to the actual website).

RicardoJSandoval commented 2 years ago

Hi @jenno-verdonck! Thanks for the comments and catching those typos. I'm also not entirely sure which is the best way to proceed with testing, so I'll leave it up to the others to decide and I'm happy to make any changes.

mrtzh commented 1 year ago

Hi @RicardoJSandoval - I was curious if you'd still like to move ahead with this pull request. I recall that the goal is to add a SIPP data source and this is an intermediate step.

If we're going ahead with this, could you update the version number to '0.13' in setup.py and folktables/__init__.py?

On another note, is this update fully backwards compatible with the existing ACS code?

Thanks!