ipums / ipumspy

Mozilla Public License 2.0

`get_extract_by_id()` always returns default values for camelCase extract fields #99

Open robe2037 opened 1 month ago

robe2037 commented 1 month ago

The `MicrodataExtract` class expects its arguments in snake case. However, `get_extract_info()` returns the extract definition using the API's camel case conventions, so the following lines in `get_extract_by_id()` pass every camel case field into `**kwargs` instead of to the matching snake case parameter of `MicrodataExtract`. Those parameters therefore keep their default values.

extract_def = self.get_extract_info(extract_id, collection)
if "microdata" in BaseExtract._collection_type_to_extract:
    extract = MicrodataExtract(**extract_def["extractDefinition"])

For instance, an extract with a non-default `data_structure` comes back with a rectangular-on-P data structure: the response from `get_extract_info()` stores the value in the `dataStructure` field, which is not matched to the `data_structure` argument, so the default is used. Here's a reproducible example:

import os

from ipumspy import IpumsApiClient, MicrodataExtract

client = IpumsApiClient(os.environ.get("IPUMS_API_KEY"))

ext1 = MicrodataExtract(
    "usa",
    ["us2017a"],
    ["AGE"],
    data_structure={"hierarchical": {}},
)

client.submit_extract(ext1)

ext2 = client.get_extract_by_id(ext1.extract_id, "usa")

print(ext1.data_structure)  # submitted: {"hierarchical": {}}
print(ext2.data_structure)  # retrieved: {"rectangular": {"on": "P"}} (the default)
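
The mechanism can be shown without an API key. This is a minimal sketch, assuming a hypothetical `FakeExtract` class that only mimics the relevant part of the constructor (a snake case parameter with a default, plus `**kwargs`). It is not the ipumspy implementation, and the dictionary keys are illustrative rather than a full API response.

```python
class FakeExtract:
    """Hypothetical stand-in for MicrodataExtract; not the ipumspy class."""

    def __init__(self, collection, samples, variables,
                 data_structure=None, **kwargs):
        self.collection = collection
        self.samples = samples
        self.variables = variables
        # Fall back to the default when the snake_case argument is absent.
        if data_structure is None:
            data_structure = {"rectangular": {"on": "P"}}
        self.data_structure = data_structure
        # Unrecognized keywords (such as camelCase keys) land here silently.
        self.kwargs = kwargs


# Illustrative definition using camelCase for the multi-word key.
api_definition = {
    "collection": "usa",
    "samples": ["us2017a"],
    "variables": ["AGE"],
    "dataStructure": {"hierarchical": {}},
}

extract = FakeExtract(**api_definition)
print(extract.data_structure)  # the default, not the hierarchical value
print(extract.kwargs)          # the camelCase key was absorbed by **kwargs
```

No error is raised, because `**kwargs` accepts the unknown `dataStructure` keyword.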

robe2037 commented 1 month ago

We should be able to resolve this by using `extract_from_dict()` in `get_extract_by_id()` to convert the dictionary into an extract object, since `extract_from_dict()` already handles case conversion.
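
For readers unfamiliar with the kind of conversion involved, here is a rough sketch of turning camel case keys into snake case before they are used as keyword arguments. The helpers `camel_to_snake` and `snake_case_keys` are hypothetical names for illustration. This is not how `extract_from_dict()` is implemented, and it only converts top-level keys.

```python
import re


def camel_to_snake(name: str) -> str:
    """Insert an underscore before each interior capital, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def snake_case_keys(definition: dict) -> dict:
    """Return a copy with top-level keys converted; values are left as-is."""
    return {camel_to_snake(key): value for key, value in definition.items()}


# Illustrative definition, not a complete API response.
api_definition = {
    "collection": "usa",
    "dataStructure": {"hierarchical": {}},
    "dataFormat": "fixed_width",
}

converted = snake_case_keys(api_definition)
print(converted)
```

With keys in this form, `data_structure` would reach the matching constructor argument instead of being collected by `**kwargs`.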