socialfoundations / folktables

Datasets derived from US census data
MIT License

Little refactor and minor bug #41

Open baraldian opened 5 months ago

baraldian commented 5 months ago

fix: replace the fillna and mobility_filter lambda functions with named functions. fillna_safe also applies pandas' fillna to handle multiple NaN encodings. Also create the directory for cached data when it does not exist.
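For readers without the diff at hand, here is a minimal sketch of what the two changes might look like. It is not the code from this PR: the signature of `fillna_safe`, the default fill value of -1, and the helper name `ensure_cache_dir` are all assumptions made for illustration.

```python
import os

import numpy as np
import pandas as pd


def fillna_safe(data, value=-1):
    """Hypothetical helper: replace missing entries with `value`.

    The signature and the default of -1 are assumptions for this sketch.
    """
    if isinstance(data, (pd.DataFrame, pd.Series)):
        # pandas' fillna treats None, np.nan and pd.NA alike as missing.
        return data.fillna(value)
    # Fallback for plain arrays: convert to float and replace NaN entries.
    arr = np.asarray(data, dtype=float)
    return np.where(np.isnan(arr), value, arr)


def ensure_cache_dir(path):
    """Hypothetical helper: create the cache directory if it is missing."""
    # exist_ok=True makes the call a no-op when the directory already exists.
    os.makedirs(path, exist_ok=True)
    return path
```

Using named functions instead of lambdas also keeps the objects that hold them picklable, which may matter if problem definitions are passed between processes.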

jenno-verdonck commented 5 months ago

Could the proposed fix for multiple NaN encodings also help in the generate_categories function? Currently the code uses a placeholder value (e.g. -99999999999999.0) because NaN does not work reliably as a Python dictionary key: NaN never compares equal to itself, so distinct NaN objects are treated as different keys.
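To illustrate the dictionary-key problem, here is a small standalone sketch. It does not use folktables code; `normalize_key` and the `"__nan__"` placeholder are hypothetical names chosen for this example.

```python
import math


def normalize_key(value, placeholder="__nan__"):
    """Hypothetical helper: map every float NaN to one shared placeholder."""
    if isinstance(value, float) and math.isnan(value):
        return placeholder
    return value


# Two distinct NaN objects, as can arise from separate parsing steps.
nan_a = float("nan")
nan_b = float("nan")

# Without normalization each NaN object becomes its own key,
# because nan_a == nan_b is False.
raw = {nan_a: "first", nan_b: "second"}

# With normalization both NaN objects collapse onto a single key.
normalized = {}
for key, label in ((nan_a, "first"), (nan_b, "second")):
    normalized[normalize_key(key)] = label
```

Here `raw` ends up with two entries and a lookup with a fresh `float("nan")` fails, while `normalized` has a single `"__nan__"` entry. A sentinel that cannot collide with real data is one alternative to a large negative number.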

baraldian commented 5 months ago

I'm sorry, but since I'm external to the project, I'm only fixing the things that cause problems in my experiments.

mrtzh commented 4 months ago

See also discussion in https://github.com/socialfoundations/folktables/issues/39 about the nan_to_num problem.