Closed (sjsrey closed this issue 3 years ago)
that's probably related to this recent fix in tobler. I'll investigate
Tobler seems OK. When I use the 1990 and 2000 dataframes, the interpolation works for the intensive variables (calling tobler directly, not through geosnap).
i think this should be resolved with the newest fix to tobler, but i need to double check
i can confirm this is resolved with the latest dev version of tobler
import geosnap
Loading manifest: 100%|██████████| 5/5 [00:00<00:00, 14779.08entries/s]
Loading manifest: 100%|██████████| 5/5 [00:00<00:00, 18657.94entries/s]
/usr/local/anaconda3/envs/pysal/lib/python3.7/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
geosnap.__version__
'0.3.2'
sd = geosnap.Community.from_census(county_fips='06073')
/usr/local/anaconda3/envs/pysal/lib/python3.7/site-packages/pyproj/crs/crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
return _prepare_from_string(" ".join(pjargs))
import tobler
tobler.__version__
'0.3.1'
from tobler.area_weighted import area_interpolate
extensive = ['n_total_pop', 'n_total_housing_units', 'n_vacant_housing_units', 'n_black_persons', 'n_hispanic_persons' ]
intensive = ['median_household_income']
gdfs = [sd.gdf[sd.gdf.year==year] for year in [1990,2000,2010]]
sd1990, sd2000, sd2010 = gdfs
extensive = ["n_total_pop"]
intensive = ['median_household_income']
sd19902010 = area_interpolate(sd1990, sd2010, extensive_variables=extensive,
intensive_variables=intensive)
/Users/serge/Dropbox/p/pysal/src/subpackages/tobler/tobler/area_weighted/area_weighted.py:253: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
den = source_df["geometry"].area.values
sd19902010.shape
(627, 3)
sd1990.shape
(438, 195)
sd2010.shape
(627, 195)
sd19902010.head()
| | n_total_pop | median_household_income | geometry |
|---|---|---|---|
| 0 | 3780.045461 | 30027.580506 | POLYGON ((-117.01957 32.76373, -117.01562 32.7... |
| 1 | 4109.068579 | 27928.654437 | POLYGON ((-117.16864 32.74897, -117.16602 32.7... |
| 2 | 3778.913865 | 25588.558303 | POLYGON ((-117.14632 32.74842, -117.14250 32.7... |
| 3 | 3127.204323 | 16784.034521 | POLYGON ((-117.11577 32.75522, -117.11362 32.7... |
| 4 | 4431.874144 | 28626.872160 | POLYGON ((-117.37213 33.20012, -117.36902 33.2... |
sd20002010 = area_interpolate(sd2000, sd2010, extensive_variables=extensive,
intensive_variables=intensive)
/Users/serge/Dropbox/p/pysal/src/subpackages/tobler/tobler/area_weighted/area_weighted.py:253: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
den = source_df["geometry"].area.values
sd20002010.head()
| | n_total_pop | median_household_income | geometry |
|---|---|---|---|
| 0 | 3882.981322 | 40175.734975 | POLYGON ((-117.01957 32.76373, -117.01562 32.7... |
| 1 | 4258.772452 | 37318.052239 | POLYGON ((-117.16864 32.74897, -117.16602 32.7... |
| 2 | 4039.971454 | 35394.196027 | POLYGON ((-117.14632 32.74842, -117.14250 32.7... |
| 3 | 3873.042164 | 20704.730669 | POLYGON ((-117.11577 32.75522, -117.11362 32.7... |
| 4 | 5899.830878 | 36428.660886 | POLYGON ((-117.37213 33.20012, -117.36902 33.2... |
sd20002010.n_total_pop.sum()
2817686.995202498
sd2000.n_total_pop.sum()
2817687.0
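The two sums above agree because area-weighted interpolation splits each source count across the targets it overlaps, so the total of an extensive variable is preserved (up to floating-point error). A minimal sketch of that idea on 1-D intervals instead of polygons follows. It is a toy illustration, not tobler's implementation, and the `overlap` and `interpolate` helpers and all the numbers are made up for the example.

```python
def overlap(a, b):
    """Length of the overlap between two 1-D intervals given as (lo, hi)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def interpolate(sources, targets, values, extensive=True):
    """Toy area-weighted interpolation on 1-D intervals (hypothetical helper).

    extensive: each source value is split across targets in proportion to
               the share of the *source* that falls inside each target.
    intensive: each target value is an average of source values, weighted
               by the share of the *target* that each source covers.
    """
    result = []
    for t in targets:
        total = 0.0
        for s, v in zip(sources, values):
            shared = overlap(s, t)
            denom = (s[1] - s[0]) if extensive else (t[1] - t[0])
            total += v * shared / denom
        result.append(total)
    return result


# Two source "tracts" and three target "tracts" covering the same extent.
sources = [(0.0, 4.0), (4.0, 10.0)]
targets = [(0.0, 3.0), (3.0, 7.0), (7.0, 10.0)]
population = [400.0, 600.0]    # extensive: counts
income = [30000.0, 50000.0]    # intensive: a rate or median-like value

pop_est = interpolate(sources, targets, population, extensive=True)
inc_est = interpolate(sources, targets, income, extensive=False)

print(pop_est, sum(pop_est), sum(population))
print(inc_est)
```

The interpolated populations add up to the source total, while the interpolated incomes are weighted averages and are not expected to sum to anything meaningful.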
# now with geosnap
sd_2010_gs = sd.harmonize(2010, extensive_variables=extensive,
intensive_variables=intensive)
/Users/serge/Dropbox/p/pysal/src/subpackages/tobler/tobler/area_weighted/area_weighted.py:253: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
den = source_df["geometry"].area.values
/Users/serge/Dropbox/p/pysal/src/subpackages/tobler/tobler/area_weighted/area_weighted.py:253: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
den = source_df["geometry"].area.values
/Users/serge/Dropbox/p/pysal/src/subpackages/tobler/tobler/area_weighted/area_weighted.py:253: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
den = source_df["geometry"].area.values
/Users/serge/Dropbox/p/pysal/src/subpackages/tobler/tobler/util/util.py:28: UserWarning: nan values in variable: median_household_income, replacing with 0
warn(f"nan values in variable: {column}, replacing with 0")
sd_2010_gs.gdf.head()
| | geoid | geometry | median_household_income | n_total_pop | year |
|---|---|---|---|---|---|
| 0 | 06073014901 | POLYGON ((-117.01957 32.76373, -117.01562 32.7... | NaN | 4156.000000 | 2010 |
| 1 | 06073000300 | POLYGON ((-117.16864 32.74897, -117.16602 32.7... | NaN | 4629.000000 | 2010 |
| 2 | 06073000800 | POLYGON ((-117.14632 32.74842, -117.14250 32.7... | NaN | 3964.000000 | 2010 |
| 3 | 06073002201 | POLYGON ((-117.11577 32.75522, -117.11362 32.7... | NaN | 3989.000000 | 2010 |
| 4 | 06073018509 | POLYGON ((-117.37213 33.20012, -117.36902 33.2... | NaN | 5325.999683 | 2010 |
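A side note on the repeated "Geometry is in a geographic CRS" warning in the output above: areas computed directly on longitude/latitude coordinates are in "square degrees", which do not correspond to a fixed ground area. The sketch below uses a spherical approximation of the Earth (an assumption, as are the helper name and the chosen cells) to compare a 1° × 1° cell at the equator with one near San Diego. Nothing in this thread suggests the warning is related to the NaN problem; this only illustrates what the warning is about.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical approximation (assumption)


def cell_area_km2(lat_south, lat_north, lon_west, lon_east):
    """Area of a latitude/longitude rectangle on a sphere, in km^2."""
    dlon = math.radians(lon_east - lon_west)
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * dlon * band


# Both cells have a planar "area" of exactly 1 square degree.
equator = cell_area_km2(0, 1, 0, 1)
san_diego = cell_area_km2(32, 33, -118, -117)
ratio = san_diego / equator

print(f"equator cell:   {equator:,.0f} km^2")
print(f"san diego cell: {san_diego:,.0f} km^2")
print(f"ratio:          {ratio:.3f}")
```

Reprojecting with `GeoSeries.to_crs()` to a projected CRS before interpolating, as the warning suggests, avoids the issue.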
I can't reproduce this locally. With the latest development version of tobler (master on pysal/tobler) installed in my geosnap environment, I get the following:
import geosnap
/Users/knaaptime/Dropbox/projects/geosnap/geosnap/_data.py:123: UserWarning: Unable to locate local census data. Streaming instead.
If you plan to use census data repeatedly you can store it locally with the io.store_census function for better performance
"Unable to locate local census data. Streaming instead.\n"
Loading manifest: 100%|██████████| 5/5 [00:00<00:00, 7327.58entries/s]
Loading manifest: 100%|██████████| 5/5 [00:00<00:00, 5726.79entries/s]
sd = geosnap.Community.from_census(county_fips='06073')
/Users/knaaptime/anaconda3/envs/geosnap/lib/python3.7/site-packages/pyproj/crs/crs.py:53: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
return _prepare_from_string(" ".join(pjargs))
extensive = ['n_total_pop', 'n_total_housing_units', 'n_vacant_housing_units']
intensive = ['median_household_income', 'p_poverty_rate']
sd_2010 = sd.harmonize(2010,extensive_variables=extensive,
intensive_variables=intensive)
/Users/knaaptime/anaconda3/envs/geosnap/lib/python3.7/site-packages/tobler-0.3.1-py3.7.egg/tobler/area_weighted/area_weighted.py:253: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
den = source_df["geometry"].area.values
/Users/knaaptime/anaconda3/envs/geosnap/lib/python3.7/site-packages/tobler-0.3.1-py3.7.egg/tobler/util/util.py:28: UserWarning: nan values in variable: p_poverty_rate, replacing with 0
warn(f"nan values in variable: {column}, replacing with 0")
/Users/knaaptime/anaconda3/envs/geosnap/lib/python3.7/site-packages/tobler-0.3.1-py3.7.egg/tobler/area_weighted/area_weighted.py:253: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
den = source_df["geometry"].area.values
/Users/knaaptime/anaconda3/envs/geosnap/lib/python3.7/site-packages/tobler-0.3.1-py3.7.egg/tobler/util/util.py:28: UserWarning: nan values in variable: p_poverty_rate, replacing with 0
warn(f"nan values in variable: {column}, replacing with 0")
/Users/knaaptime/anaconda3/envs/geosnap/lib/python3.7/site-packages/tobler-0.3.1-py3.7.egg/tobler/area_weighted/area_weighted.py:253: UserWarning: Geometry is in a geographic CRS. Results from 'area' are likely incorrect. Use 'GeoSeries.to_crs()' to re-project geometries to a projected CRS before this operation.
den = source_df["geometry"].area.values
/Users/knaaptime/anaconda3/envs/geosnap/lib/python3.7/site-packages/tobler-0.3.1-py3.7.egg/tobler/util/util.py:28: UserWarning: nan values in variable: median_household_income, replacing with 0
warn(f"nan values in variable: {column}, replacing with 0")
/Users/knaaptime/anaconda3/envs/geosnap/lib/python3.7/site-packages/tobler-0.3.1-py3.7.egg/tobler/util/util.py:28: UserWarning: nan values in variable: p_poverty_rate, replacing with 0
warn(f"nan values in variable: {column}, replacing with 0")
sd_2010.gdf.plot('n_vacant_housing_units')
<AxesSubplot:>
sd_2010.gdf.plot('p_poverty_rate')
<AxesSubplot:>
sd_2010.gdf.plot('median_household_income').plot()
[]
I did a clean clone of both geosnap and tobler for this. Could there be something in your geosnap that is not upstream?
no i just installed it from master
cd geosnap; conda env create -f environment.yml
conda activate geosnap; python setup.py install
pip uninstall tobler -y # uninstall conda version first
cd ../tobler
python setup.py install # install current master
This is in a clean clone of geosnap after conda env create -f environment.yml
(base) ~/D/g/g/s/geosnap ❯❯❯ conda activate geosnap
(geosnap) ~/D/g/g/s/geosnap ❯❯❯ python setup.py install
Traceback (most recent call last):
File "setup.py", line 9, in <module>
with open("README.md", encoding="utf8") as file:
TypeError: 'encoding' is an invalid keyword argument for this function
What version of Python do you have locally (since it isn't specified in the environment.yml file)?
/Users/knaaptime/Dropbox/projects/geosnap master* ⇡ 19s
geosnap ❯ ipython
Python 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:37:09)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.17.0 -- An enhanced Interactive Python. Type '?' for help.
yah but there is no ipython in that environment.yml
i had to install ipykernel manually so that jupyter would see it
After starting completely over, as in fresh clones into new directories, and repeating these steps:
cd geosnap; conda env create -f environment.yml
conda activate geosnap; python setup.py install
pip uninstall tobler -y # uninstall conda version first
cd ../tobler
python setup.py install # install current master
I'm still getting the nan for the intensive variables when using geosnap but not tobler.
It turns out, plotting works even with nan values. So can you check the head to see if you are getting nans?
Is something getting duplicated?
was just going through the same thing. I have lots of nans but also lots of values
need to look into the harmonize code closer
was also wondering how the tests could be passing
i think i see what's going on
My hunch is that the problem is in the geosnap harmonization, since tobler warns that it encountered NaNs and replaced them with 0s. So the NaNs we are seeing here are likely coming from some operation in the harmonize method.
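To make the reasoning behind that hunch concrete, here is a small sketch of how a NaN behaves in a weighted average, with and without the zero replacement that tobler's warning describes. It uses plain Python rather than tobler or geosnap code, and the `weighted_mean` helper and the values are hypothetical.

```python
import math


def weighted_mean(values, weights, fill_nan_with=None):
    """Weighted average; optionally replace NaN inputs first (hypothetical helper)."""
    if fill_nan_with is not None:
        values = [fill_nan_with if math.isnan(v) else v for v in values]
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# One target tract covered half-and-half by two source tracts,
# one of which has a missing median income.
incomes = [40000.0, math.nan]
weights = [0.5, 0.5]

raw = weighted_mean(incomes, weights)                        # NaN propagates
filled = weighted_mean(incomes, weights, fill_nan_with=0.0)  # NaN treated as 0

print(raw)     # nan
print(filled)  # 20000.0
```

Once NaNs are replaced with 0 before the weighting, the interpolated result is a number rather than NaN, which is consistent with the NaNs being introduced at a later step. The sketch also shows that zero-filling pulls an intensive estimate downward, so the filled values deserve a second look as well.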