mggg / maup

The geospatial toolkit for redistricting data.
https://maup.readthedocs.io/en/latest/
MIT License

ValueError: cannot reindex from a duplicate axis #49

Open dinosg opened 3 years ago

dinosg commented 3 years ago

maup.assign crashes after spending a while working through the assignments. Example:

In [10]: assign1 = maup.assign(blocks20, vtds10)
100%|██████████| 8941/8941 [11:36<00:00, 12.85it/s]
Traceback (most recent call last):

  File "", line 1, in
    assign1 = maup.assign(blocks20, vtds10)

  File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/maup/crs.py", line 14, in wrapped
    return f(*args, **kwargs)

  File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/maup/assign.py", line 12, in assign
    assignment = assign_by_covering(sources, targets)

  File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/maup/assign.py", line 22, in assign_by_covering
    return indexed_sources.assign(targets)

  File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/maup/indexed_geometries.py", line 42, in assign
    assignment = pandas.concat(groups).reindex(self.index)

  File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/pandas/core/series.py", line 4579, in reindex
    return super().reindex(index=index, **kwargs)

  File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 4810, in reindex
    axes, level, limit, tolerance, method, fill_value, copy

  File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 4834, in _reindex_axes
    allow_dups=False,

  File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 4880, in _reindex_with_indexers
    copy=copy,

  File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 663, in reindex_indexer
    self.axes[axis]._validate_can_reindex(indexer)

  File "/Users/dpg/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3785, in _validate_can_reindex
    raise ValueError("cannot reindex from a duplicate axis")

ValueError: cannot reindex from a duplicate axis

InnovativeInventor commented 3 years ago

See #41. I think we should make the error message more useful, but this is likely a problem with your source or target geometries containing overlaps. If these are indeed Census blocks/vtds, then you likely have duplicates. Let me know if this helps!
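
For readers wondering how overlapping geometries could lead to this particular pandas error, here is a minimal sketch using only pandas. The two Series are hypothetical stand-ins for the per-target groups that the traceback shows being passed to pandas.concat(groups); they are not maup's actual internals. The assumption is that a source covered by two overlapping targets ends up in two groups, so its label appears twice after concatenation.

```python
import pandas as pd

# Hypothetical groups: each Series maps source labels to one covering target.
group_a = pd.Series("vtd_A", index=["block_1", "block_2"])
group_b = pd.Series("vtd_B", index=["block_2", "block_3"])  # block_2 is covered twice

combined = pd.concat([group_a, group_b])

# Labels that occur more than once in the concatenated index.
duplicated_sources = combined.index[combined.index.duplicated()].tolist()

# Reindexing a Series whose index has duplicate labels raises ValueError.
try:
    combined.reindex(["block_1", "block_2", "block_3"])
    reindex_failed = False
except ValueError:
    reindex_failed = True

print(duplicated_sources, reindex_failed)
```

Note that the duplicate labels arise in the concatenated result, so the error can appear even when the source and target indexes are each unique.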

dinosg commented 3 years ago

These ARE Census blocks being mapped to VTDs. However, the VTDs (for Texas, from the MGGG state archive for 2010) were 'buffered' to avoid a point defect that prevented a Graph from being built. Using the unmodified MGGG VTD archive for TX might be a workaround; we'll see.

The idea is then to map the 2010 Census blocks to the 2020 Census VTDs so that I have a database with the 2010 AND 2020 population stats in one place and can do interesting population-change comparisons.
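
A small sketch of why the 'buffered' VTDs could matter, using shapely only. It assumes the buffer distance was positive, which the thread does not state (a zero-distance buffer would behave differently). The two squares and the distance 0.01 are made up for illustration.

```python
from shapely.geometry import box

# Two toy "VTDs" that share an edge but do not overlap.
left = box(0, 0, 1, 1)
right = box(1, 0, 2, 1)

# Area of the shared region before buffering: the shared edge has zero area.
overlap_before = left.intersection(right).area

# After a positive buffer, each polygon grows across the shared edge.
overlap_after = left.buffer(0.01).intersection(right.buffer(0.01)).area

print(overlap_before, overlap_after > 0)
```

Under that assumption, any block lying near the shared boundary can fall inside both buffered polygons.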

InnovativeInventor commented 3 years ago

VEST already did this, I believe.

dinosg commented 3 years ago

Do you have a link for that repo? What I see at the general link https://dataverse.harvard.edu/file.xhtml?fileId=5007853&version=17.0 is 2020 election results, but nothing that obviously combines 2010 and 2020 demographics. PA is missing, incidentally. I just got the comprehensive precinct results from the PA Secretary of State. Is there anyone I should send those to so they can be integrated with other datasets?

dinosg commented 3 years ago

There is also their repo with "crosswalks", https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/T9VMJO, but it is not updated with 2020 Census population data, just the block shapes and 2019 ACS data.

InnovativeInventor commented 3 years ago

Sorry for the delay -- I thought that VEST had this data prepared, but maybe not.

InnovativeInventor commented 3 years ago

#48 should silence the issue for you, I believe. Could you share the shapefiles that you're using? Also, did you make sure that your source and target shapefiles have the same projection?
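
One way to check the projection question, sketched with geopandas. The two single-polygon layers and their EPSG codes are hypothetical stand-ins for the real shapefiles, which would normally be loaded with geopandas.read_file.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy layers in two different coordinate reference systems (made-up data).
blocks = gpd.GeoDataFrame(geometry=[box(-97.0, 30.0, -96.9, 30.1)], crs="EPSG:4326")
vtds = gpd.GeoDataFrame(geometry=[box(-97.1, 29.9, -96.8, 30.2)], crs="EPSG:4269")

crs_matched_before = blocks.crs == vtds.crs

# Reproject the targets into the sources' CRS before assigning.
vtds = vtds.to_crs(blocks.crs)
crs_matched_after = blocks.crs == vtds.crs

print(crs_matched_before, crs_matched_after)
```

Reprojecting one layer to the other's CRS, rather than overwriting the crs attribute, keeps the coordinates consistent with the declared projection.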

brodiak9000 commented 3 years ago

I had this same issue. I checked the axes of my dataframes and no duplicates exist.

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_3132/626298941.py in
      1 variables = ["POP100"]
      2
----> 3 assignment = maup.assign(blocks, precincts)
      4 precincts[variables] = blocks[variables].groupby(assignment).sum()
      5 precincts[variables].head()

~\anaconda3\lib\site-packages\maup\crs.py in wrapped(*args, **kwargs)
     12             )
     13         )
---> 14         return f(*args, **kwargs)
     15
     16     return wrapped

~\anaconda3\lib\site-packages\maup\assign.py in assign(sources, targets)
     10     target that covers the most of its area.
     11     """
---> 12     assignment = assign_by_covering(sources, targets)
     13     unassigned = sources[assignment.isna()]
     14     assignments_by_area = assign_by_area(unassigned, targets)

~\anaconda3\lib\site-packages\maup\assign.py in assign_by_covering(sources, targets)
     20 def assign_by_covering(sources, targets):
     21     indexed_sources = IndexedGeometries(sources)
---> 22     return indexed_sources.assign(targets)
     23
     24

~\anaconda3\lib\site-packages\maup\indexed_geometries.py in assign(self, targets)
     46             )
     47         ]
---> 48         assignment = pandas.concat(groups).reindex(self.index)
     49         return assignment
     50

~\anaconda3\lib\site-packages\pandas\core\series.py in reindex(self, index, **kwargs)
   4578     )
   4579     def reindex(self, index=None, **kwargs):
-> 4580         return super().reindex(index=index, **kwargs)
   4581
   4582     @deprecate_nonkeyword_arguments(version=None, allowed_args=["self", "labels"])

~\anaconda3\lib\site-packages\pandas\core\generic.py in reindex(self, *args, **kwargs)
   4816
   4817         # perform the reindex on the axes
-> 4818         return self._reindex_axes(
   4819             axes, level, limit, tolerance, method, fill_value, copy
   4820         ).__finalize__(self, method="reindex")

~\anaconda3\lib\site-packages\pandas\core\generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   4837
   4838             axis = self._get_axis_number(a)
-> 4839             obj = obj._reindex_with_indexers(
   4840                 {axis: [new_index, indexer]},
   4841                 fill_value=fill_value,

~\anaconda3\lib\site-packages\pandas\core\generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   4881
   4882             # TODO: speed up on homogeneous DataFrame objects
-> 4883             new_data = new_data.reindex_indexer(
   4884                 index,
   4885                 indexer,

~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy, consolidate, only_slice)
    668         # some axes don't allow reindexing with dups
    669         if not allow_dups:
--> 670             self.axes[axis]._validate_can_reindex(indexer)
    671
    672         if axis >= self.ndim:

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in _validate_can_reindex(self, indexer)
   3783         # trying to reindex on an axis with duplicates
   3784         if not self._index_as_unique and len(indexer):
-> 3785             raise ValueError("cannot reindex from a duplicate axis")
   3786
   3787     def reindex(

ValueError: cannot reindex from a duplicate axis

dinosg commented 3 years ago

The shapefiles I used are at: https://github.com/mggg-states/TX-shapefiles