makepath / census-parquet

Python tools for creating Parquet files from 2020 Census Data
MIT License

run_census_parquet fails at stage 5 #6

Closed: dylanrstewart closed this issue 2 years ago

dylanrstewart commented 2 years ago

At 99% completion of the "finalizing geo files" stage, the following error occurs:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/dask/dataframe/io/parquet/arrow.py", line 56, in _append_row_groups
    metadata.append_row_groups(md)
  File "pyarrow/_parquet.pyx", line 628, in pyarrow._parquet.FileMetaData.append_row_groups
RuntimeError: AppendRowGroups requires equal schemas.
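
The message indicates that the Parquet metadata collected from the individual partitions did not all share one schema. A minimal sketch for finding which part files differ is shown here. It assumes the stage leaves its output as a directory of *.parquet part files, which this thread does not confirm. The helper names find_mismatches and read_part_schemas and the example path are hypothetical and are not part of census-parquet.

```python
from pathlib import Path


def find_mismatches(schemas, equal=lambda a, b: a == b):
    """Return the names whose schema differs from the first entry's schema.

    `schemas` maps a name (e.g. a file path) to a schema object.
    `equal` decides whether two schema objects match.
    """
    items = list(schemas.items())
    if not items:
        return []
    _, reference = items[0]
    return [name for name, schema in items[1:] if not equal(reference, schema)]


def read_part_schemas(directory):
    """Read the Arrow schema of every *.parquet file under `directory`."""
    # Imported lazily so the comparison helper works without pyarrow.
    import pyarrow.parquet as pq

    return {
        str(path): pq.read_schema(str(path))
        for path in sorted(Path(directory).rglob("*.parquet"))
    }


# Example usage (hypothetical output path):
#   schemas = read_part_schemas("path/to/output.parquet")
#   for name in find_mismatches(schemas, equal=lambda a, b: a.equals(b)):
#       print(name)
#       print(schemas[name])
```

Printing the schema of a mismatched part next to the first part's schema should show which column or type differs. Whether that difference is the cause of this failure is not established here.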

dylanrstewart commented 2 years ago

Unfortunately, this seems to be a known Dask issue that has not been resolved yet: https://github.com/geopandas/dask-geopandas/issues/137

ianthomas23 commented 2 years ago

@drstewart19 Can you post the output of pip list or conda list so we can see what versions of the dependent libraries you are using? Thanks.
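
If the full pip list output is long, the versions that matter most can also be collected programmatically. The following is a small sketch using only the standard library (Python 3.8+). The choice of package names is an assumption about which dependencies are relevant to this error, and installed_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed to be the libraries most relevant to this error.
RELEVANT_PACKAGES = [
    "census-parquet",
    "dask",
    "dask-geopandas",
    "distributed",
    "geopandas",
    "pandas",
    "pyarrow",
    "pygeos",
    "Shapely",
]


def installed_versions(packages=RELEVANT_PACKAGES):
    """Map each package name to its installed version, or 'not installed'."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = "not installed"
    return result


for name, ver in installed_versions().items():
    print(f"{name:<16} {ver}")
```

This reports only what is installed in the interpreter that runs it, so it should be run in the same environment as run_census_parquet.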

dylanrstewart commented 2 years ago

@ianthomas23 I have attached the list of packages: package-list.txt

ianthomas23 commented 2 years ago

I can reproduce the problem on Linux with these packages:

attrs            21.4.0
census-parquet   0.0.8       /home/iant/github/census-parquet
certifi          2021.10.8
click            8.1.2
click-plugins    1.1.1
cligj            0.7.2
cloudpickle      2.0.0
dask             2022.4.0
dask-geopandas   0.1.0
distributed      2022.4.0
et-xmlfile       1.1.0
Fiona            1.8.21
fsspec           2022.3.0
geopandas        0.10.2
HeapDict         1.0.1
Jinja2           3.1.1
locket           0.2.1
MarkupSafe       2.1.1
msgpack          1.0.3
munch            2.5.0
numpy            1.22.3
openpyxl         3.0.9
packaging        21.3
pandas           1.4.2
partd            1.2.0
pip              20.3.4
pkg-resources    0.0.0
psutil           5.9.0
pyarrow          7.0.0
pygeos           0.12.0
pyparsing        3.0.7
pyproj           3.3.0
python-dateutil  2.8.2
pytz             2022.1
PyYAML           6.0
setuptools       44.1.1
Shapely          1.8.1.post1
six              1.16.0
sortedcontainers 2.4.0
tblib            1.7.0
toolz            0.11.2
tornado          6.1
urllib3          1.26.9
zict             2.1.0