kuwala-io / kuwala

Kuwala is the no-code data platform for BI analysts and engineers, enabling you to build powerful analytics workflows. We set out to bring the state-of-the-art data engineering tools you love, such as Airbyte, dbt, or Great Expectations, together in one intuitive interface built with React Flow. In addition, we provide third-party data for data science models and products, with a focus on geospatial data. Currently, the following data connectors are available worldwide: a) High-resolution demographics data b) Points of Interest from OpenStreetMap c) Google Popular Times
https://kuwala.io
Apache License 2.0

Population density download: categories other than 'total' break when creating parquet files #145

Open wagnerfe opened 2 years ago

wagnerfe commented 2 years ago

Hello, querying population density data (only tested for Germany) works fine when choosing 'total' as the category. However, when choosing a different category (e.g. 'women'), the downloaded files show up as .csv in the tmp folder, but the code breaks when creating the parquet files. Error message:

Exception occurred during processing of request from ('127.0.0.1', 33952)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/socketserver.py", line 316, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/usr/local/lib/python3.10/socketserver.py", line 347, in process_request
    self.finish_request(request, client_address)
  File "/usr/local/lib/python3.10/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/local/lib/python3.10/socketserver.py", line 747, in __init__
    self.handle()
  File "/usr/local/lib/python3.10/site-packages/pyspark/accumulators.py", line 262, in handle
    poll(accum_updates)
  File "/usr/local/lib/python3.10/site-packages/pyspark/accumulators.py", line 235, in poll
    if func():
  File "/usr/local/lib/python3.10/site-packages/pyspark/accumulators.py", line 239, in accum_updates
    num_updates = read_int(self.rfile)
  File "/usr/local/lib/python3.10/site-packages/pyspark/serializers.py", line 564, in read_int
    raise EOFError
EOFError
----------------------------------------
ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/py4j/clientserver.py", line 480, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/py4j/java_gateway.py", line 1038, in send_command
    response = connection.send_command(command)
  File "/usr/local/lib/python3.10/site-packages/py4j/clientserver.py", line 503, in send_command
    raise Py4JNetworkError(
py4j.protocol.Py4JNetworkError: Error while sending or receiving
Traceback (most recent call last):
  File "/opt/app/pipelines/population-density/src/main.py", line 21, in <module>
    Processor.start(files, output_dir, updated_date)
  File "/opt/app/pipelines/population-density/src/Processor.py", line 70, in start
    df.write.mode("overwrite").parquet(f"{output_dir}{updated_date}_result.parquet")
  File "/usr/local/lib/python3.10/site-packages/pyspark/sql/readwriter.py", line 885, in parquet
    self._jwrite.parquet(path)
  File "/usr/local/lib/python3.10/site-packages/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/usr/local/lib/python3.10/site-packages/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/usr/local/lib/python3.10/site-packages/py4j/protocol.py", line 334, in get_return_value
    raise Py4JError(
py4j.protocol.Py4JError: An error occurred while calling o84.parquet
ERROR: 1

Any idea how I can fix this? (I work on macOS Monterey with an Intel chip and only need the parquet files.)

Thank you so much for any help, and in general for this really awesome project!
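In case it helps narrow this down: "Answer from Java side is empty" typically means the JVM behind Spark stopped responding during the write, and the traceback alone does not say why. One thing that can be checked without Spark is whether the CSV files downloaded for the failing category look different from the 'total' ones (different header, far more rows, much larger size). Below is a minimal sketch using only the Python standard library; the helper names `summarize_csv` and `summarize_dir` and the folder path in the usage comment are hypothetical and not part of the Kuwala pipeline.

```python
import csv
import os
from pathlib import Path


def summarize_csv(path):
    """Return file name, header columns, data-row count and size in bytes for one CSV file."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])          # empty list if the file has no lines
        rows = sum(1 for _ in reader)      # count remaining lines without loading them all
    return {
        "file": Path(path).name,
        "columns": header,
        "rows": rows,
        "bytes": os.path.getsize(path),
    }


def summarize_dir(directory):
    """Summarize every *.csv file directly inside `directory`, sorted by file name."""
    return [summarize_csv(p) for p in sorted(Path(directory).glob("*.csv"))]


# Usage (hypothetical path, adjust to wherever the downloaded CSVs actually are):
# for info in summarize_dir("/path/to/tmp/folder"):
#     print(info)
```

Running this once on the files from a 'total' download and once on the files from a 'women' download gives a side-by-side view of headers and sizes. If the failing files are much larger, memory pressure on the Spark side is one plausible suspect; if the headers differ, the processing step may be expecting a column that is not there. Both are guesses to be confirmed, not a diagnosis.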