josephmachado / beginner_de_project_stream

Simple stream processing pipeline
https://www.startdataengineering.com/post/data-engineering-project-for-beginners-stream-edition/

Error on M1 Mac when trying to run 'make run' #11

Status: Open. apenta opened this issue 7 months ago.

apenta commented 7 months ago

Hello, I am on the step where I run 'make run'. When the build reaches roughly the last step of the Flink Dockerfile, RUN pip install --no-cache-dir -r /opt/flink/requirements.txt, I get this failure:

137.5   Installing build dependencies: started
144.3   Installing build dependencies: finished with status 'done'
144.4   Getting requirements to build wheel: started
144.4   Getting requirements to build wheel: finished with status 'error'
144.4   error: subprocess-exited-with-error
144.4
144.4   × Getting requirements to build wheel did not run successfully.
144.4   │ exit code: 255
144.4   ╰─> [1 lines of output]
144.4       Include folder should be at '/opt/java/openjdk/include' but doesn't exist. Please check you've installed the JDK properly.
144.4       [end of output]
144.4
144.4   note: This error originates from a subprocess, and is likely not a problem with pip.
144.4 error: subprocess-exited-with-error
144.4
144.4 × Getting requirements to build wheel did not run successfully.
144.4 │ exit code: 255
144.4 ╰─> See above for output.
144.4
144.4 note: This error originates from a subprocess, and is likely not a problem with pip.
------
failed to solve: process "/bin/sh -c pip install --no-cache-dir -r /opt/flink/requirements.txt" did not complete successfully: exit code: 1

Steps taken:

  1. Installed openjdk-11-jdk in numerous places, including the path above in the error.
  2. Tried modifying the JAVA_HOME path in the Dockerfile (it still can't find the contents, even when I cd into the path...)

I have an M1 Mac. I am using Python 3.10 and running this in iTerm2 with zsh. Is there something I am missing? Also, I had to use Docker Desktop, if that matters.
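To see what the build sees, here is a minimal Python sketch, meant to be run inside the container. It checks whether a given JAVA_HOME has JDK headers and lists candidate JDKs under /usr/lib/jvm. The function names are hypothetical. Requiring jni.h and searching /usr/lib/jvm are assumptions, because the error itself only checks for the include folder. A JRE-only install has no include folder, which would be consistent with the error above.

```python
import os
from pathlib import Path
from typing import List


def jdk_headers_present(java_home: str) -> bool:
    """True if java_home has an include/ folder containing jni.h.

    The build error only says <JAVA_HOME>/include must exist; also
    requiring jni.h is an extra sanity check added for this sketch.
    """
    return (Path(java_home) / "include" / "jni.h").is_file()


def find_jdk_homes(root: str = "/usr/lib/jvm") -> List[str]:
    """List directories directly under root that pass jdk_headers_present.

    Assumption: apt-installed JDKs on Debian/Ubuntu live under /usr/lib/jvm.
    is_file() follows symlinks, so alias entries are listed too.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    return [str(p) for p in sorted(base.iterdir()) if jdk_headers_present(str(p))]


if __name__ == "__main__":
    # /opt/java/openjdk is the path reported in the error message.
    home = os.environ.get("JAVA_HOME", "/opt/java/openjdk")
    status = "has" if jdk_headers_present(home) else "is missing"
    print(f"JAVA_HOME={home} {status} include/jni.h")
    print("Candidate JDK homes with headers:", find_jdk_homes())
```

If the first line reports missing headers but the second lists a directory, pointing JAVA_HOME at that directory is the kind of change the workaround later in this thread makes.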

rizahmeds commented 7 months ago

I also got the same error. @josephmachado Do we have any workaround for M1 Mac?

josephmachado commented 7 months ago

Hey folks, I don't have access to an M1 Mac at the moment, so unfortunately I can't test this. I run Ubuntu, and it seems to work fine there. Can you paste the full Docker logs? The commands are 'docker logs jobmanager' and 'docker logs taskmanager'.
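The two log commands can also be collected in one go. Below is a minimal Python sketch. It assumes the containers are named jobmanager and taskmanager, as in the comment above, and that the docker CLI is on PATH. The helper names are hypothetical.

```python
import subprocess
from typing import Dict, List


def log_command(container: str) -> List[str]:
    """Build the 'docker logs <container>' command as an argument list."""
    return ["docker", "logs", container]


def collect_logs(containers: List[str]) -> Dict[str, str]:
    """Run 'docker logs' for each container and return its output.

    docker logs replays the container's stdout and stderr on the matching
    streams, so both are captured and joined (interleaving is not preserved).
    """
    logs: Dict[str, str] = {}
    for name in containers:
        try:
            result = subprocess.run(log_command(name), capture_output=True, text=True)
        except FileNotFoundError:
            logs[name] = "docker CLI not found on PATH"
            continue
        logs[name] = result.stdout + result.stderr
    return logs


if __name__ == "__main__":
    for name, output in collect_logs(["jobmanager", "taskmanager"]).items():
        print(f"===== {name} =====")
        print(output)
```

As the next comment points out, this only helps once the image builds and the containers exist.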

ShubhamShaswat commented 7 months ago

The issue lies with the line RUN pip install --no-cache-dir -r /opt/flink/requirements.txt in container/flink/dockerfile.

It seems that apache-flink and the default OpenJDK library are not compatible with arm64 processors.

However, you can't use 'docker logs jobmanager', because the Docker image itself fails to build. I was able to run the container by skipping the final pip installation of libraries. After that, I executed that last line from a terminal inside the container and reproduced the same error:

Collecting black==22.8.0
  Downloading black-22.8.0-py3-none-any.whl (159 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 159.8/159.8 KB 2.2 MB/s eta 0:00:00
Collecting flake8==5.0.4
  Downloading flake8-5.0.4-py2.py3-none-any.whl (61 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61.9/61.9 KB 72.0 MB/s eta 0:00:00
Collecting mypy==0.971
  Downloading mypy-0.971-py3-none-any.whl (2.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.6/2.6 MB 20.3 MB/s eta 0:00:00
Collecting isort==5.10.1
  Downloading isort-5.10.1-py3-none-any.whl (103 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 103.4/103.4 KB 312.4 MB/s eta 0:00:00
Collecting Jinja2==3.1.2
  Downloading Jinja2-3.1.2-py3-none-any.whl (133 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.1/133.1 KB 336.1 MB/s eta 0:00:00
Collecting apache-flink==1.17.0
  Downloading apache-flink-1.17.0.tar.gz (1.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 95.7 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting tomli>=1.1.0
  Downloading tomli-2.0.1-py3-none-any.whl (12 kB)
Collecting platformdirs>=2
  Downloading platformdirs-4.2.0-py3-none-any.whl (17 kB)
Collecting pathspec>=0.9.0
  Downloading pathspec-0.12.1-py3-none-any.whl (31 kB)
Collecting click>=8.0.0
  Downloading click-8.1.7-py3-none-any.whl (97 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.9/97.9 KB 12.4 MB/s eta 0:00:00
Collecting mypy-extensions>=0.4.3
  Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)
Collecting pycodestyle<2.10.0,>=2.9.0
  Downloading pycodestyle-2.9.1-py2.py3-none-any.whl (41 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 41.5/41.5 KB 167.7 MB/s eta 0:00:00
Collecting mccabe<0.8.0,>=0.7.0
  Downloading mccabe-0.7.0-py2.py3-none-any.whl (7.3 kB)
Collecting pyflakes<2.6.0,>=2.5.0
  Downloading pyflakes-2.5.0-py2.py3-none-any.whl (66 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 66.1/66.1 KB 211.3 MB/s eta 0:00:00
Collecting typing-extensions>=3.10
  Downloading typing_extensions-4.10.0-py3-none-any.whl (33 kB)
Collecting MarkupSafe>=2.0
  Downloading MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (26 kB)
Collecting apache-beam==2.43.0
  Downloading apache_beam-2.43.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (13.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.9/13.9 MB 18.5 MB/s eta 0:00:00
Collecting apache-flink-libraries<1.17.1,>=1.17.0
  Downloading apache-flink-libraries-1.17.0.tar.gz (240.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 240.0/240.0 MB 13.0 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting avro-python3!=1.9.2,<1.10.0,>=1.8.1
  Downloading avro-python3-1.9.2.1.tar.gz (37 kB)
  Preparing metadata (setup.py) ... done
Collecting cloudpickle==2.2.0
  Downloading cloudpickle-2.2.0-py3-none-any.whl (25 kB)
Collecting fastavro<1.4.8,>=1.1.0
  Downloading fastavro-1.4.7.tar.gz (728 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 728.2/728.2 KB 27.7 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting httplib2<=0.20.4,>=0.19.0
  Downloading httplib2-0.20.4-py3-none-any.whl (96 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 96.6/96.6 KB 451.7 MB/s eta 0:00:00
Collecting numpy<1.22.0,>=1.21.4
  Downloading numpy-1.21.6-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (13.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.0/13.0 MB 13.0 MB/s eta 0:00:00
Collecting pandas<1.4.0,>=1.3.0
  Downloading pandas-1.3.5-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (10.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.9/10.9 MB 14.7 MB/s eta 0:00:00
Collecting protobuf<=3.21,>=3.19.0
  Downloading protobuf-3.20.3-cp310-cp310-manylinux2014_aarch64.whl (918 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 918.4/918.4 KB 19.5 MB/s eta 0:00:00
Collecting py4j==0.10.9.7
  Downloading py4j-0.10.9.7-py2.py3-none-any.whl (200 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 200.5/200.5 KB 60.2 MB/s eta 0:00:00
Collecting pyarrow<9.0.0,>=5.0.0
  Downloading pyarrow-8.0.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (27.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 27.2/27.2 MB 13.0 MB/s eta 0:00:00
Collecting python-dateutil<3,>=2.8.0
  Downloading python_dateutil-2.9.0.post0-py2.py3-none-any.whl (229 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 229.9/229.9 KB 19.2 MB/s eta 0:00:00
Collecting pytz>=2018.3
  Downloading pytz-2024.1-py2.py3-none-any.whl (505 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 505.5/505.5 KB 43.6 MB/s eta 0:00:00
Collecting requests>=2.26.0
  Downloading requests-2.31.0-py3-none-any.whl (62 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.6/62.6 KB 450.8 MB/s eta 0:00:00
Collecting pemja==0.3.0
  Downloading pemja-0.3.0.tar.gz (48 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.5/48.5 KB 320.3 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 255
  ╰─> [1 lines of output]
      Include folder should be at '/opt/java/openjdk/include' but doesn't exist. Please check you've installed the JDK properly.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 255
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
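The log also shows which packages pip had to build from source on this aarch64 container. Most dependencies arrive as aarch64 or pure-Python wheels. However, apache-flink, apache-flink-libraries, avro-python3, fastavro and pemja come as source archives, and the pemja build is the one that stops.

Below is a minimal Python sketch for pulling those names out of a pip log. The function name is hypothetical. It assumes sdists end in .tar.gz or .zip, and it allows for lines that carry a BuildKit timestamp prefix like the ones in the first log.

```python
import re
from typing import List

# Matches pip lines such as "  Downloading pemja-0.3.0.tar.gz (48 kB)".
# An optional leading timestamp (e.g. "137.5   ") covers BuildKit output.
# Assumption: source distributions end in .tar.gz or .zip; wheels end in .whl.
_SDIST_RE = re.compile(
    r"^(?:\d+(?:\.\d+)?\s+)?\s*Downloading\s+(\S+\.(?:tar\.gz|zip))\b",
    re.MULTILINE,
)


def sdists_in_pip_log(log: str) -> List[str]:
    """Return source-distribution filenames pip downloaded, in log order."""
    return _SDIST_RE.findall(log)


if __name__ == "__main__":
    sample = """\
Collecting apache-flink==1.17.0
  Downloading apache-flink-1.17.0.tar.gz (1.2 MB)
Collecting MarkupSafe>=2.0
  Downloading MarkupSafe-2.1.5-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (26 kB)
Collecting pemja==0.3.0
  Downloading pemja-0.3.0.tar.gz (48 kB)
"""
    print(sdists_in_pip_log(sample))
    # ['apache-flink-1.17.0.tar.gz', 'pemja-0.3.0.tar.gz']
```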

The workaround I found was this:

FROM flink:1.17.0

# Install python3, pip3 and a full JDK (its headers are needed to build pemja)
RUN apt-get update -y && \
    apt-get install -y python3 python3-pip python3-dev \
    openjdk-11-jdk-headless \
    && rm -rf /var/lib/apt/lists/*

RUN ln -s /usr/bin/python3 /usr/bin/python

RUN wget https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-kafka/1.17.0/flink-sql-connector-kafka-1.17.0.jar && \
    wget https://repo.maven.apache.org/maven2/org/apache/flink/flink-connector-jdbc/3.0.0-1.16/flink-connector-jdbc-3.0.0-1.16.jar && \
    wget https://jdbc.postgresql.org/download/postgresql-42.6.0.jar

RUN echo "metrics.reporters: prom" >> "$FLINK_HOME/conf/flink-conf.yaml"; \
    echo "metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory" >> "$FLINK_HOME/conf/flink-conf.yaml"

# Point JAVA_HOME at the JDK installed above; required due to an issue on M1 (arm64) Macs
ENV JAVA_HOME=/usr/lib/jvm/java-11-openjdk-arm64

COPY requirements.txt /opt/flink/

# Install Python dependencies
RUN pip install --no-cache-dir -r /opt/flink/requirements.txt

I went through a few Stack Overflow questions and finally found this Q&A relevant to this issue.

There were two changes to the Dockerfile:

  1. Installing the openjdk-11-jdk-headless package
  2. Setting the JAVA_HOME environment variable to /usr/lib/jvm/java-11-openjdk-arm64

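I am doing further research to find the root of the problem. At first glance, OpenJDK seems to have an issue with the arm64 architecture, and a specific version is required for it to work with apache-flink. In the log above, pip downloads pemja as a source tarball (pemja-0.3.0.tar.gz). That means it has to be compiled against the JDK headers, and that is the step that fails.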
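The workaround hardcodes the arm64 directory. On an x86_64 host (for example, the Ubuntu machine mentioned earlier), the same Dockerfile would therefore point JAVA_HOME at a path that does not exist.

Below is a minimal Python sketch of the mapping. It assumes Debian/Ubuntu's naming convention /usr/lib/jvm/java-11-openjdk-<arch> for apt's OpenJDK 11 packages. The function name is hypothetical, and only the two common architectures are covered.

```python
import platform

# Assumption: apt's openjdk-11 packages on Debian/Ubuntu install into
# /usr/lib/jvm/java-11-openjdk-<dpkg arch>; only two arches are mapped here.
_DPKG_ARCH = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def debian_jdk11_home(machine: str = "") -> str:
    """Return the expected JAVA_HOME for apt's openjdk-11 on this machine."""
    machine = (machine or platform.machine()).lower()
    try:
        arch = _DPKG_ARCH[machine]
    except KeyError:
        raise ValueError(f"unmapped architecture: {machine!r}") from None
    return f"/usr/lib/jvm/java-11-openjdk-{arch}"


if __name__ == "__main__":
    print(debian_jdk11_home())
```

platform.machine() reports aarch64 inside an arm64 Linux container and arm64 on macOS itself, which is why both spellings are mapped.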

ShubhamShaswat commented 7 months ago

Once this issue is resolved, you will hit this error:

sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) SCRAM authentication requires libpq version 10 or above

This can easily be fixed by updating psycopg2-binary==2.9.3 to psycopg2-binary==2.9.6 in container/datagen/Dockerfile.

This issue is again related to the arm64 architecture.
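To confirm which libpq a given psycopg2 build actually loaded, here is a minimal Python sketch. psycopg2.extensions.libpq_version() is psycopg2's own function (available since 2.7), while the two helpers are hypothetical. The decoding follows PostgreSQL's convention: from version 10 on, the number is major * 10000 + minor; before that, it is major * 10000 + minor * 100 + patch.

```python
def format_libpq_version(version: int) -> str:
    """Decode libpq's integer version, e.g. 90624 -> '9.6.24', 120004 -> '12.4'."""
    if version >= 100000:
        # PostgreSQL 10+: major * 10000 + minor
        return f"{version // 10000}.{version % 10000}"
    # Before 10: major * 10000 + minor * 100 + patch
    return f"{version // 10000}.{(version // 100) % 100}.{version % 100}"


def supports_scram(version: int) -> bool:
    """SCRAM authentication requires libpq 10 or above (per the error message)."""
    return version >= 100000


if __name__ == "__main__":
    try:
        import psycopg2.extensions
    except ImportError:
        print("psycopg2 is not installed")
    else:
        v = psycopg2.extensions.libpq_version()
        print(f"libpq {format_libpq_version(v)}; SCRAM supported: {supports_scram(v)}")
```

Running this inside the datagen container before and after the version bump should show whether the bundled libpq changed.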

josephmachado commented 7 months ago

Thank you @ShubhamShaswat for the detailed resolution! I'll try this, check whether the build works on my local machine, and push it if it does!