Closed yetudada closed 5 months ago
From @jacobweiss2305:
Kedro-Airflow plugin version used (get it by running pip show kedro-airflow): 0.4.1
Airflow version (airflow --version): > 2.0.0
Kedro version used (pip show kedro or kedro -V): 0.17.7
Python version used (python -V): > 3.9
Operating system and version: Ubuntu Linux 20.04
From @limdauto:
Hi @jacobweiss2305, please try Python 3.8. Support for 3.9 hasn't been released yet.
From @jacobweiss2305:
Hi @limdauto
Support for Kedro and Python 3.9 is available using pip install kedro --ignore-requires-python (https://github.com/kedro-org/kedro/issues/710)
From @jweiss-ocurate:
Hi @limdauto
Here are the exact steps I am taking:
mkdir astro_cloud_kedro
cd astro_cloud_kedro
astrocloud dev init
python -m venv venv && source venv/bin/activate
pip install kedro --ignore-requires-python
pip install kedro-airflow --ignore-requires-python
kedro new --starter=spaceflights
cp -r new-kedro-project/* . && rm -rf new-kedro-project
pip install -r src/requirements.txt --ignore-requires-python
kedro package
Dockerfile:
FROM quay.io/astronomer/astro-runtime:4.1.0
RUN pip install --user src/dist/new_kedro_project-0.1-py3-none-any.whl --ignore-requires-python
kedro airflow create --target-dir=dags/ --env=base
astrocloud dev start
*** Failed to verify remote log exists s3:///new-kedro-project/data-processing-preprocess-companies-node/2022-02-28T14:47:01.235178+00:00/1.log.
Please provide a bucket_name instead of "s3:///new-kedro-project/data-processing-preprocess-companies-node/2022-02-28T14:47:01.235178+00:00/1.log"
*** Falling back to local log
*** Reading local file: /usr/local/airflow/logs/new-kedro-project/data-processing-preprocess-companies-node/2022-02-28T14:47:01.235178+00:00/1.log
[2022-02-28, 15:17:11 UTC] {taskinstance.py:1037} INFO - Dependencies all met for <TaskInstance: new-kedro-project.data-processing-preprocess-companies-node scheduled__2022-02-28T14:47:01.235178+00:00 [queued]>
[2022-02-28, 15:17:12 UTC] {taskinstance.py:1037} INFO - Dependencies all met for <TaskInstance: new-kedro-project.data-processing-preprocess-companies-node scheduled__2022-02-28T14:47:01.235178+00:00 [queued]>
[2022-02-28, 15:17:12 UTC] {taskinstance.py:1243} INFO -
--------------------------------------------------------------------------------
[2022-02-28, 15:17:12 UTC] {taskinstance.py:1244} INFO - Starting attempt 1 of 2
[2022-02-28, 15:17:12 UTC] {taskinstance.py:1245} INFO -
--------------------------------------------------------------------------------
[2022-02-28, 15:17:12 UTC] {taskinstance.py:1264} INFO - Executing <Task(KedroOperator): data-processing-preprocess-companies-node> on 2022-02-28 14:47:01.235178+00:00
[2022-02-28, 15:17:12 UTC] {standard_task_runner.py:52} INFO - Started process 220 to run task
[2022-02-28, 15:17:12 UTC] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'new-kedro-project', 'data-processing-preprocess-companies-node', 'scheduled__2022-02-28T14:47:01.235178+00:00', '--job-id', '2', '--raw', '--subdir', 'DAGS_FOLDER/new_kedro_project_dag.py', '--cfg-path', '/tmp/tmpmr1pmxmb', '--error-file', '/tmp/tmpqzqs8xs8']
[2022-02-28, 15:17:12 UTC] {standard_task_runner.py:77} INFO - Job 2: Subtask data-processing-preprocess-companies-node
[2022-02-28, 15:17:12 UTC] {logging_mixin.py:109} INFO - Running <TaskInstance: new-kedro-project.data-processing-preprocess-companies-node scheduled__2022-02-28T14:47:01.235178+00:00 [running]> on host 3d8fc15ee46a
[2022-02-28, 15:17:12 UTC] {taskinstance.py:1429} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=new-kedro-project
AIRFLOW_CTX_TASK_ID=data-processing-preprocess-companies-node
AIRFLOW_CTX_EXECUTION_DATE=2022-02-28T14:47:01.235178+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-02-28T14:47:01.235178+00:00
[2022-02-28, 15:17:12 UTC] {store.py:32} INFO - `read()` not implemented for `BaseSessionStore`. Assuming empty store.
[2022-02-28, 15:17:12 UTC] {session.py:78} WARNING - Unable to git describe /usr/local/airflow
[2022-02-28, 15:17:12 UTC] {logging_mixin.py:109} WARNING - /home/astro/.local/lib/python3.9/site-packages/kedro/config/config.py:296 UserWarning: Duplicate environment detected! Skipping re-loading from configuration path: /usr/local/airflow/conf/base
[2022-02-28, 15:17:13 UTC] {local_task_job.py:154} INFO - Task exited with return code Negsignal.SIGKILL
[2022-02-28, 15:17:13 UTC] {taskinstance.py:1272} INFO - Marking task as UP_FOR_RETRY. dag_id=new-kedro-project, task_id=data-processing-preprocess-companies-node, execution_date=20220228T144701, start_date=20220228T151711, end_date=20220228T151713
[2022-02-28, 15:17:14 UTC] {local_task_job.py:264} INFO - 0 downstream tasks scheduled from follow-on schedule check
From @sunkickr:
@jweiss-ocurate, this may be a memory issue, since the task logs show Negsignal.SIGKILL. Could you try increasing the amount of local memory allocated to Docker?
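For context, here is a minimal standalone sketch (plain Python, nothing Astronomer-specific) of why an out-of-memory kill shows up in Airflow's logs as Negsignal.SIGKILL: a process killed by signal N is reported with return code -N, and Airflow prints the negative code by its signal name.

```python
import signal
import subprocess

# Run a shell that immediately SIGKILLs itself, as the OOM killer would
# kill a task process that exceeds the container's memory limit.
proc = subprocess.run(["sh", "-c", "kill -9 $$"])

# A process terminated by signal N is reported as return code -N.
print(proc.returncode)                        # -9
print(signal.Signals(-proc.returncode).name)  # SIGKILL
```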
From @idanov:
@jweiss-ocurate I can confirm we could reproduce that. We'll try to debug what's causing it and update you with any findings we have here.
From @jweiss-ocurate:
Astronomer worked on this with me. The current Docker image for Astronomer Cloud requires Python 3.9, so I had to install Kedro using --ignore-requires-python.
Astronomer was able to add a quick fix by reinstalling Python 3.7 in the Dockerfile.
From @noklam:
@jweiss-ocurate Does it work after downgrading the Python version?
From @jweiss-ocurate:
Yes it does.
I tried to get it running with a develop build of Kedro but was not successful. astrocloud dev start doesn't really allow volume mounting, so I can't install a local copy of Kedro from git, and even shipping the entire repo into the Docker image and installing it there seems to be blocked (see the error below). I wonder if there is anything special about astrocloud, or whether we could just test with a custom Airflow setup to get rid of these restrictions. I also notice it is using quay.io/astronomer/astro-runtime instead of the astronomer/ap-airflow image that is used in the documentation.
#13 0.247 + pip install kedro_develop
#13 0.589 Defaulting to user installation because normal site-packages is not writeable
#13 0.610 Looking in links: https://pip.astronomer.io/simple/astronomer-fab-security-manager/
#13 0.973 ERROR: Could not find a version that satisfies the requirement kedro_develop (from versions: none)
#13 0.973 ERROR: No matching distribution found for kedro_develop
#13 1.274 WARNING: You are using pip version 21.3.1; however, version 22.0.4 is available.
#13 1.274 You should consider upgrading via the '/usr/local/bin/python -m pip install --upgrade pip' command.
@jweiss-ocurate Could you share the latest Dockerfile that runs successfully?
After some investigation, the exact line causing the issue is logging.config.dictConfig(logging_config).
Testing with the latest image + Python 3.9 + Kedro==0.18.0, this workaround makes it work. Update this line in logging.yml:
disable_existing_loggers: True
Dockerfile
FROM quay.io/astronomer/astro-runtime:4.2.1
RUN pip install --user dist/new_kedro_project-0.1-py3-none-any.whl --ignore-requires-python
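To illustrate why the workaround helps, here is a small standalone sketch (plain Python, not the actual Kedro or Airflow config; the logger name is a stand-in) showing that disable_existing_loggers: True makes dictConfig disable loggers created before it ran, such as those Airflow has already configured:

```python
import logging
import logging.config

# A logger that exists before dictConfig runs, standing in for one of
# Airflow's own pre-configured loggers.
pre_existing = logging.getLogger("airflow.task.stand_in")

logging.config.dictConfig({
    "version": 1,
    "disable_existing_loggers": True,  # the workaround value from logging.yml
    "handlers": {
        "console": {"class": "logging.StreamHandler", "level": "INFO"},
    },
    "root": {"level": "INFO", "handlers": ["console"]},
})

# Loggers not named in the config are now disabled rather than left to
# interact with the new configuration.
print(pre_existing.disabled)  # True
```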
A minimal example of KedroOperator.execute() to reproduce the issue. It's not entirely clear what the issue is, but disabling the existing loggers fixes the crash; it is potentially conflicting with Airflow's own logger. We will revisit the way Kedro does logging soon and hopefully fix this issue at the same time.
import logging.config

def execute(self, context):
    print("Hello World")
    config = {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "simple": {
                "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
            },
        },
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
                "level": "INFO",
                "formatter": "simple",
                "stream": "ext://sys.stdout",
            },
        },
        # Try uncommenting this block: the task will fail.
        # "root": {
        #     "level": "INFO",
        #     "handlers": ["console"],
        # },
    }
    # Comment out this call and the crash goes away.
    logging.config.dictConfig(config)
    print("End of the Program")
The Airflow Astronomer and AstroCloud deployment documentation was updated in #3792. Because Rich-based logging causes issues in Airflow deployments, one of the updated steps advises setting Kedro's logging handlers to [console] only. Deployments now work successfully with Astro and other cloud providers.
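As a sketch of what "console only" means in practice (an assumed minimal shape, not the actual file shipped by Kedro), the root logger would keep only a plain StreamHandler and drop any Rich handler, expressed here as the Python dict equivalent of the logging.yml change:

```python
import logging
import logging.config

logging_config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "INFO",
            "formatter": "simple",
        },
    },
    # Only the plain console handler; no Rich handler to conflict with
    # Airflow's own logging setup.
    "root": {"level": "INFO", "handlers": ["console"]},
}
logging.config.dictConfig(logging_config)
logging.getLogger(__name__).info("logging configured with console only")
```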
Raised by @jweiss-ocurate:
Description
I am trying to run a simple spaceflights example with Astrocloud. I wasn't sure if anyone has been able to get it to work.
Here is the Dockerfile:
FROM quay.io/astronomer/astro-runtime:4.1.0
RUN pip install --user new_kedro_project-0.1-py3-none-any.whl --ignore-requires-python
Context
I am trying to use kedro-airflow with astrocloud.
Steps to Reproduce
Expected Result
Complete Kedro Run on local Airflow image.
Actual Result
Failure in local Airflow image.
[2022-02-26, 16:43:26 UTC] {store.py:32} INFO - `read()` not implemented for `BaseSessionStore`. Assuming empty store.
[2022-02-26, 16:43:26 UTC] {session.py:78} WARNING - Unable to git describe /usr/local/airflow
[2022-02-26, 16:43:29 UTC] {local_task_job.py:154} INFO - Task exited with return code Negsignal.SIGKILL
Your Environment
Include as many relevant details about the environment you experienced the bug in:
Kedro-Airflow plugin version used (pip show kedro-airflow): 0.4.1
Airflow version (airflow --version): > 2.0.0
Kedro version used (pip show kedro or kedro -V): 0.17.7
Python version used (python -V): > 3.9