kedro-org / kedro-plugins

First-party plugins maintained by the Kedro team.
Apache License 2.0

`kedro airflow create` produces very long task ids when using unnamed nodes #397

Closed by astrojuanlu 7 months ago

astrojuanlu commented 11 months ago

Description

As per title.

Context

TBC

Steps to Reproduce

# conf/airflow/catalog.yml
active_modelling_pipeline.regressor:
  filepath: data/06_models/regressor_active.pkl
  type: pickle.PickleDataSet
  versioned: true
candidate_modelling_pipeline.regressor:
  filepath: data/06_models/regressor_candidate.pkl
  type: pickle.PickleDataSet
  versioned: true
companies:
  filepath: data/01_raw/companies.csv
  type: pandas.CSVDataSet
model_input_table:
  filepath: data/03_primary/model_input_table.pq
  type: pandas.ParquetDataSet
preprocessed_companies:
  filepath: data/02_intermediate/preprocessed_companies.pq
  type: pandas.ParquetDataSet
preprocessed_shuttles:
  filepath: data/02_intermediate/preprocessed_shuttles.pq
  type: pandas.ParquetDataSet
reviews:
  filepath: data/01_raw/reviews.csv
  type: pandas.CSVDataSet
shuttles:
  filepath: data/01_raw/shuttles.xlsx
  load_args:
    engine: openpyxl
  type: pandas.ExcelDataSet

Then running `kedro airflow create --target-dir=dags/ --env=airflow` produces tasks like these:

...
        "active-modelling-pipeline-evaluate-model-active-modelling-pipeline-regressor-active-modelling-pipeline-x-test-active-modelling-pipeline-y-test-none": KedroOperator(
            task_id="active-modelling-pipeline-evaluate-model-active-modelling-pipeline-regressor-active-modelling-pipeline-x-test-active-modelling-pipeline-y-test-none",
            package_name=package_name,
            pipeline_name=pipeline_name,
            node_name="active_modelling_pipeline.evaluate_model([active_modelling_pipeline.regressor,active_modelling_pipeline.X_test,active_modelling_pipeline.y_test]) -> None",
            project_path=project_path,
            env=env,
        ),
...

The generated DAG then cannot be imported into Airflow:

filepath: /Users/juan_cano/airflow/dags/spaceflights_dag.py
error:
Traceback (most recent call last):
  File "/Users/juan_cano/.micromamba/envs/airflow310/lib/python3.10/site-packages/airflow/models/baseoperator.py", line 805, in __init__
    validate_key(task_id)
  File "/Users/juan_cano/.micromamba/envs/airflow310/lib/python3.10/site-packages/airflow/utils/helpers.py", line 55, in validate_key
    raise AirflowException(f"The key has to be less than {max_length} characters")
airflow.exceptions.AirflowException: The key has to be less than 250 characters
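To find which generated task ids trip this limit before loading the DAG, something like the following sketch may help. The 250-character limit comes from the error message above. The helper name `find_overlong_task_ids` and the sample ids are hypothetical, and whether an id of exactly 250 characters passes depends on Airflow's own check, which this does not reproduce.

```python
# Limit taken from the error message above; this is a stand-in, not Airflow's code.
MAX_KEY_LENGTH = 250


def find_overlong_task_ids(task_ids, max_length=MAX_KEY_LENGTH):
    """Return the task ids that are longer than max_length characters."""
    return [task_id for task_id in task_ids if len(task_id) > max_length]


# Hypothetical ids: one short, one synthetic id built only to exceed the limit.
task_ids = [
    "evaluate-model",
    "x" * 251,
]

for task_id in find_overlong_task_ids(task_ids):
    print(f"{len(task_id)} characters: {task_id[:40]}...")
```

Running a check like this over the `task_id` values in the generated DAG file would point at the nodes that need attention.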

Your Environment

(TBC)

sbrugman commented 9 months ago

Note that this only happens when the nodes have no explicit name, in which case node.name defaults to the signature. Either way, annoying behaviour, but at least specifying the name is a workable solution.
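To illustrate the difference, here is a rough sketch comparing the task id derived from the default, signature-based node name in the report with one derived from an explicit name. The `approximate_task_id` function is an assumption: it only mimics the slug-like ids shown in the generated DAG with a regex and is not the plugin's actual code. The explicit name is hypothetical. In Kedro itself, the name is set by passing `name=...` to `node(...)`.

```python
import re


def approximate_task_id(node_name):
    """Lowercase the name and collapse each non-alphanumeric run into one hyphen.

    This is an approximation of the ids seen in the generated DAG,
    not the implementation used by kedro-airflow.
    """
    return re.sub(r"[^a-z0-9]+", "-", node_name.lower()).strip("-")


# Default name from the report: the node's full signature.
default_name = (
    "active_modelling_pipeline.evaluate_model("
    "[active_modelling_pipeline.regressor,"
    "active_modelling_pipeline.X_test,"
    "active_modelling_pipeline.y_test]) -> None"
)

# Hypothetical explicit name, as it might look with name="evaluate_model".
explicit_name = "active_modelling_pipeline.evaluate_model"

for label, name in [("default", default_name), ("explicit", explicit_name)]:
    task_id = approximate_task_id(name)
    print(f"{label}: {len(task_id)} characters -> {task_id}")
```

With the default name, every input and output dataset is folded into the id, so its length grows with the number and length of dataset names. With an explicit name, the id stays independent of the node's inputs and outputs.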

astrojuanlu commented 7 months ago

Interesting, thanks a lot. Opened an issue to track that https://github.com/kedro-org/kedro/issues/3575

I guess this is a feature rather than a bug then. I'm closing.