kedro-org / kedro-plugins

First-party plugins maintained by the Kedro team.
Apache License 2.0
87 stars 76 forks

Decide and implement how to include datasets generated through dataset factories in telemetry counts #566

Open DimedS opened 5 months ago

DimedS commented 5 months ago

Description

Currently, kedro-telemetry does not account for datasets generated through dataset factories. The existing code snippet used for counting datasets is as follows:

project_statistics_properties["number_of_datasets"] = sum(
    1 for c in catalog.list() if not c.startswith("parameters") and not c.startswith("params:")
)

This method overlooks datasets created via dataset factories. For further discussion, see here.
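To make the gap concrete, here is a minimal sketch of the same counting rule applied to plain lists of names. The function name `count_datasets` and all dataset names are hypothetical; the first list stands in for what `catalog.list()` might return, and the second for names that would only appear once a factory pattern has been resolved.

```python
def count_datasets(dataset_names):
    """Count entries that are not parameters, mirroring the rule in the snippet above."""
    return sum(
        1
        for name in dataset_names
        if not name.startswith("parameters") and not name.startswith("params:")
    )


# Hypothetical stand-in for catalog.list(): explicitly declared entries only.
explicit_entries = ["companies", "reviews", "parameters", "params:model_options"]

# Hypothetical names that would exist only after a factory pattern is resolved.
factory_resolved = ["shuttles_csv", "regressor_csv"]

reported = count_datasets(explicit_entries)
actual = count_datasets(explicit_entries + factory_resolved)
print(reported, actual)
```

Under these assumptions the reported count covers only the explicit entries, so the two factory-generated datasets are missed.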

astrojuanlu commented 5 months ago

Opened a separate issue for packaged Kedro projects https://github.com/kedro-org/kedro-plugins/issues/567

noklam commented 4 months ago

Whoever picks up the ticket should decide which solution works better and implement it. It was discussed that it is unclear how we use this information, and that this is not urgent until we introduce the opt-out flow.

Two alternatives:

astrojuanlu commented 4 months ago

Push telemetry to after_pipeline_run

Isn't it enough to do it at after_catalog_created?
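Whichever hook is chosen, one possible way to count factory datasets is to match the dataset names used by a pipeline against the factory patterns in the catalog. The sketch below only illustrates that idea under stated assumptions: `pattern_to_regex` and `count_with_factories` are hypothetical helpers, the `{placeholder}` matching is a simplified regex stand-in rather than Kedro's own resolution logic, and all names are invented.

```python
import re


def is_parameter(name):
    """Same parameter filter as the existing counting snippet."""
    return name.startswith("parameters") or name.startswith("params:")


def pattern_to_regex(pattern):
    """Turn a '{placeholder}'-style pattern into an anchored regex (simplified stand-in)."""
    literal_parts = re.split(r"\{\w+\}", pattern)
    return re.compile("^" + "(.+)".join(re.escape(p) for p in literal_parts) + "$")


def count_with_factories(explicit_names, patterns, pipeline_dataset_names):
    """Count explicit datasets plus pipeline datasets that match a factory pattern."""
    explicit = {n for n in explicit_names if not is_parameter(n)}
    regexes = [pattern_to_regex(p) for p in patterns]
    from_factories = {
        n
        for n in pipeline_dataset_names
        if not is_parameter(n)
        and n not in explicit
        and any(r.match(n) for r in regexes)
    }
    return len(explicit) + len(from_factories)


# Hypothetical inputs: explicit catalog entries, one factory pattern,
# and the dataset names referenced by a pipeline.
total = count_with_factories(
    explicit_names=["companies", "parameters"],
    patterns=["{name}_csv"],
    pipeline_dataset_names=["companies", "shuttles_csv", "reviews_csv", "params:alpha", "model_input"],
)
print(total)
```

The point of the sketch is that the count needs both the catalog's patterns and the pipeline's dataset names, so the choice of hook depends on where both are available.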