kedro-org / kedro

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.
https://kedro.org
Apache License 2.0

.parquet not .pq #4253

Status: Closed (closed by gdmcbain 1 month ago)

gdmcbain commented 1 month ago

Description

As discussed on Slack yesterday, the tutorial step "Create a data processing pipeline" writes Parquet files with a .pq suffix. That doesn't seem to be the standard suffix, which is .parquet, and it isn't recognized by, for example, Data Wrangler.

I propose changing all uses of .pq to the standard .parquet.

Documentation page (if applicable)

Searching for .pq matches five files, two in tests and three in docs; e.g.:

https://github.com/kedro-org/kedro/blob/c2d7100a6bdf0dd51e80a2eade0b5d3f3a71184b/docs/source/tutorial/create_a_pipeline.md?plain=1#L203
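
For anyone wanting to reproduce the search and try the rename locally, here is a minimal sketch in Python. It is not the script used for the eventual change. The helper names (`rename_pq_suffix`, `find_pq_files`), the glob patterns, and the example path are all hypothetical and only for illustration.

```python
import re
from pathlib import Path

# Match ".pq" only when it ends a token, so names such as
# "data.pqr" or "data.pq_backup" are left alone.
PQ_SUFFIX = re.compile(r"\.pq\b")


def rename_pq_suffix(text: str) -> str:
    """Return text with every trailing ".pq" suffix replaced by ".parquet"."""
    return PQ_SUFFIX.sub(".parquet", text)


def find_pq_files(root, patterns=("*.md", "*.py", "*.yml")):
    """List files under root whose contents mention a ".pq" suffix.

    The default glob patterns are an assumption, not taken from the repository.
    """
    hits = []
    for pattern in patterns:
        for path in Path(root).rglob(pattern):
            content = path.read_text(encoding="utf-8", errors="ignore")
            if PQ_SUFFIX.search(content):
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # Illustrative catalog-style line; the path is a made-up example.
    print(rename_pq_suffix("filepath: data/02_intermediate/example.pq"))
```

Running `find_pq_files("docs")` and `find_pq_files("tests")` from a checkout should list the candidate files. The rename is best reviewed file by file rather than applied blindly, since the word-boundary pattern still rewrites compound suffixes such as `.pq.gz`.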

deepyaman commented 1 month ago

@gdmcbain Do you want to open a PR to address this? I agree that .pq is nonstandard (or at least .parquet is much more common) and see no reason not to make this docs change.

gdmcbain commented 1 month ago

Yes, just running the test suite locally now. Ta.

SajidAlamQB commented 1 month ago

Completed in: https://github.com/kedro-org/kedro/pull/4254