Open DimedS opened 1 year ago
This is one interesting pattern I saw with Copier:
/tmp/test-copier ······································································· 13s 13:54:0
❯ echo "Hello" > README.md (kedro310)
/tmp/test-copier ············································································· 13:54:1
❯ ls (kedro310)
README.md
/tmp/test-copier ············································································· 13:54:1
❯ copier copy gh:astrojuanlu/copier-pylib . (kedro310)
No git tags found in template; using HEAD as ref
🎤 Your account or organization
astrojuanlu
🎤 The name of the project (as in `pip install {project_name}`
test_project
🎤 The name of the package (as in `import {package_name}`)
test_project
🎤 A short description of the extension
Test project
🎤 Your name
Juan Luis
🎤 Your email
juan_luis_cano@mckinsey.com
Copying from template version 0.0.0.post44.dev0+0458d66
identical .
create .flake8
create pyproject.toml
create CODE_OF_CONDUCT.md
create .pre-commit-config.yaml
create tests
create tests/test_import.py
create LICENSE
create docs
create docs/Makefile
create docs/source
create docs/source/conf.py
create docs/source/index.md
create docs/source/_static
create docs/source/_static/.gitkeep
create .gitignore
create .github
create .github/workflows
create .github/workflows/test.yml
create .github/workflows/rtd-preview.yml
create .github/workflows/publish.yml
create CONTRIBUTING.md
create tox.ini
create .readthedocs.yaml
conflict README.md
Overwrite README.md? [Y/n] n
create src
create src/test_project
create src/test_project/__init__.py
create src/test_project/py.typed
/tmp/test-copier ······································································· 29s 13:54:5
❯ ls (kedro310)
CODE_OF_CONDUCT.md LICENSE docs/ src/ tox.ini
CONTRIBUTING.md README.md pyproject.toml tests/
/tmp/test-copier ············································································· 13:55:1
❯ cat README.md (kedro310)
Hello
I think overwriting existing files should be done with care, there might be something the user puts in there for some reason.
We should check other Kedro's CLI to see if there are any other commands overrides existing files. Should there be a confirmation steps or warning generated.
We should check other Kedro's CLI to see if there are any other commands overrides existing files. Should there be a confirmation steps or warning generated.
I checked the kedro pipeline create
command. It's wrapped in a try-catch block and fails when attempting to create a duplicate pipeline. However, for the kedro catalog create
command, it's acceptable if the catalog was previously created because the command is designed for that scenario. You should use it to identify any missed datasets that weren't previously described in the catalog, so they will be mentioned as MemoryDatasets for further processing. Therefore, I believe it's okay to update the catalog file, but comments will be missing as mentioned in the current issue.
Description
When we use the
kedro catalog create
command on existing YAML file, we're losing comments from the old file.Context
As of now, when we use the
kedro catalog create
command, it overwrites the existing file. However, we save old data(but lose comments) using the following approach:Possible Implementations
We could use the ruamel.yaml library instead of PyYAML. It preserves comments with the same code structure. The downside is that it's not as popular as PyYAML. How does this option sound to us?
We could use yaml.safe_dump in append mode, simply adding new catalog data after the end of the old file. However, this may lead to issues if the previous file was corrupted as we'd just be extending that corrupted file.