Closed ba1dr closed 1 year ago
@ba1dr Thanks for your advice!
Yes, it's nice and natural to resolve a relative path from where the current YAML file is.
But I cannot get information about the current file, as you wrote.
For the case of "config files in different directories", I think the current API can't handle that elegantly, but using an absolute path is workable.
jinjyaml is a Jinja2 template engine integration for PyYAML.
We can include files with Jinja2's include statement:
Consider the YAML below:

    parent: !j2 |
      {% include "child-1.yml" %}
      {% include "child-2.yml" %}

then execute:
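
    import jinja2
    import jinjyaml
    import yaml

    # Jinja2 resolves {% include %} paths against this search path
    j2_env = jinja2.Environment(
        loader=jinja2.FileSystemLoader(searchpath=your_base_dir)
    )

    # Register the !j2 tag so tagged nodes are treated as Jinja2 templates
    j2_ctor = jinjyaml.Constructor()
    yaml.add_constructor('!j2', j2_ctor)

    doc = yaml.full_load(yaml_string)
    # Render the templates and parse the rendered output
    data = jinjyaml.extract(doc, env=j2_env)
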
Jinja2's FileSystemLoader would load child-1.yml and child-2.yml relative to its search path.
We can even write a custom Jinja2 loader for a particular purpose.
Hmm, no, I think Jinja2 would be overkill. If I were using a template engine, I'd rather generate a Python config file with Jinja2 than YAML. Even with this include feature, I am not sure it is a good idea to use it, as it breaks compatibility with other languages or scripts that do not support this tag.
Perhaps a JSON Pointer equivalent for YAML (if there is one) would be a better fit for this case.
In case anyone's interested, my current workaround is this:

    import contextlib
    import os
    import pathlib

    import yaml
    from yamlinclude import YamlIncludeConstructor

    YamlIncludeConstructor.add_to_loader_class(loader_class=yaml.SafeLoader)


    @contextlib.contextmanager
    def working_directory(path: pathlib.Path):
        # Temporarily switch the working directory, restoring it afterwards
        prev_cwd = pathlib.Path.cwd()
        os.chdir(path)
        try:
            yield
        finally:
            os.chdir(prev_cwd)


    def load_config_file(file_path: pathlib.Path):
        # Relative includes are resolved against the config file's directory
        with working_directory(file_path.parent):
            with file_path.open("r") as config_file:
                return yaml.safe_load(config_file)

But obviously, the limitation is that any 2nd+ level include is relative to the first file, not any intermediate files, but luckily that's good enough for us right now :slightly_smiling_face:
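To make that limitation concrete, here is a minimal stdlib-only sketch. It does not use yamlinclude at all; `resolve_against_cwd` is a hypothetical stand-in for any include mechanism that resolves relative names against the working directory, and the file names are made up for illustration.

```python
import contextlib
import os
import pathlib
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the process working directory to `path`."""
    prev_cwd = pathlib.Path.cwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(prev_cwd)


def resolve_against_cwd(relative_name):
    """Hypothetical stand-in for a cwd-based include lookup."""
    return pathlib.Path.cwd() / relative_name


cwd_before = pathlib.Path.cwd()
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp).resolve()
    (root / "sub").mkdir()
    # Imagine root/main.yml includes sub/mid.yml, which includes "leaf.yml".
    with working_directory(root):
        resolved = resolve_against_cwd("leaf.yml")

wanted = root / "sub" / "leaf.yml"  # next to the intermediate file
got = root / "leaf.yml"             # next to the first file
cwd_after = pathlib.Path.cwd()
```

Here `resolved` ends up next to the first file (`got`), not next to the intermediate one (`wanted`), which is exactly the second-level behaviour described above.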
> But i can not to get information about current file, as you wrote.
Actually @tanbro, you can! :)
One just has to change the base_dir as they go along: extract the name of the file from the stream, then patch yaml.load specifically.
EDIT: Updated the snippet, this is what we now use internally in an __init__.py.

    from pathlib import Path

    import yaml
    from yamlinclude import YamlIncludeConstructor

    YamlIncludeConstructor.add_to_loader_class(loader_class=yaml.FullLoader)
    YamlIncludeConstructor.add_to_loader_class(loader_class=yaml.SafeLoader)
    YamlIncludeConstructor.add_to_loader_class(loader_class=yaml.Loader)
    YamlIncludeConstructor.add_to_loader_class(loader_class=yaml.BaseLoader)

    include_tag = YamlIncludeConstructor.DEFAULT_TAG_NAME
    yaml_load = yaml.load  # Save original load function


    def load_yaml(stream, Loader):
        path = Path(stream.name)
        if include_tag not in Loader.yaml_constructors:
            return yaml_load(stream, Loader=Loader)
        constructor = Loader.yaml_constructors[include_tag]
        previous_base = constructor.base_dir
        # Resolve includes relative to the file being loaded
        constructor.base_dir = path.parent.as_posix()
        try:
            return yaml_load(stream, Loader=Loader)
        finally:
            # Restore the previous base_dir even if loading fails
            constructor.base_dir = previous_base


    yaml.load = load_yaml  # Use new one
    del YamlIncludeConstructor
    del yaml

The above would fail on strings (e.g. if used with yaml.load(f.read()), or with some local definitions).
One can add an isinstance(stream, io.TextIOWrapper) check for validation as needed.
EDIT: Like so:

    import yaml

    yaml_load = yaml.load  # Save original load function


    def load_yaml(stream, Loader):
        from pathlib import Path
        from yamlinclude import YamlIncludeConstructor
        from io import TextIOWrapper

        tag = YamlIncludeConstructor.DEFAULT_TAG_NAME
        if tag not in Loader.yaml_constructors or not isinstance(stream, TextIOWrapper):
            # If the tag is used in the stream but we can't get the file location,
            # we can't assume anything about the relative file location
            return yaml_load(stream, Loader=Loader)

        path = Path(stream.name)
        constructor = Loader.yaml_constructors[tag]
        previous_base = constructor.base_dir
        constructor.base_dir = path.parent.as_posix()
        try:
            return yaml_load(stream, Loader=Loader)
        finally:
            # Restore the previous base_dir even if loading fails
            constructor.base_dir = previous_base


    yaml.load = load_yaml  # Use new one

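For illustration, the stream check can also be factored into a small standalone helper. This is only a sketch using the standard library; `stream_directory` is a hypothetical name and not part of PyYAML or yamlinclude.

```python
import io
import os
import pathlib
import tempfile


def stream_directory(stream):
    """Return the directory of the file behind `stream`, or None if unknown."""
    if not isinstance(stream, io.TextIOWrapper):
        # Plain strings, bytes, StringIO, ...: no file location to rely on
        return None
    name = getattr(stream, "name", None)
    if isinstance(name, (str, os.PathLike)):
        return pathlib.Path(name).parent
    # e.g. a wrapper around a raw file descriptor, whose name is an int
    return None


with tempfile.TemporaryDirectory() as tmp:
    config_path = pathlib.Path(tmp) / "config.yml"
    config_path.write_text("key: value\n")
    with open(config_path, "r") as config_file:
        from_file = stream_directory(config_file)

from_string = stream_directory("key: value\n")
from_memory = stream_directory(io.StringIO("key: value\n"))
```

Only the real file yields a directory; the string and the in-memory stream both yield None, which is the case where the patched loader falls back to the original yaml.load.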
@ba1dr @1ace You might be interested ^
IMHO it would make more sense if a relative include searched for the file in the current file's directory rather than the current working directory.
Use case: when I pass config files on the command line, they might be in different directories. And if they're independent, they could load their own local extensions.
A solution that does not break backward compatibility would be, for example, to add a new parameter to the constructor, or to initialize base_dir=None instead of the default empty string.
EDIT: perhaps it is not possible to get information about the current file from the node object passed to the constructor? But as long as we can redefine the reader, I see that the yaml.Reader class can handle stream names for file-like objects.
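A rough sketch of how the proposed base_dir=None semantics could work, assuming the constructor somehow knows the path of the file currently being parsed. `resolve_include` and its `current_file` parameter are hypothetical and not part of the library's API.

```python
import pathlib


def resolve_include(include_path, base_dir=None, current_file=None):
    """Hypothetical resolution rule for a relative include path."""
    path = pathlib.Path(include_path)
    if path.is_absolute():
        return path
    if base_dir is not None:
        # Explicit base_dir (including "") keeps the existing behaviour
        return pathlib.Path(base_dir) / path
    if current_file is not None:
        # base_dir=None: resolve next to the file that contains the include
        return pathlib.Path(current_file).parent / path
    # Nothing known about the current file: stay relative to the cwd
    return path


next_to_file = resolve_include("ext.yml", current_file="/etc/app/main.yml")
legacy = resolve_include("ext.yml", base_dir="", current_file="/etc/app/main.yml")
explicit = resolve_include("ext.yml", base_dir="/opt/conf")
```

With this rule, the empty-string default still means "relative to the working directory", so existing callers would see no change, while base_dir=None opts in to per-file resolution.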