Open bjornvandijkman-ingka opened 2 years ago
This happens because PyYAML's default behaviour is to silently overwrite a duplicate key with its last value. It can be resolved by writing a custom loader. Here is an example:
import yaml

# special loader with duplicate key checking
class UniqueKeyLoader(yaml.SafeLoader):
    def construct_mapping(self, node, deep=False):
        # collect the keys seen so far in this mapping
        mapping = []
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            assert key not in mapping, f"Duplicate key in Yaml File: {key}"
            mapping.append(key)
        # no duplicates found, so build the mapping as usual
        return super().construct_mapping(node, deep)
And then we can use this by calling:
filename = 'soda.yml'
# read the file inside a context manager so the handle is closed afterwards
with open(filename, 'r') as f:
    data = yaml.load(f, Loader=UniqueKeyLoader)
The error looks like this:
AssertionError: Duplicate key in Yaml File: table_name
The error message can be customized as needed.
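As one possible customization, here is a minimal sketch that raises PyYAML's own `ConstructorError` instead of using `assert`, so the check still runs under `python -O` and the message includes the position of the duplicate key. The class name `StrictUniqueKeyLoader` and the sample YAML text are made up for illustration and are not part of soda-sql. YAML merge keys (`<<`) are not considered here.

```python
import yaml
from yaml.constructor import ConstructorError


class StrictUniqueKeyLoader(yaml.SafeLoader):
    """SafeLoader variant that rejects duplicate mapping keys (hypothetical name)."""

    def construct_mapping(self, node, deep=False):
        seen = []  # a list, so the membership check itself never needs to hash a key
        for key_node, _value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in seen:
                # Report where the mapping starts and where the duplicate key is.
                raise ConstructorError(
                    "while constructing a mapping", node.start_mark,
                    f"found duplicate key: {key!r}", key_node.start_mark,
                )
            seen.append(key)
        return super().construct_mapping(node, deep)


# Illustrative input only: the same top-level key appears twice.
duplicated = "table_name: customers\ntable_name: orders\n"

# Default behaviour: the later value silently replaces the earlier one.
print(yaml.safe_load(duplicated))

# With the strict loader the duplicate is reported instead.
try:
    yaml.load(duplicated, Loader=StrictUniqueKeyLoader)
except ConstructorError as exc:
    print(exc)
```

Catching `ConstructorError` (or its base class `yaml.YAMLError`) lets the caller turn this into a warning or a user-facing error, whichever fits better.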
@anilkulkarni87 that's a very nice approach - would you like to open a PR with this?
@vijaykiran Yes I will work on it.
@vijaykiran I have created a draft pull request: https://github.com/sodadata/soda-sql/pull/624. I do have some questions though. Please take a look.
I will also have to make a change at: https://github.com/sodadata/soda-sql/blob/8a75b53902615d2724ed17c6560d4ec936dc449a/core/sodasql/scan/parser.py#L75
For me it seemed intuitive to include tests for multiple tables in one scan.yml file as follows:
Doing this results in only the customers table being used, while the orders table is ignored. I think it would be nice to have support for testing multiple tables in one file, but until such functionality is implemented, it would be user-friendly to throw a warning/error saying that soda currently cannot handle testing multiple tables in one file.