simple-framework / simple_grid_yaml_compiler

Generates the extended YAML output for an input site_level_configuration_file
Apache License 2.0
0 stars 9 forks source link

test functions in lexemes.py #25

Open maany opened 5 years ago

kevalnagda commented 5 years ago

Could you please provide any description for the issue. Thanks!

maany commented 5 years ago

Hey, take a look at Pg 12 and Pg 20 in the specification https://docs.google.com/document/d/1yp_96UXcwNO49cktnHtT61iNmTO0RgrSQukuNYqACpM/

Take a look at: https://github.com/WLCG-Lightweight-Sites/simple_grid_yaml_compiler/tree/master/compiler

It defines two functions

get_repo_list parses the site_level_config_file line by line and extracts some_url from 'repository_url: some_url' wherever it appears. We need to write a test to validate different scenarios (types of url's) that can be successfully extracted by this function. The tests reside in :https://github.com/WLCG-Lightweight-Sites/simple_grid_yaml_compiler/blob/master/tests/tests_lexemes.py Feel free to delete the existing test case as it is outdated. You'd have to install pytest etc. in your virtual environment and include them in the requirements.txt file when you make a PR.

The second function, parse_for_variable_hierarchies, process the from keyword (mentioned in the specification). It looks through the site_level_config_file recursively for instances where the from keyword appears. It generally appears in the following format

var2: &var2 value_2
var1: &var1
 __from__: *var2

Take a look at: https://github.com/WLCG-Lightweight-Sites/wlcg_lightweight_site_ce_cream/blob/master/default-data.yaml

The function should reduce the output to

var1: value2

We gotta write a unit test to check this functionality in the same test_lexemes.py file mentioned above. Let me know if more info is needed :) .

maany commented 5 years ago

The from keyword, by definition, has to be the only child of a Dict/CommentedMap. That's because the assignment operation is one to one mapping rather than a one to many mapping. Therefore, if a CommentedSeq contains a from keyword, the case should either not be processed or it should throw an exception. In the other case i.e. if the user specified a from keyword in a CommentedSeq, the data = data[keyword] line will throw an exception, except for the case that the from referred to a variable that represented a number and the ConnectedSeq had an entry corresponding to that index.

Regarding the redundancy, I believe it would exist if we parse through the data more than once during the execution of the function. However, when we check for the keyword in 'if keyword in data:', there is either an exception thrown as described above or the next recursive iteration starts. If the keyword is not present in data, we start the next recursive iteration for all of the child objects. (that's what the being handled by the for loops that follow later). Sure, you could move the if statement inside the loop for CommentedMap and check if the child of the CommentedMap has a key == keyword for every child. Assuming, python's implementation for searching a key in a dict is efficient already, it must be safe to stay away from adding a search step for every child object in the loops towards the end of the function. I could be wrong here as I have not read about how python core has implemented the search. Note that we are not dealing with data big enough to generate the need to investigate optimization all of the recursive aspects. But if you see a significant advantage in an alternate approach, I'd be super interested to hear. :) Could you also give an example for the same, if possible :)

On Sat, Mar 30, 2019 at 8:26 AM Vighnesh SK notifications@github.com wrote:

The code block https://github.com/WLCG-Lightweight-Sites/simple_grid_yaml_compiler/blob/5bdab27f9bc806fbfc9e1795bc375458e3728f85/compiler/lexemes.py#L23

if keyword in data: data = data[keyword] return parse_for_variable_hierarchies(data, keyword)

Isn't this redundant? Because the first if statement doesn't allow any datatype other than CommentedSeq and CommentedMap and there is specific code blocks that handles either of them. This code block at line 23, assumes the data's datatype to be a dict which isn't in case of CommentedSeq.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/WLCG-Lightweight-Sites/simple_grid_yaml_compiler/issues/25#issuecomment-478216314, or mute the thread https://github.com/notifications/unsubscribe-auth/AFofQhW0VOgAARMbkyMTWnnwO5SKkAErks5vbxG1gaJpZM4brVQp .

-- Regards Mayank Sharma

Boot-Error commented 5 years ago

Thanks for the explanation, I had trouble figuring this out because I wasn't aware of the improbable edge case of of CommentedSeq that has the keyword to match in it.

The from keyword, by definition, has to be the only child of a Dict/CommentedMap. That's because the assignment operation is one to one mapping rather than a one to many mapping. Therefore, if a CommentedSeq contains a from keyword, the case should either not be processed or it should throw an exception. In the other case i.e. if the user specified a from keyword in a CommentedSeq, the data = data[keyword] line will throw an exception, except for the case that the from referred to a variable that represented a number and the ConnectedSeq had an entry corresponding to that index.

This surely confirms the test condition I assumed in #34