vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

[BUG-REPORT] import vaex causes yaml parsing side effects through global config changes #1204

Closed (sborgeson closed this issue 3 years ago)

sborgeson commented 3 years ago

Description

# start with fresh kernel
import yaml
yaml.safe_load( '''{ 'a' : 1, 'b' : 2 }''' ) # returns a dict 

but

# start with fresh kernel
import yaml
import vaex
yaml.safe_load( '''{ 'a' : 1, 'b' : 2 }''' ) # returns an OrderedDict

Importing your module should not have the side effect of changing the behavior of another widely used Python package (PyYAML).
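
For anyone who wants to see the effect without installing vaex, here is a minimal sketch. It assumes the import registers a custom mapping constructor directly on `yaml.SafeLoader`; I have not confirmed that this is exactly what vaex does, and `ordered_constructor` is a hypothetical name. The sketch applies such a registration, shows that `yaml.safe_load` changes for the whole process, and then puts the stock constructor back:

```python
import collections
import yaml

MAPPING_TAG = yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG  # 'tag:yaml.org,2002:map'
TEXT = "{ 'a' : 1, 'b' : 2 }"


def ordered_constructor(loader, node):
    # Hypothetical constructor: build an OrderedDict from the mapping's key/value pairs.
    return collections.OrderedDict(loader.construct_pairs(node))


# Remember the stock constructor so the change can be undone afterwards.
original_constructor = yaml.SafeLoader.yaml_constructors[MAPPING_TAG]
before = yaml.safe_load(TEXT)

# Registering on SafeLoader itself affects every later yaml.safe_load call in the process.
yaml.SafeLoader.add_constructor(MAPPING_TAG, ordered_constructor)
try:
    after = yaml.safe_load(TEXT)
finally:
    # Put the stock constructor back so the rest of the process is unaffected.
    yaml.SafeLoader.add_constructor(MAPPING_TAG, original_constructor)
restored = yaml.safe_load(TEXT)

print(type(before).__name__, type(after).__name__, type(restored).__name__)
```

The three names printed correspond to the result before the registration, while it is active, and after it has been undone.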

Note that this can lead to an unexpected exception this way:

# start with fresh kernel
import yaml
import vaex
obj = yaml.safe_load( '''{ 'a' : 1, 'b' : 2 }''' ) # returns an OrderedDict

# the output from safe_load is safe, right?
obj_str = yaml.dump(obj) 
print(obj_str) # not safe: the dump specifies an OrderedDict class to instantiate, which safe_load doesn't allow.
yaml.safe_load(obj_str)  # raises an exception: the dumped yaml wasn't safe.

# you have to call yaml.safe_dump to dump the OrderedDict back to a normal dict structure
# but safe_load shouldn't produce "unsafe" objects by default.
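
The round-trip failure can also be sketched without vaex by starting from an `OrderedDict` directly. The helper `to_plain` below is a hypothetical name, not part of PyYAML; it only illustrates one way to get back to plain dicts before dumping:

```python
import collections
import yaml

obj = collections.OrderedDict([("a", 1), ("b", 2)])

# yaml.dump emits a Python-specific tag naming collections.OrderedDict.
dumped = yaml.dump(obj)

try:
    yaml.safe_load(dumped)
    safe_load_failed = False
except yaml.YAMLError:
    # safe_load refuses Python-specific tags, so the round trip fails here.
    safe_load_failed = True


def to_plain(value):
    # Hypothetical helper: recursively convert mappings (including OrderedDict)
    # to plain dicts, and recurse into lists.
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, list):
        return [to_plain(item) for item in value]
    return value


# Dumping the plain structure produces YAML that safe_load accepts.
round_tripped = yaml.safe_load(yaml.dump(to_plain(obj)))
```

In a process where the global change is active, the final `safe_load` would of course hand back an OrderedDict again, which is the core of the complaint.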

The Stack Overflow discussion linked from your code warns against this global side effect and offers two or three alternatives: https://stackoverflow.com/questions/5121931/in-python-how-can-you-load-yaml-mappings-as-ordereddicts
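
As a rough sketch of one such alternative (my own paraphrase of the general idea, not vaex code): register the constructor on a private subclass of `yaml.SafeLoader` instead of on `yaml.SafeLoader` itself. The names `OrderedSafeLoader`, `_construct_ordered_mapping` and `ordered_safe_load` are hypothetical:

```python
import collections
import yaml


class OrderedSafeLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass; constructors added here stay local to it."""


def _construct_ordered_mapping(loader, node):
    # Resolve '<<' merge keys first, then keep key order in an OrderedDict.
    loader.flatten_mapping(node)
    return collections.OrderedDict(loader.construct_pairs(node))


# add_constructor copies the constructor table onto the subclass before changing it,
# so yaml.SafeLoader (and therefore yaml.safe_load) keeps its default behavior.
OrderedSafeLoader.add_constructor(
    yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG,
    _construct_ordered_mapping,
)


def ordered_safe_load(stream):
    # Safe loading that returns OrderedDicts, without touching global PyYAML state.
    return yaml.load(stream, Loader=OrderedSafeLoader)


ordered = ordered_safe_load("{ 'a' : 1, 'b' : 2 }")  # OrderedDict
plain = yaml.safe_load("{ 'a' : 1, 'b' : 2 }")       # still a plain dict
```

With this approach only callers that explicitly use `ordered_safe_load` get OrderedDicts; everything else that calls `yaml.safe_load` is left alone.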

isvoboda commented 3 years ago

This seems to affect hydra as well.

Importing vaex makes hydra crash with omegaconf.errors.ValidationError: Object of unsupported type: 'OrderedDict'.