Closed twmacro closed 8 years ago
Added soft links. This takes care of multiple names referring to the same object and maintains those relationships upon load. Also allows for recursion. For example:
import numpy as np
import deepdish as dd
A = np.random.randn(3, 3)
d = dict(A=A, B=A) # two objects point to same matrix
d['C'] = d # add a recursive member
dd.io.save('test.h5', d)
d2 = dd.io.load('test.h5')
From inside IPython:
In [2]: d
Out[2]:
{'A': array([[-0.38067395, -0.00706444, 0.8047854 ],
[ 0.4510418 , 0.36732588, -0.54035383],
[-2.35721862, -0.49819387, 0.70847998]]),
'B': array([[-0.38067395, -0.00706444, 0.8047854 ],
[ 0.4510418 , 0.36732588, -0.54035383],
[-2.35721862, -0.49819387, 0.70847998]]),
'C': {...}}
In [3]: d2
Out[3]:
{'A': array([[-0.38067395, -0.00706444, 0.8047854 ],
[ 0.4510418 , 0.36732588, -0.54035383],
[-2.35721862, -0.49819387, 0.70847998]]),
'B': array([[-0.38067395, -0.00706444, 0.8047854 ],
[ 0.4510418 , 0.36732588, -0.54035383],
[-2.35721862, -0.49819387, 0.70847998]]),
'C': {...}}
In [4]: d2['A'] is d2['B']
Out[4]: True
In [5]: d2['C'] is d2
Out[5]: True
Oh, and the ddls output on that file is:
$ ddls test.h5
/ dict
/A link -> /B [SoftLink]
/B array (3, 3) [float64]
/C link -> / [SoftLink]
This looks great. Thanks for the PR!
Is there a connection between SampleNamespace and SoftLink? If not, I would rather see that dealt with in two separate PRs. I felt like I was getting ready to merge the SampleNamespace, but I need to give the SoftLink more thought, so it would be better if we could treat those two separately. What do you think?
That would be excellent! I was keeping these separate myself, but then I cleaned up ls.py a little in the new branch. Bad me! Let me know if I need to do anything ... this is my first PR. :-)
Here's my plan: remove the soft link stuff from the latest commit to just retain the final, cleaned up SimpleNamespace stuff. That should get this PR back in proper order I think. Then, after we're done with this one, I'll build the second PR based on the final results. Sound good?
Sounds good. You can always abandon this PR and open two brand new ones, if you find that easier. Then you could just cherry-pick the commits pertaining to each feature. I'm fine with either way though.
Okay, this one should be ready for your review. One thing to note is that I did not update any version settings (like the IO_VERSION).
Yes, you should definitely bump IO_VERSION
to 9.
In this update, the root /
is always displayed on a separate line. This makes sense to distinguish between dict
and SimpleNamespace
, but it does add an unnecessary line to all single-array data files, which is a very common use case.
How about leaving the output as is if SimpleNamespace is not used in the root, and simply add that line only if it's used. I'd love to hear your thoughts on this.
Also, in versions <3.3, silently falling back to dictionaries could be dangerous. If a SimpleNamespace is expected, the code will break if that's not returned. Although the fall-back certainly would be convenient during an interactive session if someone gives you a file with a SimpleNamespace and you are sitting on <3.3.
I think at least a warnings.warn
should be issued if a SampleNamespace:
-annotated group is encountered and _sns
is False
. Thoughts? I can add this after merging the PR too.
Excellent points; I concur on both counts. Are you planning to add both updates? Note that the io.rst file has that root line printed in the sample outputs.
I can do both points, it's up to you. Let me know if you want me to this myself post-merge.
Also, I might trim down the SimpleNamespace section of the docs too. Currently, I think this feature is pretty niched and given that it shares semantics with dictionaries, it doesn't need that much explanation.
Thanks for the PR by the way! Much appreciated.
That's fine with me. :) The README.rst file also has the extra line.
OK, in that case I'm merging this now. Thanks again!
Awesome! Thank you so much for such an excellent package. I thought I was going to have to write this myself. :-)
This PR adds native save/load of Python SimpleNamespace objects to deepdish. The implementation is such that older versions of Python (that do not support SimpleNamespace types) will seamlessly load them as dictionaries. The current version of deepdish supports the SimpleNamespaces via pickling, which has obvious downsides.
Our group uses dictionaries and SimpleNamespaces very often in our work: dictionaries when the keys are variables, SimpleNamespaces when the keys are fixed.
From "The Zen of Python": Namespaces are one honking great idea -- let's do more of those!