xtensor-stack / xtensor

C++ tensors with broadcasting and lazy computing
BSD 3-Clause "New" or "Revised" License

[npy format parser] complicated dtypes #500

Open SylvainCorlay opened 6 years ago

SylvainCorlay commented 6 years ago

In the npy file format, the descr key contains

An object that can be passed as an argument to the numpy.dtype() constructor to create the array’s dtype.

which can potentially be anything, including a Python object or list comprehension syntax, if I understand correctly. In particular, it may contain the } character, which would break the parsing of the top-level dictionary.

However, the alphabetical ordering of the top-level dictionary guarantees that this key is always first, so I guess that if we parse the header backwards, we can still read shape and fortran_order properly...
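A minimal sketch of the backwards-parsing idea, written in Python for brevity rather than in xtensor's C++. It assumes the writer emitted the keys in alphabetical order and formatted each key as a single-quoted name directly followed by a colon, which is how NumPy's own writer lays out the header. The function name `parse_header_backwards` is hypothetical and not part of xtensor or libnpy.

```python
import ast


def parse_header_backwards(header: str):
    """Return (descr_text, fortran_order, shape) from an npy header string.

    Assumption: keys appear in the order descr, fortran_order, shape,
    so the LAST "'shape':" and the last "'fortran_order':" before it
    are the top-level ones, whatever descr contains.
    """
    text = header.strip()  # drop the padding spaces and trailing newline
    if not (text.startswith("{") and text.endswith("}")):
        raise ValueError("header is not a dict literal")
    body = text[1:-1]

    shape_key = "'shape':"
    order_key = "'fortran_order':"
    descr_key = "'descr':"

    # Search from the end so that anything inside descr is never matched.
    shape_pos = body.rfind(shape_key)
    if shape_pos < 0:
        raise ValueError("missing 'shape'")
    order_pos = body.rfind(order_key, 0, shape_pos)
    if order_pos < 0:
        raise ValueError("missing 'fortran_order'")
    descr_pos = body.find(descr_key)
    if descr_pos < 0 or descr_pos > order_pos:
        raise ValueError("missing 'descr'")

    def value_between(start, end):
        # Slice out a value and drop the trailing ", " separator.
        return body[start:end].strip().rstrip(",").strip()

    shape_text = value_between(shape_pos + len(shape_key), len(body))
    order_text = value_between(order_pos + len(order_key), shape_pos)
    descr_text = value_between(descr_pos + len(descr_key), order_pos)

    shape = ast.literal_eval(shape_text)          # e.g. (3, 4)
    fortran_order = ast.literal_eval(order_text)  # True or False
    # descr is returned as raw text; interpreting it is a separate problem.
    return descr_text, fortran_order, shape


if __name__ == "__main__":
    example = "{'descr': {'shape': ('<i4', 0)}, 'fortran_order': True, 'shape': (2,), }   \n"
    print(parse_header_backwards(example))
```

The `descr` value is deliberately left as an uninterpreted string: the point of reading from the end is only that `shape` and `fortran_order` stay recoverable even when `descr` contains braces or a nested `'shape'` key.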

cc @wolfv @llohse

llohse commented 6 years ago

The parser of the newest version of libnpy was completely rewritten to be clearer and more robust. The new parser now only requires the 3 keywords 'descr', 'fortran_order', and 'shape' to be unique.

SylvainCorlay commented 6 years ago

@llohse Thanks for letting us know about the rewrite.

What do you mean by unique? That there may not be a 'shape' key somewhere under descr?

The issue with the current spec is that descr may contain anything, with any sort of nesting of dicts, lists, etc.
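To make the concern concrete, here is a small Python sketch with a hypothetical header whose structured dtype has a field named `shape`. The header string and the helper `count_keyword` are made up for illustration and do not describe how libnpy actually searches. A plain text search sees the keyword twice, while a parser that understands Python literals (here `ast.literal_eval`) only sees the top-level key.

```python
import ast

# Hypothetical header: a structured dtype with a field called 'shape'.
header = (
    "{'descr': [('shape', '<i4'), ('data', '<f8', (2,))], "
    "'fortran_order': False, 'shape': (10,), }"
)


def count_keyword(text: str, keyword: str) -> int:
    """Count raw occurrences of the quoted keyword, ignoring nesting."""
    return text.count(repr(keyword))


def parse_top_level(text: str) -> dict:
    """Parse the header as a Python literal; nesting is handled by the parser."""
    parsed = ast.literal_eval(text.strip())
    if not isinstance(parsed, dict):
        raise ValueError("header is not a dict")
    return parsed


if __name__ == "__main__":
    print(count_keyword(header, "shape"))   # raw text matches
    top = parse_top_level(header)
    print(sorted(top))                      # top-level keys only
    print(top["shape"])                     # the array shape
```

A C++ reader has no `literal_eval` to lean on, so it would need to track quoting and bracket depth itself, or rely on key ordering, to tell the two occurrences apart.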