larray-project / larray

N-dimensional labelled arrays in Python
https://larray.readthedocs.io/
GNU General Public License v3.0

Better support for reading files without using any column as index, or any row as column names #1102

Open gdementen opened 3 months ago

gdementen commented 3 months ago

i.e. use synthetic 0-N labels in anonymous axes.

This is necessary to support database-like files and this is the default behaviour in Pandas.
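For reference, a minimal sketch of the pandas behaviour mentioned above. The inline CSV text simply mirrors the test case used in this issue, and `header=None` is the standard pandas argument for not using any row as column names. This only illustrates pandas itself, not what larray currently does.

```python
import io

import pandas as pd

csv_text = "a,b,c\na0,0,1\na1,2,3\n"

# Default: the first row becomes the column names and no column is used
# as index, so the rows get a synthetic 0..N-1 index.
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# header=None: no row is used as column names either, so both axes get
# synthetic integer labels and the former header row becomes ordinary data.
raw = pd.read_csv(io.StringIO(csv_text), header=None)
print(raw)
```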

First create our test case:

>>> from pathlib import Path
>>> arr = ndtest('a=a0,a1;column=b,c')
>>> arr.to_csv('test.csv', dialect='classic')
>>> print(Path('test.csv').read_text())
a,b,c
a0,0,1
a1,2,3
>>> arr.to_excel('test.xlsx')

I want to be able to achieve this:

>>> Array([['a0', 0, 1],
...        ['a1', 2, 3]], axes=(2, 'a,b,c'))
{0}*\{1}   a  b  c
       0  a0  0  1
       1  a1  2  3

This mostly works for CSV files (but it needs to be documented and unit-tested if it is not already):

>>> read_csv('test.csv', nb_axes=0)
{0}\{1}   a  b  c
      0  a0  0  1
      1  a1  2  3
>>> # but not for read_excel...
>>> read_excel('test.xlsx', nb_axes=0)
ValueError: Must pass non-zero number of levels/codes
>>> # ... unless we use engine='openpyxl'
>>> read_excel('test.xlsx', engine='openpyxl', nb_axes=0)
{0}\{1}  a\column  b  c
      0        a0  0  1
      1        a1  2  3

I would also like to be able to achieve this:

>>> Array([[ 'a', 'b', 'c'],
...        ['a0',   0,   1],
...        ['a1',   2,   3]])
{0}*\{1}*   0  1  2
        0   a  b  c
        1  a0  0  1
        2  a1  2  3
>>> # it is possible using open_excel
>>> with open_excel('test.xlsx') as wb:
...     print(wb[0][:])
{0}*\{1}*         0  1  2
        0  a\column  b  c
        1        a0  0  1
        2        a1  2  3      
>>> # but I don't think it is currently possible using read_csv. It might be possible with read_excel(engine='openpyxl') by passing pandas arguments which are undocumented in larray.

My old fix_nb_axes_eq_1 branch fixes a related problem and discusses this a bit.

I think in the end we will need arguments to specify the number of (or indices of) vertical axes and horizontal axes AND we will need to support those arguments being 0.
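To make the idea concrete, here is a rough pure-Python sketch of the semantics such arguments could have. `split_table`, `nb_vertical_axes` and `nb_header_rows` are hypothetical names invented for this illustration; none of them exist in larray, and a real implementation would of course return an Array rather than plain lists.

```python
def split_table(rows, nb_vertical_axes=1, nb_header_rows=1):
    """Split a raw table (list of rows) into (row_labels, column_labels, data).

    Hypothetical helper, NOT part of larray. Either count may be 0, in which
    case the corresponding labels are synthetic integers 0..N-1.
    """
    header, body = rows[:nb_header_rows], rows[nb_header_rows:]
    nb_data_columns = len(rows[0]) - nb_vertical_axes if rows else 0

    if nb_header_rows:
        # one tuple of labels per data column (one item per header row)
        column_labels = [tuple(h[nb_vertical_axes + j] for h in header)
                         for j in range(nb_data_columns)]
    else:
        column_labels = list(range(nb_data_columns))

    if nb_vertical_axes:
        # one tuple of labels per data row (one item per label column)
        row_labels = [tuple(r[:nb_vertical_axes]) for r in body]
    else:
        row_labels = list(range(len(body)))

    data = [list(r[nb_vertical_axes:]) for r in body]
    return row_labels, column_labels, data


raw_rows = [['a', 'b', 'c'],
            ['a0', 0, 1],
            ['a1', 2, 3]]

# no column used as index, first row used as column names
no_index = split_table(raw_rows, nb_vertical_axes=0, nb_header_rows=1)
# no column used as index and no row used as column names
fully_anonymous = split_table(raw_rows, nb_vertical_axes=0, nb_header_rows=0)
print(no_index)
print(fully_anonymous)
```

With both counts at 0, every cell (including the former header row) ends up in the data and both axes are anonymous, which corresponds to the second result I would like to get above.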