v923z / micropython-ulab

a numpy-like fast vector module for micropython, circuitpython, and their derivatives
https://micropython-ulab.readthedocs.io/en/latest
MIT License

loadtxt not working? #561

Open NoRez4U opened 1 year ago

NoRez4U commented 1 year ago

I saved some files off using savetxt. When I try to load them back in I get a "ValueError: invalid syntax for number" error.

    import os
    os.listdir()
    ['.openmv_disk', 'main.py', 'README.txt', 'System Volume Information', 'ulab.py', 'IntervalLogFile.csv', 'EventLogFile.csv', 'SurgeEvent-2021-1-1-9045.csv']
    from ulab import numpy as np
    print(np.loadtxt('SurgeEvent-2021-1-1-9045.csv'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: invalid syntax for number
    import ulab
    print(ulab.version)
    5.1.1-2D-c

Expected behavior

That file was saved by this line of code:

    np.savetxt(Event_File_Name, Event_Array, delimiter=',', header='Milliseconds, Raw Analog Reading', footer=Event_Time_Stamp)

Since the file was written by savetxt, I expected loadtxt to be able to load it.

The online documentation seems to support this expectation: https://micropython-ulab.readthedocs.io/en/latest/numpy-functions.html#loadtxt

To Reproduce

I made a small array at the prompt to see if it would work, and it did.

    TestArray = np.array([1,2,3,4,5])
    np.savetxt('TestData.csv', TestArray, header='col1;col2;col3;col4;col5', footer='Saved Data')
    os.listdir()
    ['.openmv_disk', 'main.py', 'README.txt', 'System Volume Information', 'ulab.py', 'IntervalLogFile.csv', 'EventLogFile.csv', 'SurgeEvent-2021-1-1-9045.csv', 'TestData.csv']
    print(np.loadtxt('TestData.csv'))
    array([[1.0],
           [2.0],
           [3.0],
           [4.0],
           [5.0]], dtype=float32)

So maybe the array written by my program is too large to read back in (it's a [2,1200] array of dtype=np.uint16)? Or do I need to tell loadtxt that the file has a 'header', 'footer', and/or 'delimiter'? I'll be honest: I'd feel better submitting this as a bug if the TestArray round trip had failed, but it worked. Still, this looks to me like a bug or an undocumented/unanticipated limitation.

I've attached one of the files saved off from the program in case that helps. SurgeEvent-2021-1-1-10226.csv
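One way to narrow down whether the header, footer, or delimiter is what trips the parser is to scan the file by hand and report every token that cannot be converted to a number. The following is a minimal sketch in plain Python, not ulab's actual parsing logic. The helper name `find_bad_tokens` is hypothetical, the sample lines are invented to exercise the function (they are not taken from the attached file), and it assumes that header and footer lines start with the `comments` character.

```python
def find_bad_tokens(lines, delimiter=",", comments="#"):
    """Return (line_number, token) pairs that float() cannot parse."""
    bad = []
    for number, line in enumerate(lines, 1):
        text = line.strip()
        # Skip blank lines and lines marked as comments
        # (assumption: header/footer lines begin with `comments`).
        if not text or text.startswith(comments):
            continue
        for token in text.split(delimiter):
            try:
                float(token)
            except ValueError:
                bad.append((number, token))
    return bad


# Invented sample data: a comment-style header, one clean row,
# one row with a trailing delimiter, and an unmarked footer-like line.
sample = [
    "# Milliseconds, Raw Analog Reading",
    "52048,52192,52192",
    "9602,8850,",
    "2021-1-1-9045",
]
print(find_bad_tokens(sample))  # [(3, ''), (4, '2021-1-1-9045')]
```

To run the same check against a saved file, an open file object can be passed in place of the list, for example `with open('SurgeEvent-2021-1-1-9045.csv') as f: print(find_bad_tokens(f))`.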

v923z commented 1 year ago

Thanks for raising the issue, I'll look into it.

v923z commented 1 year ago

@NoRez4U Does the function work if you specify the delimiter?

    np.loadtxt('TestData.csv', delimiter=',')

I think there is a bug in the code for the default case.

NoRez4U commented 1 year ago

No sir, adding a delimiter did not work. I also saved the data with savetxt() and loaded it back with loadtxt(), both with the 'delimiter' and 'comments' flags, to see if that was tripping it up, but no joy.

    import os
    from ulab import numpy as np
    os.listdir()
    ['.openmv_disk', 'main.py', 'README.txt', 'System Volume Information', 'ulab.py', 'IntervalLogFile.csv', 'EventLogFile.csv', 'SurgeEvent-2021-1-1-9045.csv', 'TestData.csv', 'SurgeEvent-2021-1-1-10226.csv']
    print(np.loadtxt('SurgeEvent-2021-1-1-9045.csv', delimiter=","))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: invalid syntax for number
    print(np.loadtxt('SurgeEvent-2021-1-1-9045.csv', delimiter=',', comments='#'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: invalid syntax for number

However, you brought up a good point, so I also tested savetxt() without any flags (just the file name and the array to write). loadtxt() then worked:

    print(np.loadtxt('SurgeEvent-2021-1-2-125343.csv'))
    array([[52048.0, 52192.0, 52192.0, ..., 46360.0, 46408.0, 0.0],
           [9602.0, 8850.0, 9810.0, ..., 10018.0, 9154.0, 0.0]], dtype=float32)

I then added the flags back one at a time, but every combination other than no flags at all errored out.

v923z commented 1 year ago

OK, thanks for the feedback! I'll try to fix this in the next couple of days.

However, since you say that the savetxt/loadtxt combination worked when you saved your data without flags, I'm actually wondering whether the issue is with savetxt.

NoRez4U commented 1 year ago

Maybe, but the savetxt() file looks correct when opened w/Excel.

I was trying to hunt down where the "ValueError: invalid syntax for number" comes from. I looked in io.c first, because that's where loadtxt()/savetxt() live, but the message isn't raised there. So then I started looking at io.c's dependencies. I think I found it in a MicroPython source file, parsenum.c, on line 421: https://github.com/micropython/micropython/blob/master/py/parsenum.c.

Maybe it's a MicroPython bug? I searched MicroPython "Issues" to see if they had a known bug w/py/parsenum but found nothing (you might want to double check me as I'm not exactly sure what some of their bugs are and how they might manifest themselves).

Unfortunately, that's about as high as I can punch (hope you're familiar w/the reference). My guess is that what py/parsenum expects and what it gets from io.c do not match. Maybe the text that savetxt() adds for the flags (header, footer) needs to be stripped back out of the file before it's sent to py/parsenum for processing, and it isn't (I don't understand everything that's going on in the loadtxt code)? But the one example where I created the array at the command prompt, saved it via savetxt() w/flags, and loaded it back via loadtxt() worked, which shoots a hole in that theory. That said, I think the error is being thrown from py/parsenum (I didn't check the other libraries to see if they also have an error like that).
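The idea in the paragraph above, stripping the header and footer lines before any number parsing happens, can be sketched as a small hand-written loader. This is only an illustration of that hypothesis in plain Python, not a description of what ulab's loadtxt actually does. The helper name `load_rows` and the sample lines are hypothetical, and it assumes the header and footer are written as lines starting with the `comments` character.

```python
def load_rows(lines, delimiter=",", comments="#"):
    """Parse delimited text into a list of rows of floats,
    ignoring blank lines and comment lines."""
    rows = []
    for line in lines:
        text = line.strip()
        # Assumption: header and footer lines begin with `comments`,
        # so they are dropped here before float() ever sees them.
        if not text or text.startswith(comments):
            continue
        rows.append([float(token) for token in text.split(delimiter)])
    return rows


# Invented sample data in the assumed layout.
saved = [
    "# Milliseconds, Raw Analog Reading",
    "0,52048",
    "1,52192",
    "# hypothetical footer",
]
print(load_rows(saved))  # [[0.0, 52048.0], [1.0, 52192.0]]
```

If a file loads cleanly this way but still fails in loadtxt(), that would point at how loadtxt handles the extra lines or the delimiter rather than at the numeric data itself.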

All that said, I'm in no rush for this fix. Just wanted to help improve a great tool since I'm not skilled enough to help in another way. I have a workaround for now (Excel). Whenever you get it fixed is great.

Thank you (all) for even making this available!