olofk / corescore

CoreScore
Apache License 2.0

Use msgpack directly, and feed it bytes from serial one at a time, to improve compatibility on OS X. #17

Closed jwise closed 3 years ago

jwise commented 3 years ago

Otherwise, we get a crash with something like:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/msgpack/fallback.py", line 121, in unpackb
    ret = unpacker._unpack()
  File "/usr/local/lib/python3.9/site-packages/msgpack/fallback.py", line 560, in _unpack
    typ, n, obj = self._read_header(execute)
  File "/usr/local/lib/python3.9/site-packages/msgpack/fallback.py", line 549, in _read_header
    self._reserve(4)
  File "/usr/local/lib/python3.9/site-packages/msgpack/fallback.py", line 324, in _reserve
    raise OutOfData
msgpack.exceptions.OutOfData

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/joshua/corescore/fusesoc_libraries/corescore/sw/corecount.py", line 50, in <module>
    curses.wrapper(main)
  File "/usr/local/Cellar/python@3.9/3.9.0_5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/curses/__init__.py", line 94, in wrapper
    return func(stdscr, *args, **kwds)
  File "/Users/joshua/corescore/fusesoc_libraries/corescore/sw/corecount.py", line 39, in main
    u = umsgpack.unpack(ser)
  File "/usr/local/lib/python3.9/site-packages/msgpack/__init__.py", line 58, in unpack
    return unpackb(data, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/msgpack/fallback.py", line 123, in unpackb
    raise UnpackValueError("Data is not enough.")
msgpack.exceptions.UnpackValueError: Data is not enough.
olofk commented 3 years ago

Thanks. There has always been confusion about umsgpack and msgpack. Maybe this will help in other cases too. I'll give it a spin here before merging.

jwise commented 3 years ago

This still sometimes crashes if you start it up while the board is already spewing (it gets upset about the framing error). It appears someone else has suggested a PR that consumes unframed bytes from the serial link before feeding them to msgpack, which might not be a bad idea to integrate with this as well.
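A minimal sketch of that "consume unframed bytes first" idea, assuming every message on the wire is a msgpack fixstr (header byte 0xa0 to 0xbf) with a plain ASCII payload. Under that assumption no payload byte can be mistaken for a header, so anything before the first header byte is the tail of a message that was already in flight. The function name is hypothetical and not part of corecount.py.

```python
FIXSTR_MIN = 0xA0  # msgpack fixstr headers occupy 0xa0..0xbf
FIXSTR_MAX = 0xBF


def drop_until_fixstr_header(data: bytes) -> bytes:
    """Return data starting at the first byte that looks like a fixstr header.

    Assumes ASCII payloads, so header bytes (>= 0xa0) never occur mid-message.
    """
    for i, b in enumerate(data):
        if FIXSTR_MIN <= b <= FIXSTR_MAX:
            return data[i:]
    return b""  # no header seen yet: everything so far is unframed


# Tail of an earlier message ("lo\n") followed by one complete 3-byte frame
aligned = drop_until_fixstr_header(b"lo\n\xa3abc")
```

The aligned bytes could then be handed to the msgpack unpacker, which would start decoding on a message boundary instead of in the middle of a string.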

tomverbeure commented 3 years ago

Dumb question: why does the core count tool exist in general?

I simply connect the terminal and capture the UART TX?

olofk commented 3 years ago

@tomverbeure The packetized messages are a remnant from CoreScore's roots in the heterogeneous sensor aggregation platform Observer, where each collector fetched data that was massaged and packetized in the base and fed to a common emitter that multiplexed the data before sending it out.

As you have noted, the msgpack-encoded messages can be read just fine as ASCII. But we could potentially add more metadata in the messages in the future.

It's also a great way to show off my amazing ASCII art skills and Corey

carlosedp commented 3 years ago

Got same problems here as well running on Windows. After applying the PR, corecount worked.

carlosedp commented 3 years ago

I still see some occasional errors, sometimes when resetting the board:

Traceback (most recent call last):
  File "Z:\projects\fusesoc\corescore\corecount.py", line 53, in <module>
    curses.wrapper(main)
  File "C:\Users\carlo\AppData\Local\Programs\Python\Python39\lib\curses\__init__.py", line 94, in wrapper
    return func(stdscr, *args, **kwds)
  File "Z:\projects\fusesoc\corescore\corecount.py", line 42, in main
    for o in unpacker:
  File "msgpack\_unpacker.pyx", line 518, in msgpack._cmsgpack.Unpacker.__next__
  File "msgpack\_unpacker.pyx", line 443, in msgpack._cmsgpack.Unpacker._unpack
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 19: invalid start byte

Also I synthesized 10 cores and it only found 9 :)

olofk commented 3 years ago

Thanks for the patch and the extra check. I guess two CoreScore users can't be wrong so I'm happy to take this one

@carlosedp I guess this one doesn't apply anymore, but it would probably help against the invalid start byte problem.

carlosedp commented 3 years ago

Still breaking for me... with 250 cores:

PS Z:\projects\fusesoc\corescore> python corecount.py COM5
Traceback (most recent call last):
  File "Z:\projects\fusesoc\corescore\corecount.py", line 53, in <module>
    curses.wrapper(main)
  File "C:\Users\carlo\AppData\Local\Programs\Python\Python39\lib\curses\__init__.py", line 94, in wrapper
    return func(stdscr, *args, **kwds)
  File "Z:\projects\fusesoc\corescore\corecount.py", line 42, in main
    for o in unpacker:
  File "msgpack\_unpacker.pyx", line 518, in msgpack._cmsgpack.Unpacker.__next__
  File "msgpack\_unpacker.pyx", line 443, in msgpack._cmsgpack.Unpacker._unpack
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 21: invalid start byte
PS Z:\projects\fusesoc\corescore>

Sometimes it shows up and breaks counting... at 10-20 .. sometimes at 200 cores..
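One observation on the error above: 0xb7 is the msgpack header byte for a 23-byte fixstr (0xa0 | 23). Seeing it inside a string payload is consistent with a few bytes having been dropped, so that the next message's header was read as part of the previous message. The sketch below is a hand-rolled decoder for fixstr frames only, not the msgpack library and not the code in this PR. It assumes ASCII payloads and resynchronizes on the next header byte whenever a frame looks damaged. All names are hypothetical.

```python
def extract_fixstr_frames(buf: bytes):
    """Split buf into decoded fixstr payloads.

    Returns (messages, tail), where tail is unconsumed data that should be
    prepended to the next chunk read from the serial port.
    """
    messages = []
    i = 0
    while i < len(buf):
        header = buf[i]
        if not 0xA0 <= header <= 0xBF:
            i += 1  # stray byte outside any frame: skip it
            continue
        length = header & 0x1F  # low five bits hold the payload length
        payload = buf[i + 1 : i + 1 + length]
        if len(payload) < length:
            break  # incomplete frame: wait for more data
        # A non-ASCII byte inside the payload means bytes were lost;
        # restart parsing at that byte instead of raising.
        bad = next((k for k, b in enumerate(payload) if b >= 0x80), None)
        if bad is not None:
            i += 1 + bad
            continue
        messages.append(payload.decode("ascii"))
        i += 1 + length
    return messages, buf[i:]


# A damaged frame (2 bytes lost), a good frame, and an incomplete frame
msgs, tail = extract_fixstr_frames(b"\xa3a" + b"\xa3abc" + b"\xa2h")
```

With this approach a damaged message is dropped and counting continues, at the cost of silently missing whichever core sent that message.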

olofk commented 3 years ago

Hmm.. strange. I wonder if it has problems keeping up so that it drops characters. It could even be worse now that it reads one byte at a time. One experiment would be to just save the incoming data to a file instead of decoding it with msgpack, and see if the stream looks correct.
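That experiment could look roughly like the sketch below, which copies raw bytes from any object with a `read(size)` method to a binary sink without decoding anything. The function name and the chunk size are assumptions. With pyserial, `source` would be an already opened `serial.Serial` object with a read timeout and `sink` a file opened in `"wb"` mode; in-memory streams stand in for both here.

```python
import io


def capture_raw(source, sink, total_bytes, chunk_size=64):
    """Copy up to total_bytes from source to sink; return the number copied."""
    copied = 0
    while copied < total_bytes:
        chunk = source.read(min(chunk_size, total_bytes - copied))
        if not chunk:
            break  # read timed out or the stream ended
        sink.write(chunk)
        copied += len(chunk)
    return copied


# Stand-in demo: two 3-byte fixstr frames captured into memory
demo_in = io.BytesIO(b"\xa3abc\xa3def")
demo_out = io.BytesIO()
demo_count = capture_raw(demo_in, demo_out, total_bytes=1024)
```

Inspecting the resulting file with a hex dump should show whether every message has the same length or whether some arrive short, which would point at dropped bytes rather than a decoding problem.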

carlosedp commented 3 years ago

I've pulled the latest changes but still see those issues... as a workaround, I've wrapped the main loop in a try/except block so it doesn't break out:

...
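            try:
                for o in unpacker:
                    if isinstance(o, str):
                        (y, x) = win.getyx()
                        # Show the latest message without its last character
                        win.addstr(curses.LINES//2-3, 6, o[0:-1])
                        win.refresh()
                        # Characters 5..9 of the message hold the core number
                        n = int(o[5:10])
                        if n not in found_cores:
                            found_cores.append(n)
                            stdscr.addstr(
                                10, 35, "Found {} cores".format(len(found_cores)))
                            stdscr.refresh()
            except Exception:
                # Skip corrupted messages; unlike a bare except, Ctrl-C still works
                pass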

Weirdly it misses cores, like here I have 750 cores running but it counted 681. If I reset the board, it counts a different number... like 684...
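To see whether the same cores go missing on every run, it may help to list which core numbers were never seen instead of only counting them. This is a small sketch under two assumptions that are not confirmed in this thread: that core numbers run from 0 to N-1, and that each message carries the number in characters 5 to 9, as the `int(o[5:10])` slice above suggests. The message text in the demo is made up.

```python
def parse_core_index(message: str) -> int:
    """Extract the core number from characters 5..9 of a message."""
    return int(message[5:10])


def missing_cores(messages, expected):
    """Return the sorted core numbers in range(expected) that never reported."""
    found = {parse_core_index(m) for m in messages}
    return sorted(set(range(expected)) - found)


# Hypothetical message text; only the position of the number matters
demo_messages = ["Core 00000 hi", "Core 00002 hi", "Core 00002 hi"]
demo_missing = missing_cores(demo_messages, expected=4)
```

If the missing set changes from reset to reset, lost or corrupted messages are the more likely cause; if it is always the same cores, the problem is more likely on the board side.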
