Testing the interface between tiny/v0.1 and eembc

videetparekh commented 3 years ago

Hi, I want to test how the EEMBC benchmark runner interfaces with tiny v0.1. I wanted to do that by deploying one of the reference submissions locally on a desktop and testing it with the EEMBC runner. Is there a reference implementation for that somewhere?

So far I've only found examples that must be compiled with Mbed and deployed to dedicated hardware. I also don't know whether these examples are ready to go, or whether they need additional work, such as creating or compiling a model in a certain way, before testing.

colbybanbury commented 3 years ago

Hi,

The reference submissions should be 'ready to go' (with minor things still being checked in) as long as you follow the steps in the READMEs. The models used in the reference submissions have already been converted to TensorFlow Lite Micro format. You do need an Mbed platform to run the reference submissions.

EEMBC's runner assumes a USB serial connection to the device under test (DUT), so I'm not sure it would work with something running locally.

Is there something specific you wanted to test about the interface?

videetparekh commented 3 years ago

My team had several goals in mind:

Test a pretrained model locally to see timing and performance numbers and compare them to our own test stack.
Deploy a model to an RPi (Cortex-A) and test it.

We're a little confused about how to deploy to an RPi without breaking the EEMBC interface. Are there any reference examples for an RPi or other Cortex-A system?

colbybanbury commented 3 years ago

Good goals!

I'd suggest:

The benchmark relies on a timer on the DUT to measure latency. It would probably be easiest to implement the C++ API around your model, run it locally, and time the latency without the EEMBC runner. The accuracy test sets used by the runner are available here, so you can test accuracy without the runner as well.
Deploying to an RPi shouldn't be an issue as long as there is a UART connection to the host. As a side note, the benchmarks may be trivial for Cortex-A class systems.
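One way to do the local timing suggested above is a small C harness like the following, assuming a Linux/POSIX host with `clock_gettime`. `run_inference_stub` is a hypothetical placeholder for your model's invoke call, and the 64-run cap is arbitrary. This sketches local latency measurement only; it is not the runner's methodology. Taking the median instead of the mean keeps a single scheduler hiccup on a desktop OS from skewing the result.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Monotonic time in microseconds. */
static uint64_t now_us(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Median of n > 0 samples; sorts the array in place. */
static uint64_t median_u64(uint64_t *v, size_t n) {
    qsort(v, n, sizeof v[0], cmp_u64);
    return (n % 2) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2;
}

/* Hypothetical stand-in for one inference call on your model. */
static volatile uint32_t sink;
static void run_inference_stub(void) {
    for (uint32_t i = 0; i < 100000u; ++i) sink += i;
}

/* Time `runs` calls (1..64) of `infer`; returns median latency in us, or 0 on bad input. */
static uint64_t median_latency_us(void (*infer)(void), size_t runs) {
    uint64_t samples[64];
    if (runs == 0 || runs > 64) return 0;
    for (size_t i = 0; i < runs; ++i) {
        uint64_t t0 = now_us();
        infer();
        samples[i] = now_us() - t0;
    }
    return median_u64(samples, runs);
}
```

In practice you would replace the stub with a function that runs one inference on a preloaded input and pass it to `median_latency_us`.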

honsontran commented 3 years ago

Hello @colbybanbury,

Thanks for the quick reply. Regarding the EEMBC runner, I followed the README to set up only the performance runner (no energy hardware). The one part of this README section that I skipped for now is compiling our submission. For the time being, we're getting familiar with how all the pieces fit together :)

This is the segment that was skipped:

Compile as EE_CFG_ENERGY_MODE 0 (see the #define in monitor/th_api/th_config.h or api/submitter_provided.h). Program the th_timestamp function to return the current microseconds since boot time (e.g., with an MCU counter or system timer).

My current setup is a Pi 4 connected directly to my Windows host machine (I also tried macOS) through a USB-to-TTL adapter.
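On a Linux DUT such as the Pi, one possible microsecond source for th_timestamp is `CLOCK_MONOTONIC`, which on Linux counts from roughly boot time and does not advance while the system is suspended. The helper name `micros_since_boot` is hypothetical, and how th_timestamp reports the value to the runner is defined by the benchmark API and is not reproduced here.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/*
 * Microseconds since (roughly) boot, from CLOCK_MONOTONIC.
 * Returns 0 if the clock cannot be read, which should not happen on Linux.
 */
static unsigned long micros_since_boot(void) {
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return 0ul;
    }
    uint64_t us = (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
    return (unsigned long)us; /* truncates to 32 bits on 32-bit targets */
}
```

If the value is stored in an `unsigned long` on 32-bit Raspberry Pi OS, it wraps after 2^32 us (about 71.6 minutes); differences computed with unsigned subtraction still come out right across a single wrap.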

I'm also able to log in over COM3 on Windows with PuTTY without any issues. When I start the EEMBC runner, though, the device doesn't show up in the list. COM3 is detected, but I keep getting "Serial COM3 failed name check, skipping":

00000.394 framework: Package version: 3.0.2
00000.395 framework: Mode: production
00000.395 framework: User home directory is C:\Users\honson
00000.395 framework: Session storage directory is C:\Users\honson\eembc\runner\sessions
00000.396 framework: Temporary storage directory is C:\Users\honson\eembc\runner\temp
00000.396 pnp: m-scan-start
00000.396 pnp: Looking for 'EEMBC Serial' devices...
00000.398 framework: Serial scan using 115200 baud
00001.271 framework: Serial COM3 failed name check, skipping
00001.271 pnp: Looking for 'Joulescope' devices...
00001.271 pnp: Looking for 'STMicroelectronics LPM01A' devices...
00001.272 pnp: m-scan-stop
00001.272 framework: m-ready

Additionally, I have made sure that .eembc.ini has a matching baud rate, and I have tried increasing the device detection time to 10 s. Here is a copy of my .eembc.ini:

default-timeout-ms=5000
dut-baud=115200
dut-boot-mv=10000
emon-drop-thresh-pct=0.1
root=C:\Users\iFai1
timestamp-hold-us=50
umount-on-error=true
use-crlf=false
use-visa=false
n6705-set-vio=false

There are some references in the README saying to make sure the command name% returns the device name, but I'm not sure whether this refers to the CLI in the EEMBC runner or to the PuTTY terminal. Any help would be great! We seem to be close.

honsontran commented 3 years ago

Additional note: I'm aware there is a similar post here. Just an assumption: is the EEMBC runner searching for Mbed devices that have the compiled .bin flashed onto them? That's the only reason I can think of for the device not being detected. Is there currently a flow that supports the Pi in the EEMBC runner?

colbybanbury commented 3 years ago

Hi,

Yes, you're right! The runner handshakes with the device under test via the 'name' command. They don't have to be Mbed devices, though; they just need to be running the benchmarking API.

We haven't tested the Pi ourselves, but you should be able to port the benchmarking API to the Pi without too much trouble. Sorry for the lack of RasPi support; it's somewhat outside the target device range. It would be nice to know if it works, though.
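To make the 'name' handshake concrete, here is a sketch of how a DUT might format its replies to `name` and `profile`. The message shapes are copied from the runner logs later in this thread (`m-name-dut-[...]`, `m-profile-[...]`, `m-model-[...]`, `m-ready`, `e-[Unknown command: ...]`). The `\r\n` line endings, the macro values, and `format_reply` itself are assumptions for illustration, not the benchmark API's code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical values; the real strings come from the benchmark API headers. */
#define DUT_NAME "raspberry-pi-4"
#define DUT_PROFILE "ULPMark for tinyML Firmware V0.0.1"
#define DUT_MODEL "vww01"

/*
 * Format the reply to one complete command (terminator already stripped).
 * Every reply ends with "m-ready", as seen in the logs.
 * Returns the snprintf result (characters that would have been written).
 */
static int format_reply(const char *cmd, char *out, size_t cap) {
    if (strcmp(cmd, "name") == 0) {
        return snprintf(out, cap, "m-name-dut-[%s]\r\nm-ready\r\n", DUT_NAME);
    }
    if (strcmp(cmd, "profile") == 0) {
        return snprintf(out, cap, "m-profile-[%s]\r\nm-model-[%s]\r\nm-ready\r\n",
                        DUT_PROFILE, DUT_MODEL);
    }
    return snprintf(out, cap, "e-[Unknown command: %s]\r\nm-ready\r\n", cmd);
}
```

If the runner sees anything other than a well-formed name reply, it reports "failed name check" and skips the port.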

petertorelli commented 3 years ago

You need to connect to the UART on the Pi, not to the process listening on the UART. Those are two very different things. The printf and getchar functions in the test harness must use the actual UART TX/RX, not the console. I believe this requires changing the boot config on the Pi so that it doesn't set up a terminal on the UART, and then writing that layer yourself with the C libraries to retarget stdout/stdin in GCC to that port.

honsontran commented 3 years ago

@colbybanbury @petertorelli, thanks for all the quick replies. They were really helpful in sifting through the codebase.

@petertorelli

You need to connect to the UART on the Pi, not to the process listening on the UART. Those are two very different things. The printf and getchar functions in the test harness must use the actual UART TX/RX, not the console. I believe this requires changing the boot config on the Pi so that it doesn't set up a terminal on the UART, and then writing that layer yourself with the C libraries to retarget stdout/stdin in GCC to that port.

Thanks for clarifying. I was able to disable the terminal on the Pi's UART and connect to TX/RX directly. Following the reference examples in the repository, all that seems to be needed to establish the serial connection is to modify th_serialport_initialize() (ref). My plan is to mimic the serial initialization shown in the reference examples and use something like this for the Pi.
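For the Pi, one way to fill in th_serialport_initialize is to open the UART device in raw mode with termios, so the kernel does no echoing, line buffering, or CR/LF translation on the bytes the runner sends. This is a sketch assuming Linux and a device path such as `/dev/ttyS0` or `/dev/serial0`; `open_serial_raw` is a hypothetical helper, and the harness's printf/getchar would still need to be routed through the returned descriptor. The 115200 baud rate matches `dut-baud` in the .eembc.ini above.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/*
 * Open a serial device as raw 8N1 at 115200 baud and discard any bytes left
 * in the kernel buffers. Returns the fd, or -1 on error (including non-TTYs).
 */
static int open_serial_raw(const char *path) {
    int fd = open(path, O_RDWR | O_NOCTTY);
    if (fd < 0) return -1;

    struct termios t;
    if (tcgetattr(fd, &t) != 0) {
        close(fd);
        return -1;
    }

    /* Same flags cfmakeraw() clears/sets: no echo, no canonical mode, no CR/LF mapping. */
    t.c_iflag &= ~(tcflag_t)(IGNBRK | BRKINT | PARMRK | ISTRIP | INLCR | IGNCR | ICRNL | IXON);
    t.c_oflag &= ~(tcflag_t)OPOST;
    t.c_lflag &= ~(tcflag_t)(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    t.c_cflag &= ~(tcflag_t)(CSIZE | PARENB);
    t.c_cflag |= CS8 | CLOCAL | CREAD;
    t.c_cc[VMIN] = 1;  /* read() blocks until at least one byte arrives */
    t.c_cc[VTIME] = 0;

    if (cfsetispeed(&t, B115200) != 0 || cfsetospeed(&t, B115200) != 0 ||
        tcsetattr(fd, TCSANOW, &t) != 0) {
        close(fd);
        return -1;
    }
    tcflush(fd, TCIOFLUSH); /* drop stale bytes from a previous run */
    return fd;
}
```

At startup you might call `open_serial_raw("/dev/ttyS0")`, then `read` one byte at a time into the command parser and `write` the replies back to the same descriptor.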

@videetparekh and I had a couple of questions as well. Since the tinyML repo only has examples showing how to compile via Mbed, we're working on a way to compile the code to run under the OS for testing. We found the ULPMark repo a lot easier to start with, since it already has a CMake file and a README explaining how the EEMBC runner and the test harness interact.

As far as we can tell, the tiny repo adopted this implementation and simplified it. Our questions are:

Implementation:

The ULPMark test harness seems a bit more straightforward to work with. Is it possible to use this test harness to submit to tinyML?
The ULPMark test harness also includes a self-hosted mode. The README indicates that energy benchmarks run this way cannot be accepted as a valid submission, but can performance numbers be accepted in self-hosted mode? Our concern is that even if performance numbers can be recorded, tinyML already has an automated way to test everyone's submissions via the EEMBC runner. We just want to know whether this is strictly enforced.
We're planning to compile and run the benchmarking API inside the OS, and perhaps add autostart so the submission replicates the plug-and-play behavior of an embedded device. Is this an issue?

Preprocessing:

Is any preprocessing done to the image data before it is fed into the DUT? If so, what data type is used (e.g. int8, fp32), and is any scaling, normalization, or cropping applied for the model?
Same question for audio: is there any preprocessing or manipulation of the audio before it is sent to the DUT?

Thanks for all your support, and hope to hear from you all soon :)

petertorelli commented 3 years ago

Yes, the tiny repo is just a refactor.
There is no self-hosted mode; those are vestigial #defines, kept in case we decide to add a self-hosted mode in the future. You must use the runner to collect the benchmark median perf scores and multi accuracy scores.
If it works with the runner it should be ok.

honsontran commented 3 years ago

@petertorelli, thanks for clarifying. We managed to retarget stdin/stdout and are now working on getting some inference tests going.

I can now see my device in the runner. After copying the datasets into the directory shown in red text in the runner, I pressed Initialize, then Benchmark. I'm hitting some errors at this point and wanted to know where to start debugging. Any suggestions would be greatly appreciated!

01582.242 parser: Command "mountc dut dut"
01582.317 dut: e-[Unknown command: nam]
01582.317 dut: m-ready
01582.317 sequencer: e-[Unknown command: nam]
01582.317 sequencer: m-sequencer-stop
01582.317 parser: e-['bm init' failed, unmounting all devices]
01582.317 parser: e-[Command 'bm' failed: Unknown command: nam]
01582.317 mounter: m-mounted-alias[dut]-uid[COM3]-driver[eeserial]
01582.317 parser: m-ready-finished[bm]
01582.317 parser: m-ready-finished[mountc]
01583.595 parser: Command "bm run ulp-mlperf"
01583.595 session: m-session-start-id[20210426194846]
01583.595 parser: m-bmark-run-name[ML Performance 1.0.0]-code[ulp-mlperf]
01583.596 parser: e-[Command 'bm' failed: Model not defined]
01583.596 parser: m-ready-finished[bm]

petertorelli commented 3 years ago

Your serial interface is malfunctioning. I'm not sure what the problem is: the DUT is not advancing/clearing the command buffer (it is clearly stuck at 'nam'), yet it seems to be parsing the terminator, so I don't know what is going on.

If you mount the DUT with mountc dut dut and then type dut name and dut profile, it should print the name of the device and the model it is compiled for. If this doesn't work, the UART interface on the Pi is not implemented correctly. If it does work, we can debug the runner further.
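A command buffer that never clears can be illustrated with a minimal accumulator that collects bytes until the `%` terminator (the terminator seen in `name%`). `cmd_accum`, `cmd_feed`, `cmd_reset`, and the 80-byte limit are hypothetical; the benchmark API ships its own parser. The point of the sketch is the reset discipline: every complete command, and every overlong one, must leave the buffer empty for the next command.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CMD_TERMINATOR '%'
#define CMD_MAX 80 /* hypothetical limit */

typedef struct {
    char buf[CMD_MAX + 1];
    size_t len;
    bool overflowed;
} cmd_accum;

static void cmd_reset(cmd_accum *a) {
    memset(a->buf, 0, sizeof a->buf);
    a->len = 0;
    a->overflowed = false;
}

/*
 * Feed one received byte. Returns true when a complete command is in a->buf
 * (NUL-terminated); the caller handles it and then must call cmd_reset().
 * Overlong commands are dropped whole rather than silently truncated.
 */
static bool cmd_feed(cmd_accum *a, char c) {
    if (c == CMD_TERMINATOR) {
        if (a->overflowed) {
            cmd_reset(a);
            return false;
        }
        a->buf[a->len] = '\0';
        return true;
    }
    if (c == '\r' || c == '\n') return false; /* this sketch ignores stray line endings */
    if (a->len < CMD_MAX) a->buf[a->len++] = c;
    else a->overflowed = true;
    return false;
}
```

If the caller skips `cmd_reset` after handling a command, the next command is appended to the stale text (for example `nameprofile`), which the DUT then reports as an unknown command.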

honsontran commented 3 years ago

Hello @petertorelli. Thanks for the quick reply.

Here is the log of the suggested commands you have given:

17229.147 parser: Command "mountc dut dut"
17229.227 dut: m-name-dut-[raspberry-pi-4]
17229.227 dut: m-ready
17229.227 mounter: m-mounted-alias[dut]-uid[COM3]-driver[eeserial]
17229.227 parser: m-ready-finished[mountc]
17236.781 parser: Command "dut name"
17236.781 parser: m-ready-finished[dut]
17236.825 dut: m-name-dut-[raspberry-pi-4]
17236.841 dut: m-ready
17241.337 parser: Command "dut profile"
17241.337 parser: m-ready-finished[dut]
17241.400 dut: m-profile-[ULPMark for tinyML Firmware V0.0.1]
17241.416 dut: m-model-[vww01]
17241.432 dut: m-ready

If it helps: we were writing our own implementation of the serial connection, but found that the following command also lets the runner communicate with our program. I'm mentioning it in case it is what keeps the command buffer from clearing or data from reaching the device. For this experiment, we're simply redirecting the program's stdin/stdout to the serial port:

./bin/main < /dev/ttyS0 > /dev/ttyS0

petertorelli commented 3 years ago

Interesting. It could be a buffer flush issue with the Unix pipes and the OS: if a previous run left half a command in the buffer and a new command followed, that would cause an error. I've seen that happen before when the UART code doesn't flush properly. There might have been leftover data in the Unix pipes after a restart. Manual mount works, and initialization sends those exact commands. What happens when you type bm init ulp-mlperf?

honsontran commented 3 years ago

00046.740 parser: Command "mountc dut dut"
00046.810 dut: e-[Unknown command: nme]
00046.810 mounter: m-mounted-alias[dut]-uid[COM3]-driver[eeserial]
00046.810 parser: m-ready-finished[mountc]
00046.826 dut: m-ready
00052.038 parser: Command "dut name"
00052.039 parser: m-ready-finished[dut]
00052.088 dut: m-name-dut-[raspberry-pi-4]
00052.088 dut: m-ready
00054.286 parser: Command "dut profile"
00054.286 parser: m-ready-finished[dut]
00054.360 dut: m-profile-[ULPMark for tinyML Firmware V0.0.1]
00054.376 dut: m-model-[vww01]
00054.376 dut: m-ready
00057.676 parser: Command "bm init ulp-mlperf"
00057.676 sequencer: m-sequencer-start
00057.676 sequencer: m-sequencing-i[1]-command[umount]-ack[/parser: m-ready/]-ms[5000]-acc[0]-total_ms[15000]
00057.676 parser: Command "umount"
00057.676 mounter: Unmounting "dut"
00057.791 mounter: m-unmounted-alias[dut]
00057.791 parser: m-umount-done
00057.791 parser: m-ready-finished[umount]
00057.791 sequencer: m-sequencing-i[2]-command[mountc dut dut]-ack[/parser: m-ready/]-ms[5000]-acc[5000]-total_ms[15000]
00057.791 parser: Command "mountc dut dut"
00057.863 dut: e-[Unknown command: nme]
00057.863 sequencer: e-[Unknown command: nme]
00057.863 sequencer: m-sequencer-stop
00057.863 parser: e-['bm init' failed, unmounting all devices]
00057.864 parser: e-[Command 'bm' failed: Unknown command: nme]
00057.864 mounter: m-mounted-alias[dut]-uid[COM3]-driver[eeserial]
00057.864 parser: m-ready-finished[bm]
00057.864 parser: m-ready-finished[mountc]
00057.879 dut: m-ready

This shows another issue we've had, where some bytes get dropped (e.g. nme, nam). Rerunning bm init ulp-mlperf after the log above, I got this:

00105.226 parser: Command "bm init ulp-mlperf"
00105.226 sequencer: m-sequencer-start
00105.226 sequencer: m-sequencing-i[1]-command[umount]-ack[/parser: m-ready/]-ms[5000]-acc[0]-total_ms[15000]
00105.226 parser: Command "umount"
00105.226 mounter: Unmounting "dut"
00105.342 mounter: m-unmounted-alias[dut]
00105.342 parser: m-umount-done
00105.342 parser: m-ready-finished[umount]
00105.342 sequencer: m-sequencing-i[2]-command[mountc dut dut]-ack[/parser: m-ready/]-ms[5000]-acc[5000]-total_ms[15000]
00105.342 parser: Command "mountc dut dut"
00105.403 dut: m-name-dut-[raspberry-pi-4]
00105.419 dut: m-ready
00105.419 mounter: m-mounted-alias[dut]-uid[COM3]-driver[eeserial]
00105.419 parser: m-ready-finished[mountc]
00105.419 sequencer: m-sequencing-i[3]-command[dut profile]-ack[/dut: m-ready/]-ms[5000]-acc[10000]-total_ms[15000]
00105.419 parser: Command "dut profile"
00105.419 parser: m-ready-finished[dut]
00105.483 dut: m-profile-[ULPMark for tinyML Firmware V0.0.1]
00105.499 dut: m-model-[vww01]
00105.515 dut: m-ready
00105.515 sequencer: m-sequencer-stop
00105.515 parser: m-ready-finished[bm]

honsontran commented 3 years ago

Below are more examples of bytes dropped from the commands being sent (e.g. timestamp). This is the log from pressing Run on the performance benchmark:

00757.271 parser: Command "bm cfg ulp-mlperf runMode"
00757.780 parser: m-bm-cfg-name[ulp-mlperf]-key[runMode]-val[single]
00757.780 parser: m-ready-finished[bm]
00757.780 sequencer: m-sequencing-i[3]-command[dut timestamp]-ack[/dut: m-ready/]-ms[5000]-acc[10000]-total_ms[457368]
00757.780 parser: Command "dut timestamp"
00757.780 parser: m-ready-finished[dut]
00757.831 dut: e-[Unknown command: tmestamp]
00757.831 sequencer: e-[Unknown command: tmestamp]
00757.831 sequencer: m-sequencer-stop
00757.831 session: m-session-stop-id[20210427131120]
00757.831 session: Saved this run to session ID 20210427131120
00757.832 parser: e-['bm run' failed, unmounting all devices]
00757.832 mounter: Unmounting "dut"
00757.950 mounter: m-unmounted-alias[dut]
00757.951 parser: e-[Command 'bm' failed: Unknown command: tmestamp]
00757.951 parser: m-ready-finished[bm]

After initializing and rerunning several times, I got past that issue and ran into this:

00887.158 parser: Command "bload dut "C:\Users\iFai1\eembc\runner\benchmarks\ulp-mlperf\datasets\vww01\000000343218.bin""
00887.158 parser: File size is 27648, loading...
00887.159 parser: Starting at byte offset 0
00887.159 parser: Sending 27648 bytes
00887.159 parser: m-mute-target[dut]
00887.430 dut: e-[Insufficent number of hex digits]
00887.430 sequencer: e-[Insufficent number of hex digits]
00887.430 sequencer: m-sequencer-stop
00887.430 session: m-session-stop-id[20210427131329]
00887.430 session: Saved this run to session ID 20210427131329
00887.431 parser: e-['bm run' failed, unmounting all devices]
00887.431 mounter: Unmounting "dut"
00887.431 parser: m-unmute-target[dut]
00887.431 parser: e-[Command 'bload' failed: Failed streaming bload: Insufficent number of hex digits]
00887.431 parser: m-ready-finished[bload]
00887.547 mounter: m-unmounted-alias[dut]
00887.547 parser: e-[Command 'bm' failed: Insufficent number of hex digits]
00887.547 parser: m-ready-finished[bm]

One concern is how to make command delivery more reliable, since I believe the same inconsistency could also affect the transfer of benchmark data to the DUT.
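The DUT's "Insufficent number of hex digits" error suggests the bload payload reaches the DUT as hex text, and that a chunk with an odd number of hex digits is rejected. That is exactly what a single dropped character on the wire would produce. The sketch below shows such a check; `hex_decode` is hypothetical, and the wire format is inferred from the error message rather than taken from the benchmark source.

```c
#include <stddef.h>
#include <stdint.h>

static int hex_nibble(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Decode n hex characters into out. Returns the number of bytes written, or -1
 * if n is odd (e.g. one character lost in transit), a character is not a hex
 * digit, or out is too small.
 */
static long hex_decode(const char *hex, size_t n, uint8_t *out, size_t cap) {
    if (n % 2 != 0 || n / 2 > cap) return -1;
    for (size_t i = 0; i < n; i += 2) {
        int hi = hex_nibble(hex[i]);
        int lo = hex_nibble(hex[i + 1]);
        if (hi < 0 || lo < 0) return -1;
        out[i / 2] = (uint8_t)((hi << 4) | lo);
    }
    return (long)(n / 2);
}
```

Because the check only catches an odd count, two dropped characters in the same chunk would pass it and silently corrupt the input, so fixing the UART draining is still the real remedy.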

petertorelli commented 3 years ago

Yep, that looks a lot like when the UART buffer is overflowing / not draining.

videetparekh commented 3 years ago

Hi @petertorelli, thank you for your input. It has been very useful in guiding us toward getting the UART working end to end. In our experiments, the model under test always ends up set to vww01. How do we make the handshake tell the runner to use ic01?

petertorelli commented 3 years ago

@videetparekh By setting the proper #define in the submitter-implemented header file found here: https://github.com/mlcommons/tiny/blob/1e3bdc19aaa70009c0d9d17ed9c78bcc989e077b/v0.1/api/submitter_implemented.h#L43-L49
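The model selection is a compile-time #define. The sketch below shows the pattern with hypothetical macro names (`MODEL_ID_*`, `SELECTED_MODEL`); use the real names from the linked lines of submitter_implemented.h. Only vww01 and ic01, the two model IDs mentioned in this thread, are shown.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical names; the real macros live in v0.1/api/submitter_implemented.h. */
#define MODEL_ID_VWW01 "vww01"
#define MODEL_ID_IC01 "ic01"

/* Pick the model at build time; a -D flag on the compiler command line overrides it. */
#ifndef SELECTED_MODEL
#define SELECTED_MODEL MODEL_ID_IC01
#endif

/* The string a DUT would report in the "m-model-[...]" line of its profile reply. */
static const char *selected_model(void) { return SELECTED_MODEL; }

static bool model_is(const char *id) { return strcmp(SELECTED_MODEL, id) == 0; }
```

Passing `-DSELECTED_MODEL=MODEL_ID_VWW01` to the compiler switches the model without editing the file, since the macro is only expanded where it is used.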

petertorelli commented 3 years ago

Is this resolved? Can it be closed now?

honson1 commented 3 years ago

@petertorelli Yes! Thank you for all your help.

mlcommons / tiny

Testing the interface between tiny/v0.1 and eembc #60
