aku-projects commented 3 years ago

I am trying to run a custom model on a K210 Sipeed Maix Bit board. It produces 2 float32 outputs for a given 200x200 grayscale input image. I generated the tflite file and converted it into a kmodel. I ran ncc inference on the PC using the converted kmodel and it works perfectly. But when I run the same model on the K210, I get values that are either wrong or that I don't know how to interpret.

You can see the attached images for more details. I have also attached a zip file with the models, scripts, test images, and the ncc .bin/expected output values for debugging.

Any help/guidance will be appreciated. Thanks.

ncc version: v0.2.0 beta 4

ncc compile: Screenshot from 2020-12-12 18-48-45

ncc infer: Screenshot from 2020-12-12 18-54-06

.bin output: Screenshot from 2020-12-12 18-54-18

On the K210, with input image 1.jpg:

fmap contents, firmware MaixPy v0.5.0_125_gd4bdb25_minimum_with_kmodel_v4_support: [screenshot: output v0-5-0-125-gd4bdb25]

fmap contents, firmware MaixPy v0.5.0_220_gd5fe812_minimum_with_kmodel_v4_support: [screenshot: output v0-5-0-220-gd5fe812]

fmap contents, firmware MaixPy v0.6.0/1 (minimum_with_kmodel_v4_support): [screenshot: output v0-6]

attachment.zip

Neutree commented 3 years ago

It's a bug in the V4 kmodel support; I'll fix it tomorrow.

Neutree commented 3 years ago

fixed at https://github.com/sipeed/MaixPy/commit/20c776d8f13b50309af0af7f4fe47920d64e844a

thanks~

aku-projects commented 3 years ago

@Neutree Thanks for such a quick fix on this issue! Where can I download the latest firmware build with this commit? I checked on http://dl.sipeed.com/MAIX/MaixPy/release/master/ but it didn't seem to have the latest build. Or should I build this from source?

Regards

Edit: Found it.

aku-projects commented 3 years ago

Hello @Neutree, I downloaded the build for that commit and the error is no longer thrown. But the outputs are still wrong, or I still don't know how to interpret them. Please find the attached images for comparison; I am using the same input images I used with ncc infer. Running on the K210 Maix Bit with MaixPy 20c776d_minimum_with_kmodel_v4_support: [screenshot: maixpy error]

Running on the PC using ncc infer: Screenshot from 2020-12-12 18-54-18

Neutree commented 3 years ago

success = kpu.set_outputs(kput_net, out_idx, width, height, channel)

Maybe you should call kpu.set_outputs(task, 0, 1, 1, 2)? That means the first output layer has two nodes.

aku-projects commented 3 years ago

I am sorry about the confusion. The model has 2 output layers, each with a single output (you can check the attached tflite or kmodel in attachment.zip in the original post). I am not sure whether ncc compile changes this. This is the output of ncc compile:

Screenshot from 2020-12-12 18-48-45

When I run ncc infer, the binary file also contains 2 x 4 bytes of data: Screenshot from 2020-12-14 12-12-08
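On the PC side, those two 4-byte values can be decoded with Python's `struct` module to get reference numbers to compare against the K210. This is a sketch that assumes the `ncc infer` output file is a flat array of little-endian float32 values in output order, which matches the 2 x 4 bytes observed here; the helper names are hypothetical.

```python
import struct


def decode_float32_le(data, count=None):
    """Decode raw bytes as a flat array of little-endian float32 values.

    Assumption: the ncc infer .bin file stores outputs back to back,
    4 bytes per value, in output order.
    """
    if len(data) % 4 != 0:
        raise ValueError("size %d is not a multiple of 4 bytes" % len(data))
    n = len(data) // 4
    if count is not None and n != count:
        raise ValueError("expected %d floats, found %d" % (count, n))
    return list(struct.unpack("<%df" % n, data))


def read_ncc_outputs(path, count=None):
    """Read an ncc infer output file and decode it with decode_float32_le."""
    with open(path, "rb") as f:
        return decode_float32_le(f.read(), count)
```

For example, `read_ncc_outputs("1.bin", count=2)` would return `[output_0, output_1]` for a hypothetical `1.bin` written by ncc infer, and raise if the file holds a different number of values.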

This is the reason I chose kpu.set_outputs(task, 0, 1, 1, 1) and kpu.set_outputs(task, 1, 1, 1, 1).

After your suggestion I did try kpu.set_outputs(task, 0, 1, 1, 2), but this is the error I get: [screenshot: maixpy channel error]

I tried kpu.set_outputs(task, 1, 1, 1, 2) just in case, but got the same error.

michaelteo commented 3 years ago

@aku-projects -

You can refer to this : https://en.bbs.sipeed.com/t/topic/1790

I think you need to do:

```python
a = kpu.set_outputs(task, 0, 1, 1, 1)
a = kpu.set_outputs(task, 1, 1, 1, 1)

fmap = kpu.forward(task, img)

fmap = kpu.get_output(task, 0)  # gets the output at index 0
plist = fmap[:]                 # copies fmap into a list
output_1 = plist[0]             # 1st element of the 1st output

fmap = kpu.get_output(task, 1)  # gets the output at index 1, i.e. the second node
plist = fmap[:]                 # copies fmap into a list
output_2 = plist[0]             # 1st element of the 2nd output
```

I think the API for KPU forward - kpu.forward(task,img,1) - means it is doing the forward up to Layer 1 of the neural network.

aku-projects commented 3 years ago

@michaelteo Thanks for having a look at this issue!

What you mentioned makes sense, and I tried it out just now. Sadly, I did not get the correct results. It is wrong in a similar way to the earlier approach: output_1 is always "1.0" and output_2 is some very large value. Oddly, I noticed that for the same image the result for output_2 differs between runs (compare output_2 for 1.jpg at the beginning and at the end).

```
kpu.set_outputs(task, 0, 1, 1, 1)
True
kpu.set_outputs(task, 1, 1, 1, 1)
True

img = image.Image('/sd/1.jpg')
img.pix_to_ai()
fmap = kpu.forward(task, img)
fmap = kpu.get_output(task, 0)
plist = fmap[:]
plist[0]
1.0
fmap = kpu.get_output(task, 1)
plist = fmap[:]
plist[0]
-5.462543e+37

img = image.Image('/sd/2.jpg')
img.pix_to_ai()
fmap = kpu.forward(task, img)
fmap = kpu.get_output(task, 0)
plist = fmap[:]
plist[0]
1.0
fmap = kpu.get_output(task, 1)
plist = fmap[:]
plist[0]
-1.238511e+37

img = image.Image('/sd/3.jpg')
img.pix_to_ai()
fmap = kpu.forward(task, img)
fmap = kpu.get_output(task, 0)
plist = fmap[:]
plist[0]
1.0
fmap = kpu.get_output(task, 1)
plist = fmap[:]
plist[0]
6.498523e+36

img = image.Image('/sd/4.jpg')
img.pix_to_ai()
fmap = kpu.forward(task, img)
fmap = kpu.get_output(task, 0)
plist = fmap[:]
plist[0]
1.0
fmap = kpu.get_output(task, 1)
plist = fmap[:]
plist[0]
2.100967e+37

img = image.Image('/sd/5.jpg')
img.pix_to_ai()
fmap = kpu.forward(task, img)
fmap = kpu.get_output(task, 0)
plist = fmap[:]
plist[0]
1.0
fmap = kpu.get_output(task, 1)
plist = fmap[:]
plist[0]
3.137721e+37

img = image.Image('/sd/1.jpg')
img.pix_to_ai()
fmap = kpu.forward(task, img)
fmap = kpu.get_output(task, 0)
plist = fmap[:]
plist[0]
1.0
fmap = kpu.get_output(task, 1)
plist = fmap[:]
plist[0]
2.484653e+37
```

I'm not sure how to explain this behavior.

michaelteo commented 3 years ago

I see. Which version of firmware do you use? Maybe reverting to one of the 0.5.0 or 0.5.1 might help?

One other thing: are the images already resized to 200x200? I assumed they are. If not, you might want to do something like this:

```python
img_resized = img.resize(200, 200)
a = img_resized.pix_to_ai()
```

Otherwise I don't know. I did something similar over the weekend, and it ran for me on various firmware versions, although I mostly used the prebuilt "minimum with kmodel4 support" firmware.

Everything you are doing already looks similar to how I did it, except that I loaded images from the camera and did the resize. So I couldn't really verify whether the KPU was producing exactly what nncase infer was producing.

Long-shot ideas:

When you compiled with nncase, I saw you didn't pass --input-std 0.0039216 --input-mean 0. Maybe that'll help.
Is the CPU or KPU overclocked? Bringing it back to normal might make it more stable.

aku-projects commented 3 years ago

I have tried 0.5.0_220, 0.5.0_125, 0.6.0 and 0.6.1, all "minimum with kmodel4 support". None of them produced the correct result.
Yes, the images I am using are already 200x200 grayscale images.
I did try loading from the camera, but the results were similar (1.0 and a very large value) and I couldn't verify them in any way, so I stuck to feeding the same images I had used to verify the kmodel with ncc infer on the PC.

Long-shot ideas:

I assumed --input-mean defaults to 0, so I skipped it. But I guess this isn't causing the issue anyway, since the results from ncc infer are perfect.
I am using the Sipeed Maix Bit without any modification, so I guess it is running at normal speed?

The kmodel gives the expected result with ncc infer, but the same kmodel on the K210 goes way off even though the images I feed it are the same! I am not sure what preprocessing happens during ncc infer or what is causing the issue on the chip. I have attached the tflite file, the kmodel file, test images and the expected results, all in attachment.zip in the original post. Feel free to have a go at it if time permits.
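One way to make the "same kmodel, same images, different results" comparison systematic is to collect the K210 outputs and the ncc infer outputs per image and diff them with an absolute tolerance. This is a minimal PC-side sketch, assuming both sides are gathered into dicts mapping image name to a list of floats; `compare_outputs` is a hypothetical helper and the default tolerance of 0.05 is an arbitrary placeholder. Non-finite values, and values as far off as the ~1e37 readings reported in this thread, are always flagged.

```python
import math


def compare_outputs(reference, device, abs_tol=0.05):
    """Compare per-image output vectors from ncc infer and from the K210.

    reference, device: dicts mapping image name -> list of floats.
    Returns a list of (image, index, ref, dev) tuples that do not match;
    index is None when an image is missing or has the wrong output count.
    """
    mismatches = []
    for image, ref_vals in reference.items():
        dev_vals = device.get(image)
        if dev_vals is None or len(dev_vals) != len(ref_vals):
            mismatches.append((image, None, ref_vals, dev_vals))
            continue
        for i, (r, d) in enumerate(zip(ref_vals, dev_vals)):
            # A non-finite device value is never "close"; otherwise use an
            # absolute tolerance only (rel_tol=0).
            if not math.isfinite(d) or not math.isclose(
                r, d, rel_tol=0.0, abs_tol=abs_tol
            ):
                mismatches.append((image, i, r, d))
    return mismatches
```

An empty result means every output of every image is within the tolerance; anything else lists exactly which image and which output index diverged.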

@michaelteo Thanks once again. I really appreciate your inputs.

michaelteo commented 3 years ago

@aku-projects - Some added info: on the Chinese Sipeed forums (https://cn.bbs.sipeed.com/d/365-kpuncc-infer) there seems to be a reported discrepancy between the KPU and ncc. Unfortunately there was no resolution, and there has been no activity since October 16.

Unfortunately my model was just "running"; I was unable to verify whether it really worked.

I was trying to just copy over a tflite model and run it as a kmodel. The original model was very large, so I had to quantize it heavily to make it fit on the KPU, and it was wildly inaccurate even in ncc infer. I just assumed the KPU worked.

I might not be able to give your model and code a try anytime soon, but I'll post back here if I get to it.

As for overclocking, there is actually a command to overclock the CPU and KPU: https://maixpy.sipeed.com/en/libs/Maix/freq.html#freqset-cpu-pll1-kpudiv

aku-projects commented 3 years ago

It seems the OP there was facing the same issue. And yes, unfortunately no resolution was reached, so I guess I have to get it worked out here. Thankfully the model I started with was pretty small, so I was confident I could get it to work on the K210 without any issues. I guess I was too optimistic. If I get this model to work, I will definitely look at overclocking to improve the throughput of the application. 👍

Sure, if you ever get it to work, let me know. Or hopefully the mods will sort it out soon!

iot17fa commented 3 years ago

@aku-projects - I tested "model_wheigts_59.h5" with:

```
ncc compile my_model2.tflite ../models_kmodel/my_model2_v2b4.kmodel -i tflite --dataset testes
```

and got:

```
SUMMARY
INPUTS
0 input_1 1x1x200x200
OUTPUTS
0 Identity 1x1
1 Identity_1 1x1
```

with:

```python
kpu.set_outputs(task, 0, 1, 1, 1)
kpu.set_outputs(task, 1, 1, 1, 1)
```
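The `(1, 1, 1)` arguments follow directly from the ncc summary: each output is `1x1`, i.e. a single value. The helper below makes that mapping explicit, under two assumptions: ncc prints shapes batch-first (`NxC` for 2-D outputs, `NxCxHxW` for 4-D ones), and `kpu.set_outputs(task, idx, w, h, c)` expects the `w * h * c` layout of that one output. `set_outputs_args` and `parse_outputs` are hypothetical names, not MaixPy APIs.

```python
def set_outputs_args(shape):
    """Turn an ncc summary shape such as '1x1' or '1x1x200x200' into (w, h, c).

    Assumes batch-first shapes, 'NxC' or 'NxCxHxW', with batch size 1.
    """
    dims = [int(d) for d in shape.split("x")]
    if dims[0] != 1:
        raise ValueError("expected batch size 1, got %d" % dims[0])
    dims = dims[1:]
    if len(dims) == 1:  # N x C: a flat vector of C values
        (c,) = dims
        return (1, 1, c)
    if len(dims) == 3:  # N x C x H x W
        c, h, w = dims
        return (w, h, c)
    raise ValueError("unsupported shape: %s" % shape)


def parse_outputs(summary_outputs):
    """Parse 'idx name shape' triples from the OUTPUTS part of the summary.

    Returns (idx, name, (w, h, c)) tuples, one per kpu.set_outputs call.
    """
    tokens = summary_outputs.split()
    if len(tokens) % 3 != 0:
        raise ValueError("expected 'idx name shape' triples")
    result = []
    for i in range(0, len(tokens), 3):
        idx, name, shape = tokens[i], tokens[i + 1], tokens[i + 2]
        result.append((int(idx), name, set_outputs_args(shape)))
    return result
```

Under these assumptions, a single output holding two values (`1x2`) would map to `(1, 1, 2)`, which is the call suggested earlier in the thread, while this model's two `1x1` outputs map to two separate `(1, 1, 1)` calls.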

```
free gc heap memory : 500 KB
free sys heap memory: 3344 KB
```

loaded model:

```
free gc heap memory : 500 KB
free sys heap memory: 1332 KB
```

achieved with your images:

```
free gc heap memory : 500 KB
free sys heap memory: 3344 KB
free gc heap memory : 500 KB
free sys heap memory: 1332 KB
/sd/dronet_img/1.jpg ('ID1', 0.8000001, 'ID2', -0.3619588)
/sd/dronet_img/2.jpg ('ID1', 0.0, 'ID2', 0.1511786)
/sd/dronet_img/3.jpg ('ID1', 0.0, 'ID2', 0.004798041)
/sd/dronet_img/4.jpg ('ID1', 0.03137255, 'ID2', 0.1588571)
/sd/dronet_img/5.jpg ('ID1', 0.0, 'ID2', -0.1286324)
```

Not bad?

The problem is that it is too slow (as reported elsewhere): about 1 fps.

Hope this helps.
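The ID1 values above (0.0, 0.03137255, 0.8000001) all fall on multiples of 1/255 (0, 8/255 and 204/255), which is what a uint8-quantized output with scale 1/255 and zero point 0 would produce. The sketch below shows the usual affine uint8 mapping, `real = (q - zero_point) * scale`. The scale and zero point are inferred from these numbers, not read from the kmodel, and ID2 (which goes negative) would need a different zero point.

```python
def quantize_u8(x, scale, zero_point=0):
    """Affine uint8 quantization: q = round(x / scale) + zero_point, clamped to 0..255."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))


def dequantize_u8(q, scale, zero_point=0):
    """Inverse mapping: real = (q - zero_point) * scale."""
    return (q - zero_point) * scale


# Assumed parameters for the ID1 output (inferred, not read from the kmodel).
ID1_SCALE = 1 / 255
ID1_ZERO_POINT = 0
```

Mapping each observed ID1 value back to its uint8 code this way gives 0, 8 and 204, and dequantizing those codes reproduces the printed values to within about 1e-7. That is consistent with the device output being quantization-limited rather than wrong.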

aku-projects commented 3 years ago

@iot17fa The values look brilliant. It is very close/same as the expected values! Thanks for looking into this.

I haven't tried it out yet. I will do it this weekend, then close this issue and update the k210-dronet repo so that everyone can make use of this.

Cheers!

aku-projects commented 3 years ago

Hello @iot17fa, I finally got around to testing this, but I wasn't able to replicate your solution.

The two differences I noticed in your solution were: 1) a different model weights file, and 2) ncc compile run without options such as inference-type, input-std, input-type, etc.

I wasn't sure how you got the h5 file. Did you retrain the network? Since I didn't know, I skipped that step. I used your command to compile the model attached to this issue. Running ncc infer on it gave wrong results, but I went ahead anyway, ran it on the K210, and got the same wrong results again.

Could you let me know:

How did you get the model weights/tflite model you used?
What ncc command did you use to compile the tflite model?
The MaixPy script commands, in case they differ from the ones in the comments above.

If you could attach them all, that would be great. Thanks once again!

iot17fa commented 3 years ago

Hi aku-projects

1 - I used the model from https://github.com/uzh-rpg/rpg_public_dronet

2 - I ran nncase with the defaults for some optional parameters, such as inference-type, which defaults to uint8.

I used the command: ncc compile my_model2.tflite ../models_kmodel/my_model2_v2b4.kmodel -i tflite --dataset testes

ncc compile -i [--dataset ]

Try float inference with and without dataset calibration. I found that float gives a bigger kmodel that doesn't work (it runs out of memory or hangs).

Check https://github.com/sipeed/MaixPy/issues/256

3 - script used:

```python
import sensor, lcd, time
import KPU as kpu

# Not shown in the original post: lcd/sensor initialisation and the
# definition of `kmodel` (the model's path or flash address).

print(kmodel)
print(kpu.memtest())
task = kpu.load(kmodel)
print(kpu.memtest())

a = kpu.set_outputs(task, 0, 1, 1, 1)  # required for V4 kmodels
print("set output 0")

a = kpu.set_outputs(task, 1, 1, 1, 1)  # required for V4 kmodels
print("set output 1")

clock = time.clock()
while True:
    img = sensor.snapshot()
    clock.tick()
    print("snapshot")
    a = kpu.forward(task, img)
    print("a = kpu.forward")

    fmap = kpu.get_output(task, 0)
    # print("fmap", fmap)
    Identity = fmap[0]
    print("Identity   - ", Identity)
    lcd.draw_string(0, 224, "%s:%.2f" % ("P. coll", Identity))
    # info = kpu.netinfo(task)  # ??? error

    fmap = kpu.get_output(task, 1)
    # print("fmap", fmap)
    Identity_1 = fmap[0]
    print("Identity_1 - ", Identity_1)
```

Hope this helps.

sipeed / MaixPy-v1

How to interpret output of kpu.forward() for custom model? #360
