tensil-ai / tensil

Open source machine learning accelerators
https://www.tensil.ai

Stuck at self.wait_for_flush #87

Open swleungbrian opened 1 year ago

swleungbrian commented 1 year ago

Dear Sir,

I am trying the yolov4-tiny notebook on an Ultra96 and it hangs at tcu = Driver(ultra96, overlay.axi_dma_0). I had to interrupt the notebook and got the following traceback:

/home/xilinx/tcu_pynq/driver.py in wait_for_flush(self)
    267
    268     def wait_for_flush(self):
--> 269         while not self.dram0.compare(
    270             self.scalar_address(self.probe_target_address),
    271             self.probe_source):

/home/xilinx/tcu_pynq/mem.py in compare(self, offset, data)
     64         )
     65         data = data.reshape((-1,))
---> 66         return np.array_equal(self.mem[offset : offset + len(data)], data)

Regards,

Brian

petrohi commented 1 year ago

This can happen if the Tensil Compute Unit (TCU) is not functioning correctly.

Did you follow the steps to create the block design and the bitstream in Vivado?
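
The traceback above shows a polling loop with no exit condition, so a non-responsive TCU blocks the notebook indefinitely. One way to make such a hang fail fast is to bound the loop with a timeout. The following is only a minimal sketch: wait_until is a hypothetical helper that is not part of tcu_pynq, and fake_compare is a stand-in for the driver's memory comparison.

```python
import time


def wait_until(predicate, timeout_s=5.0, poll_interval_s=0.01):
    """Poll predicate() until it returns True or timeout_s elapses.

    Hypothetical helper for illustration; not part of tcu_pynq.
    """
    deadline = time.monotonic() + timeout_s
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s} s")
        time.sleep(poll_interval_s)


# Stand-in for the driver's memory comparison: succeeds on the third poll.
calls = {"n": 0}


def fake_compare():
    calls["n"] += 1
    return calls["n"] >= 3


wait_until(fake_compare, timeout_s=1.0)
```

Applying this to the real driver would mean wrapping the self.dram0.compare(...) call shown in the traceback. That change is an assumption, not something proposed in this thread.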

swleungbrian commented 1 year ago

Yes, I followed the steps at https://www.tensil.ai/docs/tutorials/yolo-ultra96v2/ and https://www.tensil.ai/docs/tutorials/resnet20-ultra96v2/

Attached is the exported hardware XSA file (renamed to .zip):

tensil_ultra96v2_wrapper.zip

petrohi commented 1 year ago

Can you also share the exported block design? (Click Export -> Export Block Design while the block design is open.)

swleungbrian commented 1 year ago

Here's my exported bd (zipped)

tensil_ultra96v2_exported_bd.zip

swleungbrian commented 1 year ago

Hi all, any advice? Thanks.

petrohi commented 1 year ago

I successfully implemented the bitstream in Vivado based on your design script. I see no errors in the design, and timing is met. I will test on the Ultra96 board next.

petrohi commented 1 year ago

I am attaching the bitstream and hwh files made by implementing your block design in Vivado.

tensil_ultra96v2_test.zip

To load it in the PYNQ code:

overlay = Overlay('/home/xilinx/tensil_ultra96v2_test.bit')
tcu = Driver(ultra96, overlay.axi_dma_0)

This code ran successfully on my Ultra96v2, as did the ResNet20 inference that follows it in the tutorial.
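
Before loading a bitstream like this, it may be worth confirming that the .hwh file sits next to the .bit file under the same base name, since PYNQ's Overlay looks it up that way. A minimal sketch, assuming a hypothetical helper missing_overlay_files that is not part of PYNQ or tcu_pynq:

```python
from pathlib import Path


def missing_overlay_files(bit_path):
    """Return the overlay files (.bit and matching .hwh) that do not exist.

    Hypothetical helper for illustration; not part of PYNQ or tcu_pynq.
    """
    bit = Path(bit_path)
    # The .hwh file is expected alongside the .bit file with the same base name.
    candidates = [bit, bit.with_suffix(".hwh")]
    return [p for p in candidates if not p.is_file()]


# Path taken from the comment above; an empty list means both files were found.
print(missing_overlay_files("/home/xilinx/tensil_ultra96v2_test.bit"))
```

An empty list only shows that both files are present. It says nothing about whether the bitstream itself is valid.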