nvdla / hw

RTL, Cmodel, and testbench for NVDLA
Other
1.71k stars · 565 forks

NVDLA integrated with RISC-V SoC on Amazon FPGA Cloud #267

Open farzadfch opened 5 years ago

farzadfch commented 5 years ago

I hope this is not an inappropriate use of the issues section, but I want to let you know that we've integrated NVDLA with an open-source RISC-V SoC (Rocket Chip), and the entire setup is running on the Amazon FPGA cloud. It is available at https://github.com/CSL-KU/firesim-nvdla.

We've successfully run the YOLOv3 object detection algorithm on the nv_large configuration at 7.5 fps.

This should be particularly useful for those who want to experiment with the actual NVDLA hardware but don't have access to an expensive FPGA board. In fact, this is an FPGA-accelerated simulation of the chip, so you get accurate performance results as if NVDLA were fabricated in silicon and clocked at a much higher frequency compared to a plain FPGA implementation.

shgoupf commented 5 years ago

Great work! Just one quick question: how did you generate the nv_large loadable for YOLOv3? Since the compiler for nv_large is not available from NVIDIA, did you write your own compiler to compile the YOLOv3 model to an NVDLA loadable?

farzadfch commented 5 years ago

Thank you! You are right, the compiler is not available for nv_large. This loadable was generated by NVIDIA specifically for YOLOv3. I'm also waiting on the compiler like everyone else!

shgoupf commented 5 years ago

So do you have a pre-generated loadable for the YOLOv3 model that can run with nv_large? I assume you must have such a loadable; otherwise the test could not have been done on your platform. Is that loadable released on GitHub?

farzadfch commented 5 years ago

You can find the loadables here: https://github.com/prasshantg/odla_data. They run as part of darknet: https://github.com/prasshantg/darknet. I had to make some fixes to get everything working; the working code is released as part of the repo I mentioned in the first post.
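For context, a loadable is normally submitted to the hardware through the NVDLA user-mode runtime. A minimal sketch of how one might exercise a downloaded loadable with the `nvdla_runtime` test harness from nvdla/sw (the file names here are placeholders, not the actual contents of the repo, and the exact flags may differ between runtime versions):

```shell
# Fetch the pre-built loadables (repository from the post above;
# file names below are hypothetical).
git clone https://github.com/prasshantg/odla_data
cd odla_data

# Submit a loadable to the NVDLA user-mode runtime test app from nvdla/sw:
#   --loadable selects the compiled network,
#   --image    the input frame,
#   --rawdump  writes the raw output tensor for inspection.
./nvdla_runtime --loadable yolov3.nvdla --image input.pgm --rawdump
```

In the FireSim setup above, darknet drives the runtime library directly rather than going through this standalone test harness.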

shgoupf commented 5 years ago

Great! Thanks for the information.

We also have nv_large working with IBM CAPI (server-based FPGA acceleration); I'll try this loadable on that platform as well.

The link to NVDLA on CAPI: https://github.com/shgoupf/snap/tree/master/actions/hdl_nvdla

prasshantg commented 5 years ago

Great work Farzad!!

farzadfch commented 5 years ago

Thank you, Prashant! This would not have been possible without your work :)

huangwei858 commented 5 years ago

Hey, I've successfully deployed the nv_small (64 MACs) configuration to a ZCU102 board. But when I run ResNet-50, NVDLA only reaches 7.3 fps. Moving to 128 MACs, the resource utilization is too high: the ZCU102 cannot fit the 128-MAC configuration on the actual board. This performance is a bottleneck for our team, which is researching an NVDLA-based DPU, because we need a DPU that is both more powerful and uses fewer resources. Did you encounter similar issues with the NVDLA module (whether small, full, or large) when deploying to an actual board?