nvdla / hw

RTL, Cmodel, and testbench for NVDLA

How do I run a compiled model in RTL environment #89

Open tariqafzal opened 6 years ago

tariqafzal commented 6 years ago

I have compiled a small Caffe model using nvdla_compiler. The compiler generates output.protobuf and default.nvdla files, and nvdla_compiler does not report any errors. How can I run this model in the RTL simulation environment (Verilator-based)? The testbench expects files which are very different. Any help would be appreciated.

jwise commented 6 years ago

I don't think we currently have a flow to run nvdla_compiler output with the RTL simulator, nor is one on the roadmap. I think one could be produced given the runtime code in the sw tree, but I don't think that we have the staffing to do this on the NVIDIA side right now. If you do this, though, please let us know, and we'll be happy to support you in any way we can!

tariqafzal commented 6 years ago

In the <..>/hw/verif/traces/traceplayer directory there are a couple of tests, such as conv_8x8_fc_int16, which contain files (input.txn, input_weight.dat, input_feature_map.dat) sufficient to run in the RTL simulation environment. How were they generated?

jwise commented 6 years ago

Good question. I'm pretty sure they were hand-generated by someone working with a bunch of weights in a rather unoptimized fashion; I'm not sure there ever existed a good mechanized way to do that. (The NVDLA compiler tries pretty hard to generate instruction lists that are as performant as possible, by contrast.)

tariqafzal commented 6 years ago

Currently, nvdla_runtime is tied to Linux and an ARM processor; if I am not planning to use either of these, how do I get equivalent functionality? In general, how do you see people doing a performance evaluation of NVDLA? The Excel spreadsheet is restricted to very classical, older CNNs and says nothing about LSTMs.

jwise commented 6 years ago

Currently we don't have hardware support for LSTM elements; you would need to sum up the layers that can be run on NVDLA (using the performance estimator), and then also add in the LSTM evaluation time.
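To illustrate the bookkeeping, here is a trivial sketch (with made-up numbers -- real per-layer times would come from the estimator spreadsheet, and the LSTM cost from wherever you evaluate it):

```c
#include <stdio.h>

int main(void)
{
    /* Hypothetical per-layer times (ms) for the layers that map onto
     * NVDLA, as read off the performance estimator spreadsheet.
     * These numbers are made up purely for illustration. */
    double nvdla_layer_ms[] = { 0.42, 0.31, 0.18, 0.05 };

    /* Hypothetical time to evaluate the LSTM cells elsewhere (e.g. on
     * the host CPU), since NVDLA has no LSTM hardware support. */
    double lstm_ms = 1.10;

    double total_ms = lstm_ms;
    for (size_t i = 0; i < sizeof nvdla_layer_ms / sizeof nvdla_layer_ms[0]; i++)
        total_ms += nvdla_layer_ms[i];

    printf("estimated end-to-end time: %.2f ms\n", total_ms);
    return 0;
}
```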

The NVDLA runtime is designed to be portable to non-Linux and non-ARM systems (though we obviously only have the Virtual Platform built into QBOX's ARM simulation). If you are working on another port target for the runtime, please let us know, and we'll be happy to try to help.

debjitpal commented 6 years ago

@jwise I am wondering who wrote the input.txn for GoogleNet, since the input is almost 340 lines long. It's impossible for a human to write that by hand, even in an unoptimized manner. Would you mind looking into whether there is a way, or any script that was used internally, to generate the txn files from the nvdla_compiler output? Thanks in advance for your help and suggestions.

jwise commented 6 years ago

I believe that it was in fact generated, but I suspect that the generation was done with a sufficiently old version of the NVDLA compiler that it would be less work to start from scratch than to bring it up to date. From our internal discussions about this, I am given to believe that there was also a fair bit of hand-massaging of the generator's output (i.e., layer parameters may have been hand-specified, and memory addresses may have been as well) -- so I don't think there ever was a turnkey solution.

This could be a good first project for someone trying to familiarize themselves with the NVDLA runtime.

debjitpal commented 6 years ago

Hi Joshua, thanks for your response. I understand that the NVDLA runtime is now the key to getting the trace file format. Can you point me to the code base (the files/code locations) that needs to be looked at? Since the NVDLA code base is huge, getting a handle on it will take a while; your help could expedite the process, and I would much appreciate it.

Also, is there any way to share the old NVDLA compiler source code? I am just trying to get a feel for this work, as I am working towards a paper deadline and need to get those trace files as soon as I can.

jwise commented 6 years ago

Agreed that the codebase is pretty large! If I were trying to do this as quickly as possible, I would probably instrument the kernel-mode driver to print register reads and writes: https://github.com/nvdla/sw/blob/master/kmd/port/linux/nvdla_core_callbacks.c

And then fix up the memory loads by hand (either with correct data, dumped at some stage -- maybe in the UMD, before data is submitted to the hardware? -- or with dummy data).
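Concretely, the instrumentation could look something like the untested sketch below. The callback names (dla_reg_write / dla_reg_read) and the nvdla_device layout are written from memory, so verify them against the actual file:

```c
/* Untested sketch: log every register access made by the KMD so the
 * kernel log can be post-processed into a testbench trace. Verify the
 * function names and struct layout against nvdla_core_callbacks.c;
 * they are assumptions here, not copied from the tree. */
#include <linux/io.h>
#include <linux/printk.h>
#include <linux/types.h>
#include "nvdla_linux.h"   /* assumed to declare struct nvdla_device */

void dla_reg_write(void *driver_context, uint32_t addr, uint32_t reg)
{
	struct nvdla_device *nvdla_dev = (struct nvdla_device *)driver_context;

	/* One line per write; grep dmesg for "nvdla_trace:" and convert
	 * into whatever format your RTL testbench consumes. */
	pr_info("nvdla_trace: write_reg 0x%08x 0x%08x\n", addr, reg);
	writel(reg, nvdla_dev->base + addr);
}

uint32_t dla_reg_read(void *driver_context, uint32_t addr)
{
	struct nvdla_device *nvdla_dev = (struct nvdla_device *)driver_context;
	uint32_t reg = readl(nvdla_dev->base + addr);

	pr_info("nvdla_trace: read_reg 0x%08x 0x%08x\n", addr, reg);
	return reg;
}
```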

I did go looking, and I don't believe that a 'loadable player' is on the roadmap to release. I'll ask around a little bit to see what state it's in and when it was last tested -- whether or not it's in a reasonable condition to release (we'd rather not release code that we don't have the capacity to support). But your best bet right now would be to try to instrument the KMD yourself.

debjitpal commented 6 years ago

Thanks, Joshua. I shall start working on this. Hopefully, I shall be able to do something quickly. I shall start with some small networks and see how it goes. If I come up with something working, I shall surely keep you posted so that, if possible, you folks can upload it to GitHub for other users.

Andrawzyf commented 6 years ago

Hi, can you tell me how to compile a model using the CMOD code?

brad0taylor commented 6 years ago

@jwise -- can you expand on this a bit more? "Currently we don't have hardware support for LSTM elements; you would need to sum up the layers that can be run on NVDLA (using the performance estimator), and then also add in the LSTM evaluation time."

I notice there is no mention of LSTM or RNN layers in the docs. Does this mean that these layer types cannot be implemented on the DLA and would need to be supported outside of it?

wxbbuaa2011 commented 6 years ago

@debjitpal Hello, have you solved your problem? I hope to follow your work.

silvaurus commented 6 years ago

Hi! I also need more RTL simulator tests derived from real neural network layers than the ones provided. I know how to get RegRead and RegWrite from the KMD, but I'm not sure I understand this: 'fix up the memory loads by hand (either with correct data, dumped at some stage -- maybe in the UMD, before data is submitted to the hardware? -- or with dummy data).' Could you give some details on this?
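To make my question concrete: is the idea roughly the sketch below -- a helper of my own invention (not part of the UMD) that dumps each CPU-mapped buffer just before submission, so the bytes can be spliced by hand into the trace's .dat files?

```c
/* Hypothetical helper (my guess, not part of the NVDLA UMD): write a
 * CPU-mapped buffer out as one hex byte per line, to be massaged by
 * hand into whatever layout the traceplayer's .dat files expect. */
#include <stdint.h>
#include <stdio.h>

static int dump_buffer(const char *path, const uint8_t *buf, size_t len)
{
	FILE *f = fopen(path, "w");

	if (f == NULL)
		return -1;
	for (size_t i = 0; i < len; i++)
		fprintf(f, "%02x\n", buf[i]);
	fclose(f);
	return 0;
}
```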

Thank you so much!

wxbbuaa2011 commented 5 years ago

@debjitpal Thank you very much. I hope you will share the GitHub address when you have something working.

wxbbuaa2011 commented 5 years ago

@debjitpal Hello, have you solved your problem? I also hope to follow your work.

suchandler96 commented 1 year ago

Hi all, although it has been a long time since the discussion in this thread, I'd like to share my efforts to generate the input.txn for an arbitrary NN, in case anyone finds it useful in the future (I spent months debugging the workflow)...

The code is available in a directory of my repo, where I wrote a README describing the procedure for generating input.txn (which mainly follows the VP method). Discussion is welcome under my repo. Thanks!