nvdla / vp

Virtual Platform for NVDLA

Compiler & runtime crash in docker VP #45

Open BigFatFlo opened 5 years ago

BigFatFlo commented 5 years ago

Hi,

I'm using the docker image provided on DockerHub to run the virtual platform. When I try to use nvdla_compiler to generate a loadable from my LeNet model, it crashes with this message:

./nvdla_compiler --prototxt lenet.prototxt --caffemodel lenet_iter_10000.caffemodel --configtarget nv_small --cprecision int8 --profile basic --calibtable calib_table.json
./nvdla_compiler: line 2: syntax error: unexpected redirection
# ./nvdla_compiler: line 1: ELF: not found

If I compile my model on my machine using the nvdla_compiler from nvdla/sw/prebuilt/linux, it compiles into a loadable with no problem. Unfortunately when I try to use that model in the nvdla_runtime in the docker vp, the runtime starts running but then crashes and exits the QEMU instance:

# ./nvdla_runtime --loadable nvdla/lenet_model_nosoftmax_nocalib_basic.nvdla --image nvdla/digits/four_inv.pgm
creating new runtime context...
[  460.522980] random: crng init done
Emulator starting
pgm2dimg 1 28 28 1 224 6272 401408
submitting tasks...
[  465.391729] Enter:dla_read_network_config
[  465.392504] Exit:dla_read_network_config status=0
[  465.392763] Enter: dla_initiate_processors
[  465.393181] Enter: dla_submit_operation
[  465.393407] Prepare Convolution operation index 0 ROI 0 dep_count 1
[  465.393722] Enter: dla_prepare_operation
[  465.394232] processor:Convolution group:0, rdma_group:0 available
[  465.394657] Enter: dla_read_config
[  465.396656] Exit: dla_read_config
[  465.396903] Exit: dla_prepare_operation status=0
[  465.397186] Enter: dla_program_operation
[  465.397424] Program Convolution operation index 0 ROI 0 Group[0]
root@545d7e33f93c:/usr/local/nvdla# 

Any suggestions on getting the compiler and the runtime to work in docker?

Thank you

fisherxue commented 5 years ago

Run compiler directly in the Docker image. No need to run aarch64_toplevel -c aarch64_nvdla.lua first.

shazib-summar commented 5 years ago

Hey, @BigFatFlo. Did you find a solution? I am facing the same issue as you described in the second terminal log. Do you have any updates? Thanks a lot.

shazib-summar commented 5 years ago

Run compiler directly in the Docker image. No need to run aarch64_toplevel -c aarch64_nvdla.lua first.

@fisherxue at the time of writing, nvdla_runtime is available only for aarch64, so you have to run the aarch64_toplevel -c aarch64_nvdla.lua command to emulate a Cortex-A57 processor. Otherwise you won't be able to run nvdla_runtime.
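
For what it's worth, the `ELF: not found` and `syntax error: unexpected redirection` messages in the first log look like what a shell prints when it is handed a binary built for another architecture: the kernel refuses to execute it, and the shell then tries to interpret the file as a script. Below is a minimal sketch for checking which architecture a binary targets. It assumes a POSIX `od` is available and that the file is a little-endian ELF; the function name `elf_arch` is made up for this example.

```shell
#!/bin/sh
# Print the target architecture recorded in an ELF header.
# e_machine is the 16-bit field at byte offset 18 of the header;
# the two values handled here are EM_X86_64 (0x3e) and EM_AARCH64 (0xb7).
elf_arch() {
    file=$1
    # The first four bytes of every ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$file" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    # Read the two e_machine bytes (assumes little-endian byte order).
    set -- $(od -An -tx1 -j18 -N2 "$file")
    case "$2$1" in
        003e) echo "x86_64" ;;
        00b7) echo "aarch64" ;;
        *)    echo "unknown(0x$2$1)" ;;
    esac
}
```

Running `elf_arch ./nvdla_compiler` and `elf_arch ./nvdla_runtime` on the host should show whether each binary matches the machine you intend to run it on. A binary reported as `x86_64` would not be expected to run inside the emulated aarch64 guest.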

BigFatFlo commented 5 years ago

@killerzula if I remember correctly, I "fixed" it by writing a new Dockerfile myself to build a Docker image on which to run the virtual platform. One trick was to make sure the compiler, runtime and platform versions are compatible and all target the same hardware configuration (nv_full, nv_small or nv_large); otherwise the runtime will crash. However, I think @fisherxue was right: you can just run the compiler on your host system, not inside the Docker container.

shaumik1 commented 4 years ago

I am currently facing the same runtime issue. I am using the latest Docker image (https://hub.docker.com/r/nvdla/vp/) and the latest prebuilt binaries (https://github.com/nvdla/sw/tree/master/prebuilt/arm64-linux), so the platform and runtime versions should be compatible. The fbuf is also a precompiled file:

   // First mount the directory at /mnt, then run insmod drm.ko and insmod opendla_2.ko
   # ./nvdla_runtime --loadable ../../regression/flatbufs/kmd/NN/NN_L0_0_fbuf --image  ../../regression/images/digits/seven.pgm --rawdump
   creating new runtime context...
   Emulator starting
   pgm2dimg 1 28 28 1 896 25088 25088
   submitting tasks...
   Work Found!
   Work Done
   [  371.318462] Enter:dla_read_network_config
   [  371.319911] Exit:dla_read_network_config status=0
   [  371.320349] Enter: dla_initiate_processors
   [  371.321021] Enter: dla_submit_operation
   [  371.321376] Prepare Convolution operation index 0 ROI 0 dep_count 1
   [  371.321932] Enter: dla_prepare_operation
   [  371.324673] processor:Convolution group:0, rdma_group:0 available
   [  371.325445] Enter: dla_read_config
   [  371.326404] Exit: dla_read_config
   [  371.329340] Exit: dla_prepare_operation status=0
   [  371.329909] Enter: dla_program_operation
   [  371.330336] Program Convolution operation index 0 ROI 0 Group[0]
   root@ab18aa4f023d:/usr/local/nvdla#

@BigFatFlo which modification in the Dockerfile helped you 'fix' the issue? Could you provide some more details? @HaiqingSun @jarodw0723 any suggestions or hints on how to fix this?

BigFatFlo commented 4 years ago

@shaumik1 I haven't touched NVDLA in a while, but here's the Dockerfile I used for the virtual platform.

FROM nvdla_tools:1.0.0

WORKDIR /nvdla

ARG nvdla_version=nv_full

COPY nvdla_hw /nvdla/hw/
COPY vp /nvdla/vp/

WORKDIR /nvdla/hw

RUN git checkout $nvdla_version && \
    echo "PROJECTS := $nvdla_version" > tree.make && \
    echo "COVERAGE := 0" >> tree.make && \
    echo "USE_DESIGNWARE := 0" >> tree.make && \
    echo "CPP := /usr/bin/cpp-4.9" >> tree.make && \
    echo "GCC := /usr/bin/g++-4.9" >> tree.make && \
    echo "PERL := /usr/bin/perl" >> tree.make && \
    echo "JAVA := /usr/bin/java" >> tree.make && \
    echo "SYSTEMC := /usr/local/systemc-2.3.0/" >> tree.make && \
    echo "PYTHON := /usr/bin/python3" >> tree.make && \
    echo "VERILATOR := verilator" >> tree.make && \
    echo "CLANG := clang" >> tree.make && \
    tools/bin/tmake -build cmod_top

WORKDIR /nvdla/vp
RUN cmake -DCMAKE_INSTALL_PREFIX=build \
          -DSYSTEMC_PREFIX=/usr/local/systemc-2.3.0/ \
          -DNVDLA_HW_PREFIX=/nvdla/hw \
          -DNVDLA_HW_PROJECT=$nvdla_version && \
          make && \
          make install

WORKDIR /nvdla/vp
RUN mkdir -p images/linux-4.13.3

COPY sw /nvdla/sw/

WORKDIR /nvdla/sw
RUN git checkout $nvdla_version && \
    cp ./prebuilt/linux/Image /nvdla/vp/images/linux-4.13.3/. && \
    cp ./prebuilt/linux/rootfs.ext4 /nvdla/vp/images/linux-4.13.3/. && \
    cp -r prebuilt /nvdla/vp/.

ENV SC_SIGNAL_WRITE_CHECK=DISABLE

WORKDIR /nvdla/vp
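
The chain of `echo` commands in the Dockerfile above writes `tree.make` for the hardware build. As a sketch only, the same file could be produced with a here-document, which may be easier to read and edit. The function name `write_tree_make` and its optional output-path argument are hypothetical; the tool paths are copied from the Dockerfile and may differ on other setups.

```shell
#!/bin/sh
# Write tree.make for the given hardware project (e.g. nv_full or nv_small).
# Usage: write_tree_make PROJECT [OUTPUT_FILE]
write_tree_make() {
    project=$1
    out=${2:-tree.make}
    # Unquoted EOF so that $project is expanded inside the here-document.
    cat > "$out" <<EOF
PROJECTS := $project
COVERAGE := 0
USE_DESIGNWARE := 0
CPP := /usr/bin/cpp-4.9
GCC := /usr/bin/g++-4.9
PERL := /usr/bin/perl
JAVA := /usr/bin/java
SYSTEMC := /usr/local/systemc-2.3.0/
PYTHON := /usr/bin/python3
VERILATOR := verilator
CLANG := clang
EOF
}
```

For example, `write_tree_make nv_small` run inside `/nvdla/hw` would write a `tree.make` whose `PROJECTS` line names the nv_small configuration, which keeps the selected project in one visible place.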

The nvdla_tools image is just a docker container with all the required prerequisites, built using this Dockerfile:

FROM ubuntu:14.04

RUN sudo apt-get update && \
    sudo apt-get install -y software-properties-common && \
    sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test && \
    sudo apt-get update && \
    sudo apt-get install -y cmake libboost-dev python-dev libglib2.0-dev \
                            libpixman-1-dev liblua5.2-dev swig libcap-dev \
                            libattr1-dev && \
    sudo apt-get install -y gcc-4.9 && \
    sudo apt-get install -y g++-4.9 && \
    sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 60 && \
    sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.9 60 && \
    sudo apt install -y git && \
    sudo add-apt-repository -y ppa:openjdk-r/ppa && \
    sudo apt update && \
    sudo apt install -y openjdk-8-jdk && \
    sudo apt install -y wget

WORKDIR /tmp
RUN wget -O systemc-2.3.0a.tar.gz \
    http://www.accellera.org/images/downloads/standards/systemc/systemc-2.3.0a.tar.gz && \
    tar xf systemc-2.3.0a.tar.gz && \
    rm -f systemc-2.3.0a.tar.gz

WORKDIR /tmp/systemc-2.3.0a
RUN sudo mkdir -p /usr/local/systemc-2.3.0 && \
    mkdir objdir && \
    cd objdir/ && \
    ../configure --prefix=/usr/local/systemc-2.3.0 && \
    make && \
    sudo make install

WORKDIR /tmp
RUN wget -O IO-Tee-0.65.tar.gz \
    http://search.cpan.org/CPAN/authors/id/N/NE/NEILB/IO-Tee-0.65.tar.gz && \
    tar xf IO-Tee-0.65.tar.gz && \
    rm -f IO-Tee-0.65.tar.gz

WORKDIR /tmp/IO-Tee-0.65
RUN perl Makefile.PL && \
    make && \
    sudo make install

WORKDIR /tmp
RUN wget -O YAML-1.24.tar.gz \
    http://search.cpan.org/CPAN/authors/id/T/TI/TINITA/YAML-1.24.tar.gz && \
    tar xf YAML-1.24.tar.gz && \
    rm -f YAML-1.24.tar.gz

WORKDIR /tmp/YAML-1.24
RUN perl Makefile.PL && \
    make && \
    sudo make install

WORKDIR /tmp
RUN wget -O Capture-Tiny-0.48.tar.gz \
    http://search.cpan.org/CPAN/authors/id/D/DA/DAGOLDEN/Capture-Tiny-0.48.tar.gz && \
    tar xf Capture-Tiny-0.48.tar.gz && \
    rm -f Capture-Tiny-0.48.tar.gz

WORKDIR /tmp/Capture-Tiny-0.48
RUN perl Makefile.PL && \
    make && \
    sudo make install

WORKDIR /tmp
RUN wget -O XML-Simple-2.25.tar.gz \
    http://search.cpan.org/CPAN/authors/id/G/GR/GRANTM/XML-Simple-2.25.tar.gz && \
    tar xf XML-Simple-2.25.tar.gz && \
    rm -f XML-Simple-2.25.tar.gz

WORKDIR /tmp/XML-Simple-2.25
RUN perl Makefile.PL && \
    make && \
    sudo make install

WORKDIR /tmp
RUN wget -O XML-Parser-2.44.tar.gz \
    http://search.cpan.org/CPAN/authors/id/T/TO/TODDR/XML-Parser-2.44.tar.gz && \
    tar xf XML-Parser-2.44.tar.gz && \
    rm -f XML-Parser-2.44.tar.gz

WORKDIR /tmp/XML-Parser-2.44
RUN perl Makefile.PL && \
    make && \
    sudo make install

WORKDIR /nvdla

RUN rm -rf /tmp/*

COPY ./Dockerfile /.

Hope it helps.

shaumik1 commented 4 years ago

@BigFatFlo thanks a lot for the details, much appreciated! For me the issue seems to be resolved when I check out an older commit of nvdla/sw (here), load the kernel module with insmod opendla.ko, and use the prebuilt ./nvdla_runtime.

It has the good old opendla.ko in the prebuilt/linux/ directory. The later versions have opendla_1.ko and opendla_2.ko which seem to cause this issue. (Probably the int8 support messes something up!)

jinyl777 commented 4 years ago

I cannot find opendla.ko, and when I run ./nvdla_compiler, I get:

./nvdla_compiler: line 2: syntax error: unexpected redirection
./nvdla_compiler: line 1: ELF: not found

Can you help me?

singhae commented 1 year ago

opendla.ko -> opendla_1.ko
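
Putting the comments above together, the module to load appears to depend on the hardware configuration the virtual platform was built for. The sketch below encodes one possible mapping. It rests on an assumption that is not confirmed anywhere in this thread: that `opendla_1.ko` targets nv_full, and `opendla_2.ko` targets nv_small and nv_large. The helper name `kmd_for_config` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel module name for a hardware config.
# ASSUMPTION (unconfirmed here): opendla_1.ko is for nv_full,
# opendla_2.ko is for nv_small and nv_large.
kmd_for_config() {
    case "$1" in
        nv_full)           echo "opendla_1.ko" ;;
        nv_small|nv_large) echo "opendla_2.ko" ;;
        *)
            echo "unknown config: $1" >&2
            return 1
            ;;
    esac
}
```

Inside the guest, this could be used as `insmod drm.ko && insmod "$(kmd_for_config nv_full)"`. If the mapping holds, loading `opendla_2.ko` on a platform built for nv_full would be one plausible cause of the runtime exiting right after `Program Convolution operation`.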