Open abcnorio opened 5 months ago
the '-DLink_static=true' does not exist
Seems like bestsource removed it 3 days ago here. Need to adjust.
to copy files target should be directory
That slash doesn't matter because file commands detect if it is a folder. It might be easier to read for humans though. If it does throw an error for you and crashes, what exactly does your build env look like?
that's unclear to me what is missing - is this related to 'libnvinfer_plugin.so.*'
Hm, that looks odd. I delete some libs here
https://github.com/styler00dollar/VSGAN-tensorrt-docker/blob/ac35e8dd92cfdcbc9db68e572527e86db0cf7cf3/Dockerfile#L740
because i just link the libs afterwards here
https://github.com/styler00dollar/VSGAN-tensorrt-docker/blob/ac35e8dd92cfdcbc9db68e572527e86db0cf7cf3/Dockerfile#L886
to save space since the files are the same. The files in /usr/local/tensorrt/lib/
should exist. The files get moved here.
https://github.com/styler00dollar/VSGAN-tensorrt-docker/blob/ac35e8dd92cfdcbc9db68e572527e86db0cf7cf3/Dockerfile#L503
It worked for me when I built it around a week ago.
A warning can be added that parallel buildkit builds do not work. I was able to try that on a 2x CPU Xeon, and it failed at several spots, but I had no time to find out how to prevent the failures.
The only reason for it not to work should be running out of RAM. With
DOCKER_BUILDKIT=1 docker build -t styler00dollar/vsgan_tensorrt:latest .
it builds multiple stages at once. The Dockerfile was made to work with 64 GB RAM and can therefore easily crash if less RAM is available, but I never tested with multiple CPUs.
Thanks,
(1) build env:
64 GB RAM, AMD Ryzen 5 3600 6-Core Processor, Debian bullseye
ii  docker             1.5-2                    all    transitional package
ii  docker-clean       2.0.4-3                  all    simple Shell script to clean up the Docker Daemon
ii  docker-compose     1.25.0-1                 all    Punctual, lightweight development environments using Docker
ii  docker-doc         20.10.5+dfsg1-1+deb11u2  all    Linux container runtime -- documentation
ii  docker-registry    2.7.1+ds2-7+deb11u1      amd64  Docker toolset to pack, ship, store, and deliver content
ii  docker.io          20.10.5+dfsg1-1+deb11u2  amd64  Linux container runtime
ii  docker2aci         0.17.2+dfsg-2.1+b5       amd64  CLI tool to convert Docker images to ACIs
ii  python3-docker     4.1.0-1.2                all    Python 3 wrapper to access docker.io's control socket
ii  python3-dockerpty  0.4.1-2                  all    Pseudo-tty handler for docker Python client (Python 3.x)
ii  wmdocker           1.5-2                    amd64  System tray for KDE3/GNOME2 docklet applications
(2) Adding the slash did indeed make a difference and the errors disappeared. I repeated that several times.
(3) Yes, I saw that you deleted those files before adding symlinks later. How can I enter the build stage to inspect it manually via bash/shell at that stage? Sorry, I am not very familiar with docker. I will re-do this part of the build tomorrow (my CPU is not that fast, so it takes quite some time) and send the exact output at the point where it breaks.
(4) Regarding the 2x CPU Xeon: I just wanted to see whether the same errors occurred, but other errors popped up, like
[tensorrt-ubuntu 9/20] RUN pip3 install /usr/local/tensorrt/python/tensorrt--cp311-.whl:
56 0.853 /bin/sh: 1: pip3: not found
executor failed running [/bin/sh -c pip3 install /usr/local/tensorrt/python/tensorrt--cp311-.whl]: exit code: 127
This could be prevented by just calling the reinstall of pip twice. It looked to me as if it tried to use pip before it was installed (hence the idea that the cause is the parallel build).
then next error:
[...] executor failed running [/bin/sh -c apt install fftw3-dev python-is-python3 pkg-config python3-pip git p7zip-full autoconf libtool yasm ffmsindex libffms2-5 libffms2-dev -y && git clone https://github.com/sekrit-twc/zimg --depth 1 --recurse-submodules --shallow-submodules && cd zimg && ./autogen.sh && CFLAGS=-fPIC CXXFLAGS=-fPIC ./configure --enable-static --disable-shared && make -j$(nproc) && checkinstall -y -pkgversion=0.0 && apt install /workspace/zimg/zimg_0.0-1_amd64.deb -y]: exit code: 100
btw - the env here is probably bullseye as well; I am not admin on that computer, I can just use it. I have to find out how to use buildkit without parallel builds; it looks like
COMPOSE_PARALLEL_LIMIT=1 [...]
did not work out. It still looks like a parallel build. Parallel does not work, too many things depend on each other. As I cannot write to /etc/docker/... on that computer, I have to find out how to disable parallel builds with buildkit, which seems to parallelize automatically whenever possible.
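Regarding the question in (3): one common way to look inside a build stage is to build only up to that stage with `--target` and then start a shell in the resulting image. A minimal sketch, assuming the stage is named TensorRT-ubuntu (as in the `COPY --from` line of the build log) and that the image ships bash; the tag `vsgan-debug` is made up for illustration. With DRY_RUN=1 (the default here) the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch: build a single stage of a multi-stage Dockerfile and open a shell in it.
# Assumptions: the stage name and the tag are illustrative defaults.
STAGE="${STAGE:-TensorRT-ubuntu}"
TAG="${TAG:-vsgan-debug}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Stop the build after the given stage and tag the result.
run docker build --target "$STAGE" -t "$TAG" .
# Start an interactive shell in that stage; --rm removes the container on exit.
run docker run --rm -it "$TAG" bash
```

Running it with `DRY_RUN=0` from the directory that contains the Dockerfile executes the two commands for real.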
best + thanks.
Update:
(1) copy error
Step 229/246 : COPY --from=TensorRT-ubuntu /usr/local/tensorrt/lib/libnvinfer_plugin.so* /usr/local/tensorrt/lib/libnvinfer_vc_plugin.so* /usr/local/tensorrt/lib/libnvonnxparser.so* /usr/lib/x86_64-linux-gnu/
COPY failed: no source files were specified
replace
/usr/local/tensorrt/lib/
by
/usr/local/tensorrt/targets/x86_64-linux-gnu/lib/
and then it works - seems somehow 'docker build' does not follow the symlinks properly, as the first location is just a symlink for the second.
Did inspect the intermediate stage and all was built properly, so the symlink seemed to be the reason.
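The symlink relationship can be checked with `readlink -f`, which prints the real location behind a link. A small self-contained sketch that recreates the layout in a temporary directory; the temp directory and the empty .so file are stand-ins, not the real TensorRT files. Inside the intermediate stage one would run `readlink -f /usr/local/tensorrt/lib` instead. This assumes GNU coreutils.

```shell
#!/bin/sh
# Recreate the layout with stand-in files: lib -> targets/x86_64-linux-gnu/lib
root="$(mktemp -d)"
# Canonicalize root itself so a symlinked temp dir does not confuse the comparison.
root="$(readlink -f "$root")"
mkdir -p "$root/tensorrt/targets/x86_64-linux-gnu/lib"
touch "$root/tensorrt/targets/x86_64-linux-gnu/lib/libnvinfer_plugin.so.0"
ln -s targets/x86_64-linux-gnu/lib "$root/tensorrt/lib"

# Resolve the symlink to the directory that actually holds the files.
real_lib="$(readlink -f "$root/tensorrt/lib")"
echo "lib resolves to: $real_lib"

# The shell follows the symlink when globbing, so the file is visible under both paths.
ls "$root/tensorrt/lib/"libnvinfer_plugin.so*
ls "$real_lib/"libnvinfer_plugin.so*
```

Using the resolved path as the COPY source is what the replacement described above amounts to.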
(2) multiple cpus
Docker is unfriendly when it comes to shutting down parallelism, which is enabled by default, if you do not have buildx at hand. So we leave that out, but the fact is that the Dockerfile does not work with parallel building, because things depend on each other in sequence, and it would probably require some rewriting to find out what can be built in parallel and what cannot.
With buildx and a buildkit.toml config file, parallelism can be tweaked (and therefore shut down). I could not try it for lack of admin rights on the server (it would require a complete docker upgrade), but it should work (in theory).
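A sketch of how that buildx route could look, not tested against this Dockerfile. BuildKit's daemon config has a `max-parallelism` setting under `[worker.oci]`, and `docker buildx create` can start a builder with such a file. Assumptions: the file location and the builder name `serial-builder` are made up for illustration, and the flag is spelled `--config` here although newer buildx releases document it as `--buildkitd-config`. The docker commands only run when RUN_DOCKER=1 is set.

```shell
#!/bin/sh
# Write a BuildKit config that limits the builder to one build step at a time.
CONFIG="${CONFIG:-$(mktemp -d)/buildkitd.toml}"
cat > "$CONFIG" <<'EOF'
[worker.oci]
  max-parallelism = 1
EOF
echo "wrote $CONFIG"

if [ "${RUN_DOCKER:-0}" = "1" ]; then
    # Create a builder that uses the config and make it the active one.
    # (Hypothetical builder name; check `docker buildx create --help` for the flag spelling.)
    docker buildx create --use --name serial-builder --config "$CONFIG"
    # --load puts the resulting image into the local docker image store.
    docker buildx build --load -t styler00dollar/vsgan_tensorrt:latest .
fi
```

Since the builder runs in its own container, this route does not require writing to /etc/docker/, but it does need a docker installation that has the buildx plugin.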
All in all, the build went fine once the points mentioned above were taken into account.
(3) further vps plugins
Will add some more plugins - if the build works, I will send you a diff file with those add-ons.
best + thanks.
PS: for general information the disk usage:
REPOSITORY      TAG      IMAGE ID       CREATED         SIZE
vsgantensorrt   latest   076a4aa18920   5 minutes ago   13.2GB
Hi,
there were some errors in the Docker build:
(1)
-> the '-DLink_static=true' does not exist; should it be '-Ddefault_library=static'? Using that, however, it does not seem to build properly, because the *.so file is missing. For the time being I just remove the static switch, but that's not a real solution, right?
(2)
-> error: to copy files target should be directory
(3)
-> error: to copy files target should be directory
(4)
-> error: COPY failed: no source files were specified -> that's unclear to me what is missing - is this related to 'libnvinfer_plugin.so.*' that were previously deleted? Around line 743:
But leaving this out does not resolve the problem.
(5)
-> error: to copy files target should be directory
(6)
A warning can be added that parallel buildkit builds do not work. I was able to try that on a 2x CPU Xeon, and it failed at several spots, but I had no time to find out how to prevent the failures.
Please correct, esp. (1) and (4).
Thanks!