Closed TreF555 closed 2 years ago
Try lowering the detector's MaxSize: all.Detector.MaxSize = 800; // default: 2048
Are you from Moscow?
It shouldn't be using ~40 GB of memory. Can I check your Services.Image.dll?
I am from Moscow. I can't submit the whole code for verification, but I can tell you how it works in general:
1. There is one input PDF file with many pages.
2. In parallel, I write each page to a temporary file.
3. Each temporary file is passed for recognition; then we wait for all of them to finish and display the processing results for all files.

The machine has 24 cores and 50 GB of memory. If you specify particular pages, for example 3, 4, 5, 6, the container does not crash and works. The problem arises if you process all the pages from the source file, and there may be 100, 200 or more. For parallel work, maxDegreeOfParallelism = Environment.ProcessorCount / 2 is currently set, so in principle the server is half loaded.
It seems you're not using the MKLDNN library (it may have fallen back to OpenBLAS, which uses only 1 thread when running OCR). The server is unlikely to be only half loaded if you are using MKLDNN. To enable it, set:
PaddleConfig.Defaults.UseMkldnn = true;
at the very beginning of your code. After that, you can set maxDegreeOfParallelism to a lower value.
The main reason for the OutOfMemory error is that a single OCR job consumes a very large amount of memory.
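Because each OCR job is memory-heavy, peak memory grows with the number of jobs running at once, so capping concurrency is the direct lever. Below is a minimal sketch of such a cap using only the .NET standard library. The names `BoundedOcr`, `RecognizeAll` and the `recognizePage` delegate are hypothetical placeholders for the real per-page OCR call; nothing here comes from the poster's code or from the OCR library.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class BoundedOcr
{
    // Runs recognizePage over every page file, with at most maxParallel
    // calls in flight at the same time. Results keep the input order.
    // recognizePage is a hypothetical stand-in for the real OCR call.
    public static string[] RecognizeAll(
        IReadOnlyList<string> pageFiles,
        Func<string, string> recognizePage,
        int maxParallel)
    {
        var results = new string[pageFiles.Count];
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        // Each iteration writes only to its own slot, so no locking is needed.
        Parallel.For(0, pageFiles.Count, options, i =>
        {
            results[i] = recognizePage(pageFiles[i]);
        });

        return results;
    }
}
```

A call such as `BoundedOcr.RecognizeAll(files, ocrOnePage, 4)` would then process a 200-page document with no more than four pages in memory-intensive recognition at once, instead of Environment.ProcessorCount / 2.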
Thanks for the answer. For now we have decided to reduce the number of pages processed in parallel. With no more than 5 pages at a time, the container holds up and does not "fall". Regarding "1 OCR job consumes very large of memory": will something be fixed?
No, because it's consuming the expected amount of memory, but you can lower the amount by specifying all.Detector.MaxSize = 800 (default is 2048), or by using OpenBLAS instead of MKLDNN.
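To get a feel for why lowering MaxSize helps, the arithmetic can be sketched as follows. This assumes the detector scales an image down so that its longest side is at most MaxSize while preserving the aspect ratio, and that memory use grows roughly with the pixel count. Both are assumptions, not confirmed details of the library, and any padding or rounding the library applies is ignored. `DetectorSizing` and its methods are hypothetical helpers written only for this illustration.

```csharp
using System;

public static class DetectorSizing
{
    // Scales (width, height) down so that the longest side is at most maxSize,
    // preserving aspect ratio. Images already within the limit are unchanged.
    public static (int Width, int Height) FitLongestSide(int width, int height, int maxSize)
    {
        int longest = Math.Max(width, height);
        if (longest <= maxSize)
        {
            return (width, height);
        }

        double scale = (double)maxSize / longest;
        return ((int)Math.Round(width * scale), (int)Math.Round(height * scale));
    }

    // Total number of pixels in an image of the given size.
    public static long PixelCount((int Width, int Height) size)
    {
        return (long)size.Width * size.Height;
    }
}
```

For an A4 page rendered at 300 DPI (2480 x 3508 pixels), `FitLongestSide` gives 1448 x 2048 with a limit of 2048 and 566 x 800 with a limit of 800, which is about 6.5 times fewer pixels for the detector to process.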
Good afternoon. We use the following Dockerfile:
```dockerfile
FROM mcr.microsoft.com/dotnet/aspnet:6.0-focal as base

ENV DEBIAN_FRONTEND=noninteractive
ENV OPENCV_VERSION=4.6.0

WORKDIR /

# Install opencv dependencies
RUN apt-get update && apt-get -y install --no-install-recommends \
    apt-transport-https \
    software-properties-common \
    wget \
    unzip \
    ca-certificates \
    build-essential \
    cmake \
    git \
    libtbb-dev \
    libatlas-base-dev \
    libgtk2.0-dev \
    libavcodec-dev \
    libavformat-dev \
    libswscale-dev \
    libdc1394-22-dev \
    libxine2-dev \
    libv4l-dev \
    libtheora-dev \
    libvorbis-dev \
    libxvidcore-dev \
    libopencore-amrnb-dev \
    libopencore-amrwb-dev \
    libavresample-dev \
    x264 \
    libgdiplus \
    tesseract-ocr \
    tesseract-ocr-rus \
    imagemagick \
    libtesseract-dev \
    apt-utils \
    && apt-get -y clean \
    && rm -rf /var/lib/apt/lists/*

# Setup opencv and opencv-contrib source
RUN wget https://github.com/opencv/opencv/archive/${OPENCV_VERSION}.zip && \
    unzip ${OPENCV_VERSION}.zip && \
    rm ${OPENCV_VERSION}.zip && \
    mv opencv-${OPENCV_VERSION} opencv && \
    wget https://github.com/opencv/opencv_contrib/archive/${OPENCV_VERSION}.zip && \
    unzip ${OPENCV_VERSION}.zip && \
    rm ${OPENCV_VERSION}.zip && \
    mv opencv_contrib-${OPENCV_VERSION} opencv_contrib

# Build OpenCV
RUN cd opencv && mkdir build && cd build && \
    cmake \
    -D OPENCV_EXTRA_MODULES_PATH=/opencv_contrib/modules \
    -D CMAKE_BUILD_TYPE=RELEASE \
    -D BUILD_SHARED_LIBS=OFF \
    -D ENABLE_CXX11=ON \
    -D BUILD_EXAMPLES=OFF \
    -D BUILD_DOCS=OFF \
    -D BUILD_PERF_TESTS=OFF \
    -D BUILD_TESTS=OFF \
    -D BUILD_JAVA=OFF \
    -D BUILD_opencv_app=OFF \
    -D BUILD_opencv_barcode=OFF \
    -D BUILD_opencv_java_bindings_generator=OFF \
    -D BUILD_opencv_js_bindings_generator=OFF \
    -D BUILD_opencv_python_bindings_generator=OFF \
    -D BUILD_opencv_python_tests=OFF \
    -D BUILD_opencv_ts=OFF \
    -D BUILD_opencv_js=OFF \
    -D BUILD_opencv_bioinspired=OFF \
    -D BUILD_opencv_ccalib=OFF \
    -D BUILD_opencv_datasets=OFF \
    -D BUILD_opencv_dnn_objdetect=OFF \
    -D BUILD_opencv_dpm=OFF \
    -D BUILD_opencv_fuzzy=OFF \
    -D BUILD_opencv_gapi=OFF \
    -D BUILD_opencv_intensity_transform=OFF \
    -D BUILD_opencv_mcc=OFF \
    -D BUILD_opencv_objc_bindings_generator=OFF \
    -D BUILD_opencv_rapid=OFF \
    -D BUILD_opencv_reg=OFF \
    -D BUILD_opencv_stereo=OFF \
    -D BUILD_opencv_structured_light=OFF \
    -D BUILD_opencv_surface_matching=OFF \
    -D BUILD_opencv_videostab=OFF \
    -D BUILD_opencv_wechat_qrcode=ON \
    -D WITH_GSTREAMER=OFF \
    -D WITH_ADE=OFF \
    -D OPENCV_ENABLE_NONFREE=ON \
    .. && make -j$(nproc) && make install && ldconfig

# Download OpenCvSharp
RUN git clone https://github.com/shimat/opencvsharp.git && cd opencvsharp

# Install the Extern lib.
RUN mkdir /opencvsharp/make && cd /opencvsharp/make && \
    cmake -D CMAKE_INSTALL_PREFIX=/opencvsharp/make /opencvsharp/src && \
    make -j$(nproc) && make install && \
    rm -rf /opencv && \
    rm -rf /opencv_contrib && \
    cp /opencvsharp/make/OpenCvSharpExtern/libOpenCvSharpExtern.so /usr/lib/

# set noninteractive installation
RUN export DEBIAN_FRONTEND=noninteractive

# install tzdata package
RUN apt-get install -y tzdata

# set your timezone
RUN ln -fs /usr/share/zoneinfo/Europe/Moscow /etc/localtime
RUN dpkg-reconfigure --frontend noninteractive tzdata
RUN echo Europe/Moscow > /etc/timezone

RUN apk del tzdata

RUN wget -q https://paddle-inference-lib.bj.bcebos.com/2.3.2/cxx_c/Linux/CPU/gcc8.2_avx_mkl/paddle_inference_c.tgz && \
    tar -xzf /paddle_inference_c.tgz && \
    find /paddle_inference_c -mindepth 2 -name '*.so' -print0 | xargs -0 -I {} mv {} /usr/lib && \
    ls /usr/lib/*.so && \
    rm -rf /paddle_inference_c && \
    rm paddle_inference_c.tgz

FROM mcr.microsoft.com/dotnet/sdk:6.0-focal AS build
WORKDIR /src
COPY ["Services/Image/Services.Image/Services.Image.csproj", "Services/Image/Services.Image/"]
COPY ["Core/Common/Common.csproj", "Core/Common/"]
RUN dotnet restore "Services/Image/Services.Image/Services.Image.csproj"
COPY . .
WORKDIR "/src/Services/Image/Services.Image"
RUN dotnet build "Services.Image.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "Services.Image.csproj" -c Release -o /app/publish

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .

WORKDIR /app/x64
RUN ln -s /usr/lib/x86_64-linux-gnu/libdl-2.31.so libdl.so
RUN ln -s /usr/lib/x86_64-linux-gnu/liblept.so.5 liblept.so.5
RUN ln -s /usr/lib/x86_64-linux-gnu/liblept.so.5 libleptonica-1.80.0.so
RUN ln -s /usr/lib/x86_64-linux-gnu/libtesseract.so.4.0.1 libtesseract41.so

WORKDIR /app
ENTRYPOINT ["dotnet", "Services.Image.dll"]
```
The container shuts down completely when we send a large file for text recognition. If we send a small file for processing, everything works and the container does not "fall". There are errors in the container logs:
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
1. How can we reduce memory consumption while it is working?
2. How can we prevent the container from "falling"?