llvm / llvm-project

[flang] flang-new command terminated abnormally.(flang-new: error: unable to execute command: Killed) #60376

Closed ohno-fj closed 1 year ago

ohno-fj commented 1 year ago
Version of flang-new : 16.0.0(77f2f34d696b77fe5bf05afbe7386966b6bcc8ba)

The flang-new command terminated abnormally about 4 minutes after it started. When I checked with the top command, %MEM gradually increased to 100%. It seems that a lot of memory was used.

Compiling with gfortran on the same machine caused no problem: gfortran finished in about 5 seconds.

Below are the test program, the flang-new and gfortran compilation results, and the options passed to the cmake command when building flang on the AArch64 machine.

$ time flang-new -flang-experimental-exec snggj430_2.f90
flang-new: error: unable to execute command: Killed
flang-new: error: flang frontend command failed due to signal (use -v to see invocation)
flang-new version 16.0.0 (https://github.com/llvm/llvm-project.git 77f2f34d696b77fe5bf05afbe7386966b6bcc8ba)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /home/users/ea01/ea0178/LLVM_20230116/release/bin
flang-new: note: diagnostic msg:
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
flang-new: note: diagnostic msg: /tmp/snggj430_2-dfb524
flang-new: note: diagnostic msg: /tmp/snggj430_2-dfb524.sh
flang-new: note: diagnostic msg:

********************

real    4m23.148s
user    4m6.627s
sys     0m6.320s
$
$ cat /tmp/snggj430_2-dfb524
#line "./snggj430_2.f90" 1
      program main
      integer ::y(1000,1000),yy(:,:),k=1
      allocatable yy
      allocate(yy(2,2))
      y=reshape((/(j,j=1,1000*1000)/),(/1000,1000/))
      yy=y(1:1000:500,1:1000:500)
      call sub(y(1:1000:500,1:1000:500))
      print *,'OK'
      contains
      subroutine sub(x)
      integer x(1:,1:)
      if(size(x)/=2*2)write(6,*) "NG"
      do i2=1,2
       do i1=1,2
          if (x(i1,i2)/=yy(i1,i2))write(6,*) "NG"
       end do
      end do
      end subroutine sub
      end program main
$
$ cat /tmp/snggj430_2-dfb524.sh
# Crash reproducer for clang version 16.0.0 (https://github.com/llvm/llvm-project.git 77f2f34d696b77fe5bf05afbe7386966b6bcc8ba)
# Driver args: "-flang-experimental-exec" "snggj430_2.f90"
# Original command:  "/home/users/ea01/ea0178/LLVM_20230116/release/bin/flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" "-emit-obj" "-mrelocation-model" "pic" "-pic-level" "2" "-pic-is-pie" "-target-cpu" "generic" "-target-feature" "+neon" "-target-feature" "+v8a" "-o" "/tmp/snggj430_2-4fb69c.o" "-x" "f95-cpp-input" "snggj430_2.f90"
 "/home/users/ea01/ea0178/LLVM_20230116/release/bin/flang-new" "-fc1" "-triple" "aarch64-unknown-linux-gnu" "-emit-obj" "-mrelocation-model" "pic" "-pic-level" "2" "-pic-is-pie" "-target-cpu" "generic" "-target-feature" "+neon" "-target-feature" "+v8a" "-x" "f95-cpp-input" "snggj430_2-dfb524"
$
$ time gfortran snggj430_2.f90

real    0m4.753s
user    0m3.891s
sys     0m0.064s
$

Options for the cmake command used to build flang on the AArch64 machine:

    cmake \
      -S ../llvm -B ../build \
      -G Ninja ../llvm \
      -DLLVM_PARALLEL_COMPILE_JOBS=9 \
      -DLLVM_PARALLEL_LINK_JOBS=3 \
      -DCMAKE_BUILD_TYPE=Release \
      -DFLANG_ENABLE_WERROR=On \
      -DLLVM_ENABLE_ASSERTIONS=ON \
      -DLLVM_TARGETS_TO_BUILD=host \
      -DCMAKE_INSTALL_PREFIX=/home/users/ea01/ea0178/LLVM_20230116/release \
      -DLLVM_LIT_ARGS=-v \
      -DLLVM_ENABLE_PROJECTS="mlir;flang;clang;openmp" \
      -DLLVM_ENABLE_RUNTIMES="compiler-rt;libcxx;libcxxabi"

luporl commented 1 year ago

I was able to compile this source with flang-new, but it took 2m49s and used over 60GB of RAM. Disabling optimizations and only emitting bitcode didn't help.

luporl commented 1 year ago

This is the stack trace when flang-new starts using over 10GB:

 #0 0x0000aaaabd5f8eb8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/leandro.lupori/git/llvm-project/buildr/bin/flang-new+0x3152eb8)
 #1 0x0000aaaabd5f7000 llvm::sys::RunSignalHandlers() (/home/leandro.lupori/git/llvm-project/buildr/bin/flang-new+0x3151000)
 #2 0x0000aaaabd5f719c SignalHandler(int) Signals.cpp:0:0
 #3 0x0000ffff930785c0 (linux-vdso.so.1+0x5c0)
 #4 0x0000aaaabeb95864 std::pair<mlir::NamedAttribute const*, bool> mlir::impl::findAttrSorted<mlir::NamedAttribute const*>(mlir::NamedAttribute const*, mlir::NamedAttribute const*, mlir::StringAttr) (/home/leandro.lupori/git/llvm-project/buildr/bin/flang-new+0x46ef864)
 #5 0x0000aaaabf436b3c mlir::LLVM::InsertValueOp::getPositionAttr() (/home/leandro.lupori/git/llvm-project/buildr/bin/flang-new+0x4f90b3c)
 #6 0x0000aaaabf436bb4 mlir::LLVM::InsertValueOp::getPosition() (/home/leandro.lupori/git/llvm-project/buildr/bin/flang-new+0x4f90bb4)
 #7 0x0000aaaabef329a8 convertOperationImpl(mlir::Operation&, llvm::IRBuilderBase&, mlir::LLVM::ModuleTranslation&) LLVMToLLVMIRTranslation.cpp:0:0
 #8 0x0000aaaabf08e338 mlir::LLVM::ModuleTranslation::convertOperation(mlir::Operation&, llvm::IRBuilderBase&) (/home/leandro.lupori/git/llvm-project/buildr/bin/flang-new+0x4be8338)
 #9 0x0000aaaabf0963f4 mlir::LLVM::ModuleTranslation::convertGlobals() (/home/leandro.lupori/git/llvm-project/buildr/bin/flang-new+0x4bf03f4)
#10 0x0000aaaabf09c3f4 mlir::translateModuleToLLVMIR(mlir::Operation*, llvm::LLVMContext&, llvm::StringRef) (/home/leandro.lupori/git/llvm-project/buildr/bin/flang-new+0x4bf63f4)
#11 0x0000aaaabd62eee8 Fortran::frontend::CodeGenAction::generateLLVMIR() (/home/leandro.lupori/git/llvm-project/buildr/bin/flang-new+0x3188ee8)
#12 0x0000aaaabd6edfac Fortran::frontend::CodeGenAction::executeAction() (/home/leandro.lupori/git/llvm-project/buildr/bin/flang-new+0x3247fac)
#13 0x0000aaaabd61e67c Fortran::frontend::FrontendAction::execute() (/home/leandro.lupori/git/llvm-project/buildr/bin/flang-new+0x317867c)
#14 0x0000aaaabd610d90 Fortran::frontend::CompilerInstance::executeAction(Fortran::frontend::FrontendAction&) (/home/leandro.lupori/git/llvm-project/buildr/bin/flang-new+0x316ad90)
#15 0x0000aaaabd623374 Fortran::frontend::executeCompilerInvocation(Fortran::frontend::CompilerInstance*) (/home/leandro.lupori/git/llvm-project/buildr/bin/flang-new+0x317d374)
#16 0x0000aaaabd25a23c fc1_main(llvm::ArrayRef<char const*>, char const*) (/home/leandro.lupori/git/llvm-project/buildr/bin/flang-new+0x2db423c)
#17 0x0000aaaabd259df0 main (/home/leandro.lupori/git/llvm-project/buildr/bin/flang-new+0x2db3df0)
#18 0x0000ffff92b92e10 __libc_start_main /build/glibc-RIFKjK/glibc-2.31/csu/../csu/libc-start.c:342:3
#19 0x0000aaaabd2575bc _start (/home/leandro.lupori/git/llvm-project/buildr/bin/flang-new+0x2db15bc)

jeanPerier commented 1 year ago

The problem is that flang lowering cannot handle huge constants with heterogeneous data (reshape((/(j,j=1,1000*1000)/),(/1000,1000/))).

The front-end rewrites reshape((/(j,j=1,1000*1000)/),(/1000,1000/)) as a rank-2 Constant array with a million elements.

Lowering places the Constant in global read-only data. It currently emits the initializer for arrays that are not rank 1 as a series of insert operations in the MLIR initializer region. The MLIR code that folds these LLVM dialect inserts into an LLVM IR constant initializer appears to be quadratic in the number of inserts.
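
To illustrate how folding a chain of inserts can become quadratic, here is a toy cost model in C++. It assumes that an aggregate constant is immutable, so each folded insert builds a fresh aggregate and copies every element. That cost model is an assumption made for illustration, and the names `FoldResult`, `foldInsertChain` and `buildFromDenseValues` are hypothetical; this is not the MLIR or LLVM code itself.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Toy model only: an "aggregate constant" is treated as immutable, so folding
// one insert means building a fresh aggregate that copies every element.
// This is an assumed cost model, not the actual MLIR/LLVM implementation.
struct FoldResult {
  std::vector<int> elements;      // final initializer contents
  std::size_t elementWrites = 0;  // total element copies/writes performed
};

// Folds n inserts one at a time; each step copies the whole aggregate: O(n^2).
FoldResult foldInsertChain(std::size_t n) {
  FoldResult result;
  result.elements.assign(n, 0);
  for (std::size_t i = 0; i < n; ++i) {
    std::vector<int> next(result.elements);  // copies all n elements
    next[i] = static_cast<int>(i) + 1;       // the single inserted value
    result.elementWrites += n;
    result.elements = std::move(next);
  }
  return result;
}

// Builds the same initializer directly from a flat list of values: O(n).
FoldResult buildFromDenseValues(std::size_t n) {
  FoldResult result;
  result.elements.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    result.elements.push_back(static_cast<int>(i) + 1);
    ++result.elementWrites;
  }
  return result;
}
```

Under this model, a million inserts cost on the order of 10^12 element copies, while a flat list of values costs a million writes. Emitting the whole array as a single attribute corresponds to the second function.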

The easiest fix, if possible, could be to emit an MLIR attribute for non rank-1 arrays: https://github.com/llvm/llvm-project/blob/2d9b4a50cae8d18516a61977768a48d1f92ac33c/flang/lib/Lower/ConvertConstant.cpp#L127

We should also probably add a hard TODO for the cases that cannot be expressed as attributes and whose size exceeds some threshold (to be determined). That would avoid wasting compilation time only to crash with an unclear error that gives no pointer to the poorly supported feature.

llvmbot commented 1 year ago

@llvm/issue-subscribers-flang-ir

luporl commented 1 year ago

Thanks for the detailed explanation @jeanPerier ! Following your suggestion, I was able to prepare this patch: https://reviews.llvm.org/D150686

It works fine with a million elements, compiling the source in less than 2 seconds. However, with an array of 100 million elements, compilation took 3m12s and peak memory usage was about 9GB. The resulting binary had a size of 384MB. For comparison, gfortran took 2m01s to compile the same source, used less than 1GB of RAM and produced a binary of 18KB.

Interestingly, if a non-const value is used in the array constructor's implied do loop, the compile time, memory usage and binary size drop a lot. It seems it's because a runtime loop is used to build the array in this case. So maybe in a future patch, if the need to support such large arrays arises, we could build huge array constants as if they were not constant.
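
As a rough sanity check on the binary size, the sketch below computes the raw storage for 100 million elements. It assumes 4-byte default INTEGER elements, which the thread does not state explicitly; `kElementBytes`, `constantArrayBytes` and `toMiB` are hypothetical names.

```cpp
#include <cstdint>

// Assumption: each array element is a 4-byte default INTEGER.
constexpr std::uint64_t kElementBytes = 4;

// Raw storage needed when the whole array is embedded as constant data.
constexpr std::uint64_t constantArrayBytes(std::uint64_t elements) {
  return elements * kElementBytes;
}

// Converts a byte count to mebibytes (1 MiB = 1024 * 1024 bytes).
constexpr double toMiB(std::uint64_t bytes) {
  return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// 100 million elements occupy 400,000,000 bytes, roughly 381.5 MiB.
static_assert(constantArrayBytes(100'000'000ULL) == 400'000'000ULL,
              "100 million 4-byte integers occupy 400 million bytes");
```

That is about 381.5 MiB, in the same range as the 384MB binary reported above, which is consistent with the array being stored element by element in the executable.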

jeanPerier commented 1 year ago

> Interestingly, if a non-const value is used in the array constructor's implied do loop, the compile time, memory usage and binary size drop a lot. It seems it's because a runtime loop is used to build the array in this case. So maybe in a future patch, if the need to support such large arrays arises, we could build huge array constants as if they were not constant.

Thanks for comparing this. Yes, it makes some sense to me. Right now the issue is that folding in semantics is done "greedily", as if the expression were always in a constant expression context where it must produce a constant.

Maybe an optional threshold could be added to the folding context, telling it not to unroll array constructor loops bigger than that threshold. The option would be set when analyzing an expression that does not appear in a context where it must be a constant.
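
A minimal sketch of what such a threshold check could look like, in C++. Every name here (`FoldingOptions`, `maxUnrolledElements`, `shouldFoldImpliedDo`) is hypothetical and chosen for illustration; this is not flang's actual folding context API.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical names throughout; this is not flang's actual folding API.
struct FoldingOptions {
  // Largest implied-do trip count to expand when a constant is optional.
  // std::nullopt means "no limit", i.e. always fold greedily.
  std::optional<std::uint64_t> maxUnrolledElements;
};

// Returns true if the implied-do loop should be expanded into a constant.
bool shouldFoldImpliedDo(std::uint64_t tripCount, bool constantRequired,
                         const FoldingOptions &options) {
  if (constantRequired)
    return true;  // the context demands a constant, so folding is mandatory
  if (!options.maxUnrolledElements)
    return true;  // no threshold configured: keep the greedy behavior
  return tripCount <= *options.maxUnrolledElements;
}
```

With a threshold of, say, 10000 elements, the million-element constructor in this issue would be left for a runtime loop, since the assignment to y does not require a constant. The same constructor would still be folded wherever a constant is mandatory.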