lfortran / lfortran

Official main repository for LFortran
https://lfortran.org/

Memory allocation failure when compiling a single f90 file. #2289

Open hzhangxyz opened 1 year ago

hzhangxyz commented 1 year ago

When I try to compile a single f90 file, lfortran uses too much memory and then exits with a memory allocation failure. I retested on a larger machine (503 GB of memory): it uses hundreds of GB of memory and still keeps growing. Details follow.

$ cat /etc/os-release 
NAME="CentOS Stream"
VERSION="8"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Stream 8"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"

Installed via conda:

conda create -n lfortran
conda install -c conda-forge lfortran
$ conda list
# packages in environment at /home/hzhangxyz/conda/envs/lfortran:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
ca-certificates           2023.7.22            hbcca054_0    conda-forge
lfortran                  0.19.0               hfc55251_0    conda-forge
libgcc-ng                 13.1.0               he5830b7_0    conda-forge
libgomp                   13.1.0               he5830b7_0    conda-forge
libsodium                 1.0.18               h36c2ea0_1    conda-forge
libstdcxx-ng              13.1.0               hfd8a6a1_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
openssl                   3.1.2                hd590300_0    conda-forge
xeus                      3.0.5                hac2b420_1    conda-forge
xeus-zmq                  1.0.3                h0541b36_0    conda-forge
zeromq                    4.3.4                h9c3ff4c_1    conda-forge

$ lfortran --version
LFortran version: 0.19.0
Platform: Linux
Default target: x86_64-unknown-linux-gnu
$ lfortran -c Tools.f90 
runtime_error: malloc failed.

Here is a snapshot of top:

$ top -n 1 p 1189514                                                                                                                                    
top - 21:32:09 up 102 days,  8:47,  4 users,  load average: 1.04, 0.84, 0.54                                                                                                                   
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie                                                                                                                           
%Cpu(s):  0.0 us,  6.2 sy,  0.0 ni, 93.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st                                                                                                                
MiB Mem : 514638.0 total, 259153.0 free, 251165.2 used,   4319.8 buff/cache                                                                                                                    
MiB Swap:   4768.0 total,   4681.3 free,     86.7 used. 259341.7 avail Mem                                                                                                                     

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                 
1189514 hzhangx+  20   0  512.1g 243.7g  24340 R 100.0  48.5   7:25.00 lfortran       

The file Tools.f90 has only 5426 lines and compiles successfully with mpif90 -c Tools.f90.

certik commented 1 year ago

@hzhangxyz awesome, thanks for the report. How did you install mpi? Your example uses mpi. Let's create a minimal reproducible example out of this, and then we can fix it.

I think LFortran doesn't work with mpi yet, as typically you would need to compile openmpi (let's say) with LFortran's support.

Ah I see --- it happens during compilation, not at runtime. I was confused with the error message. Ok, so then let's use bisection on this large file to narrow down the problem to the smallest possible file that works with gfortran, but fails with lfortran with this error. And then let's fix it.

hzhangxyz commented 1 year ago

I installed MPI by compiling openmpi-4.1.3 myself in a gcc environment. I tried an MPI hello world: lfortran complains that mpi.mod is not found, but there is no memory allocation failure.

It is hard to use bisection on this file since the functions depend on each other. Do you know any method to find the leaf functions of the dependency graph?

I tried to compile openmpi with lfortran using:

# set lfortran cc to gfortran to link libgfortran sometimes
LFORTRAN_CC=gfortran FC="lfortran --link-with-gcc" lfortran mod.f90 -c --link-with-gcc

but I encountered the following problem:

configure:44735: checking  external symbol convention
configure:44792: lfortran --link-with-gcc  -c conftest.f   -lz
configure:44799: $? = 0
Could not determine Fortran naming convention. Output from /usr/bin/nm -B:
configure:44838: result: 
configure:44856: error: unknown naming convention: 

When I compile a hello-world-level Fortran module example (consisting of mod.f90 and main.f90), I find that the functions in mod.f90 produce no code in mod.o (its symbol table is empty); the code is instead generated into main.o. This behavior confuses the openmpi configure script. Is it necessary to defer code generation like this? I have only seen similar behavior when compiling C++ modules with template functions.
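A quick way to see what the configure check is looking at is to classify how a given procedure name shows up in the nm output for an object file. The helper below is only a sketch: the function name symbol_convention and the labels other than "no underscore" are made up for illustration, and it assumes the usual Linux nm -B layout where the last two columns are the symbol type and the symbol name.

```shell
# Classify how a Fortran procedure name appears in `nm -B` output read from stdin.
# Only defined text symbols (type T or t) are considered; undefined (U) entries are ignored.
symbol_convention() {
    awk -v name="$1" '
        NF >= 2 && $(NF-1) ~ /^[Tt]$/ {
            if ($NF == name)              kind = "no underscore"
            else if ($NF == (name "_"))   kind = "single underscore"
            else if ($NF == (name "__"))  kind = "double underscore"
        }
        END { print (kind == "" ? "not found" : kind) }'
}
# usage (hypothetical): nm -B mod.o | symbol_convention show
```

An empty symbol table, as described above for mod.o, yields "not found", which matches the empty result that configure reported.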

hzhangxyz commented 1 year ago

Setting ompi_cv_fortran_external_symbol="no underscore" manually solves the problem above, but the next problem occurs:

configure:44889: checking if C and Fortran are link compatible
configure:44937: gcc -c -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -mcx16  conftest_c.c
configure:44944: $? = 0
configure:44965: lfortran --link-with-gcc -o conftest    conftest.f90 conftest_c.o  -lz >&5
semantic error: Function 'testfunc' not found or not implemented yet (if it is intrinsic)
 --> conftest.f90:4:8
  |
4 |        call testfunc(1)
  |        ^^^^^^^^^^^^^^^^

Note: if any of the above error or warning messages are not clear or are lacking
context please report it to us (we consider that a bug that must be fixed).
configure:44965: $? = 1
configure: failed program was:
|       program main
| 
|        external testfunc
|        call testfunc(1)
| 
|       end
configure:44991: result: no
configure:45016: error: C and Fortran compilers are not link compatible.  Can not continue.

Considering the previous behavior, it seems lfortran does not read symbols from .o files and requires that all symbols be found in .f90 and .mod files?

certik commented 1 year ago

You can force LFortran to generate code right away with --generate-object-code. We use that for SciPy for example.

MPI implementations are notorious for using all kinds of undocumented / non-standard compiler behavior (for example they should not depend on what kind of naming convention we use internally in the compiler, they should use the standard "bind(c)" which we do support), so I am not surprised LFortran doesn't work out of the box. I am guessing the openmpi configure is passing all kinds of fancy Fortran code to probe LFortran and LFortran doesn't behave as it expects, so it breaks. We will eventually fix it, right now we are focusing on compiling simpler standalone 3rd party codes.

If you are interested in isolating the bugs that we need to fix so that LFortran works with openmpi, that would definitely be very helpful. That is most of the work. Fixing or adding some feature is usually easy, once we know exactly what to fix and how to test it with a small, standalone test.

One way to make progress on your file is to create a hand written mpi.f90 where you provide the interfaces that your code depends on (and make sure your file compiles with GFortran). If there are no other dependencies, then we can try to compile your code with LFortran, step by step. I went over your code and you don't seem to be using any fancy Fortran features, I think LFortran supports everything that you use. I am sure there will be bugs, but they should not be difficult to fix for us, so if you report them as isolated issues, we'll fix them quickly.

The way you do bisection is that you start by removing the outermost function that nobody else calls. Then another one. And so on.
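One rough way to find candidates for removal is a purely textual scan for procedures that nothing else in the file mentions. The helper below is a heuristic sketch, not an LFortran feature: the name unreferenced_procedures is made up for illustration, it assumes every procedure is closed with an explicit end subroutine / end function, and it does not understand strings, continuation lines, or generic interfaces.

```shell
# Print the procedures defined in a Fortran file that no line outside their own body mentions.
unreferenced_procedures() {
    awk '
    {
        line = tolower($0)
        sub(/!.*/, "", line)   # naive comment stripping; also cuts at "!" inside strings
        text[NR] = line
        if (line ~ /^[ \t]*end[ \t]*(subroutine|function)/) {
            # Close the innermost open procedure and remember where its body ends.
            if (depth > 0) { last[stack[depth]] = NR; depth-- }
        } else if (match(line, /^[ \t]*([a-z0-9_(),*= ]+[ \t]+)?(subroutine|function)[ \t]+[a-z][a-z0-9_]*/)) {
            name = substr(line, RSTART, RLENGTH)
            sub(/.*[ \t]/, "", name)   # keep only the trailing procedure name
            first[name] = NR; last[name] = NR
            stack[++depth] = name
        }
    }
    END {
        for (name in first) {
            used = 0
            pattern = "(^|[^a-z0-9_])" name "([^a-z0-9_]|$)"
            for (i = 1; i <= NR && !used; i++) {
                if (i >= first[name] && i <= last[name]) continue   # skip its own body
                if (text[i] ~ pattern) used = 1
            }
            if (!used) print name
        }
    }' "$1" | sort
}
# usage: unreferenced_procedures Tools.f90
```

Each name it prints is a candidate to delete. After removing one, check that the file still compiles with GFortran and still fails with LFortran, then rerun the scan, since removing a caller can expose new unreferenced procedures.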

hzhangxyz commented 1 year ago

Well, I am in no hurry to compile my Tools.f90; I would like to solve the openmpi issues one by one first. The next problem I face is confusing behavior when passing multiple files on the lfortran command line. Details follow.

Files

mod.f90:

subroutine show_()
    print*, "Hello"
end subroutine show_

main.f90:

program hello
        external show
        call show
end program hello

Problems:

  1. Cannot compile multiple .f90 files with a single command line

Expected: I can compile multiple files at the same time, like gfortran main.f90 mod.f90.

Actual: Running LFORTRAN_CC=gfortran lfortran --link-with-gcc --generate-object-code mod.f90 main.f90 or LFORTRAN_CC=gfortran lfortran --link-with-gcc --generate-object-code main.f90 mod.f90 only recognizes the first .f90 file.

  2. Complains that the external subroutine is not defined when compiling main.f90 alone

Expected: I can run LFORTRAN_CC=gfortran lfortran --link-with-gcc --generate-object-code mod.f90 -c and LFORTRAN_CC=gfortran lfortran --link-with-gcc --generate-object-code main.f90 -c individually to generate mod.o and main.o, just like gfortran mod.f90 -c and gfortran main.f90 -c.

Actual: The second command complains: semantic error: Function 'show' not found or not implemented yet (if it is intrinsic)

  3. External subroutines/functions are compatible with the gfortran naming convention but not with lfortran's own

The mod.f90 and main.f90 above can be compiled with:

LFORTRAN_CC=gfortran lfortran --link-with-gcc --generate-object-code mod.f90 -c
LFORTRAN_CC=gfortran lfortran --link-with-gcc --generate-object-code mod.o main.f90

But the symbol name in mod.o is currently show_, which corresponds to show under the gfortran convention; only that symbol can be used from main.f90 via external show. When I change subroutine show_() to subroutine show(), main.f90 says it cannot find show_.

certik commented 1 year ago

Perfect, let's get MPI working first.

Excellent, thanks for the bug reports. I would recommend you use the latest "main" branch of LFortran, the release is quite old. We'll make a new release soon.

With the latest branch, first of all, name your function just "show", no underscores. Then everything works:

$ cat mod.f90 
subroutine show()
    print*, "Hello"
end subroutine show
$ cat main.f90 
program hello
        external show
        call show
end program hello
$ lfortran --generate-object-code   -c mod.f90 
$ lfortran --generate-object-code   -c main.f90 --implicit-interface
$ lfortran -o main main.o mod.o
$ ./main 
Hello

Yes, compiling multiple files at once on the same command line is not supported yet, that would be done in the main driver src/bin/lfortran.cpp.
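Until the driver handles several sources itself, the separate compile and link steps shown above can be wrapped in a small shell function. This is only a sketch: compile_and_link and the LFORTRAN_BIN override are hypothetical names, and it assumes that -c foo.f90 writes foo.o into the current directory, as in the session above.

```shell
# Compile each .f90 file on its own, then link all objects in one final step.
# LFORTRAN_BIN is a hypothetical override, e.g. LFORTRAN_BIN=echo for a dry run.
compile_and_link() {
    out=$1; shift
    objects=""
    for src in "$@"; do
        ${LFORTRAN_BIN:-lfortran} --generate-object-code -c "$src" || return 1
        # Assumption: the object file is named after the source and lands in the current directory.
        objects="$objects $(basename "$src" .f90).o"
    done
    # $objects is left unquoted on purpose so it splits into one argument per object file.
    ${LFORTRAN_BIN:-lfortran} -o "$out" $objects
}
# usage: compile_and_link main main.f90 mod.f90
```

Sources that rely on external declarations, like main.f90 here, would additionally need --implicit-interface on their compile line, which this sketch does not add.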

hzhangxyz commented 1 year ago

Hello, thanks for your help and suggestions. I compiled lfortran from commit 74685b4d8be1e6f6985cafa939e62d5e9325ca61 just now. The current status is:

  1. Your example works well
$ LFORTRAN_CC=gfortran ../local/bin/lfortran --generate-object-code --implicit-interface mod.f90 -c
$ LFORTRAN_CC=gfortran ../local/bin/lfortran --generate-object-code --implicit-interface main.f90 -c
$ LFORTRAN_CC=gfortran ../local/bin/lfortran --generate-object-code --implicit-interface main.o mod.o -o main
$ ./main 
Hello
  2. Cannot compile and link in a single command, which is how the openmpi configure script tests it.
$ LFORTRAN_CC=gfortran ../local/bin/lfortran --generate-object-code --implicit-interface mod.f90 -c
$ LANG= LFORTRAN_CC=gfortran ../local/bin/lfortran --generate-object-code --implicit-interface main.f90 mod.o -o main
main.tmp.o: In function `main':
LFortran:(.text+0x7): undefined reference to `show'
collect2: error: ld returned 1 exit status
The command 'gfortran -o main main.tmp.o  -L"/home/quaninfo/hzhangxyz/test/local/bin/../share/lfortran/lib" -Wl,-rpath,"/home/quaninfo/hzhangxyz/test/local/bin/../share/lfortran/lib" -llfortran_runtime -lm' failed.

It seems the .o file needs to be passed to the link command?

  3. Exchanging the order of the files leads to an undefined show_ again.

This is not what the configure script does, but the behavior is strange.

$ LFORTRAN_CC=gfortran ../local/bin/lfortran --generate-object-code --implicit-interface mod.f90 -c
$ LANG= LFORTRAN_CC=gfortran ../local/bin/lfortran --generate-object-code --implicit-interface mod.o main.f90 -o main
/tmp/cc7PPEt2.o: In function `MAIN__':
main.f90:(.text+0xa): undefined reference to `show_'
collect2: error: ld returned 1 exit status
The command 'gfortran -o main mod.o main.f90  -L"/home/quaninfo/hzhangxyz/test/local/bin/../share/lfortran/lib" -Wl,-rpath,"/home/quaninfo/hzhangxyz/test/local/bin/../share/lfortran/lib" -llfortran_runtime -lm' failed.

certik commented 1 year ago

I think the last failure is normal --- on the link line, you have to put the last dependency last. So main depends on "show", so you must put the "show" implementation last. If you switch it, you get an undefined symbol.

The second failure we need to fix. Can you please open up a dedicated issue for it?

hzhangxyz commented 1 year ago

> I think the last failure is normal --- on the link line, you have to put the last dependency last. So main depends on "show", so you must put the "show" implementation last. If you switch it, you get an undefined symbol.

Yes, I agree with what you said, but the problem is that it complains that show_ is not found rather than show, so something seems wrong here.

certik commented 1 year ago

Ah I see. Then we need to investigate what is going on.