Open hzhangxyz opened 1 year ago
@hzhangxyz awesome, thanks for the report. How did you install mpi? Your example uses mpi. Let's create a minimal reproducible example out of this, and then we can fix it.
I think LFortran doesn't work with mpi yet, as typically you would need to compile openmpi (let's say) with LFortran's support.
Ah I see --- it happens during compilation, not at runtime. I was confused with the error message. Ok, so then let's use bisection on this large file to narrow down the problem to the smallest possible file that works with gfortran, but fails with lfortran with this error. And then let's fix it.
I installed mpi by compiling openmpi-4.1.3 myself, in a gcc environment. I tried an mpi hello world: lfortran complains that mpi.mod is not found, with no memory allocation failure.
It is hard to use bisection on this file, since the functions depend on each other. Do you know of any method to find the leaf functions of the dependency graph?
I tried to compile openmpi with lfortran by
# set lfortran cc to gfortran to link libgfortran sometimes
LFORTRAN_CC=gfortran FC="lfortran --link-with-gcc" lfortran mod.f90 -c --link-with-gcc
but encountered the following problem:
configure:44735: checking external symbol convention
configure:44792: lfortran --link-with-gcc -c conftest.f -lz
configure:44799: $? = 0
Could not determine Fortran naming convention. Output from /usr/bin/nm -B:
configure:44838: result:
configure:44856: error: unknown naming convention:
When I compile a hello-world-level Fortran module example (consisting of mod.f90 and main.f90), I find that no code for the functions in mod.f90 is generated into mod.o (its symbol table is empty); the code is instead generated into main.o. This behavior confuses the openmpi configure script. So is it necessary to defer code generation like this? I have only seen similar behavior when compiling a C++ module with template functions.
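To illustrate why an empty symbol table produces "unknown naming convention", here is a minimal Python sketch of the kind of check a configure script performs on nm output. It is not Open MPI's actual code: the function name classify_symbol_convention, the probe name foo_bar, and every label other than "no underscore" (which appears in this thread) are assumptions made for illustration.

```python
def classify_symbol_convention(nm_output: str, name: str = "foo_bar") -> str:
    """Guess the Fortran symbol naming convention from `nm -B` style output."""
    # The symbol name is the last whitespace-separated field of each nm line.
    symbols = {
        fields[-1]
        for fields in map(str.split, nm_output.splitlines())
        if fields
    }
    # Candidate spellings, most specific first (labels are illustrative).
    candidates = [
        (name.lower() + "__", "double underscore"),
        (name.lower() + "_", "single underscore"),
        (name.lower(), "no underscore"),
        (name.upper(), "upper case"),
    ]
    for symbol, label in candidates:
        if symbol in symbols:
            return label
    # Nothing matched, e.g. because the object file has no symbols at all.
    return "unknown"


print(classify_symbol_convention("0000000000000000 T foo_bar_\n"))  # single underscore
print(classify_symbol_convention("0000000000000000 T foo_bar\n"))   # no underscore
print(classify_symbol_convention(""))                               # unknown
```

With an object file that contains no symbols, the nm output is empty and no candidate matches, which corresponds to the empty "result:" line in the log above.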
Setting ompi_cv_fortran_external_symbol="no underscore"
manually solves the problem above, but the next problem occurs:
configure:44889: checking if C and Fortran are link compatible
configure:44937: gcc -c -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -mcx16 conftest_c.c
configure:44944: $? = 0
configure:44965: lfortran --link-with-gcc -o conftest conftest.f90 conftest_c.o -lz >&5
semantic error: Function 'testfunc' not found or not implemented yet (if it is intrinsic)
--> conftest.f90:4:8
|
4 | call testfunc(1)
| ^^^^^^^^^^^^^^^^
Note: if any of the above error or warning messages are not clear or are lacking
context please report it to us (we consider that a bug that must be fixed).
configure:44965: $? = 1
configure: failed program was:
| program main
|
| external testfunc
| call testfunc(1)
|
| end
configure:44991: result: no
configure:45016: error: C and Fortran compilers are not link compatible. Can not continue.
Considering the previous behavior, it seems lfortran does not read symbols from .o files, and requires that every symbol be found in a .f90 or .mod file?
You can force LFortran to generate code right away with --generate-object-code. We use that for SciPy, for example.
MPI implementations are notorious for using all kinds of undocumented / non-standard compiler behavior (for example, they should not depend on whatever naming convention we use internally in the compiler; they should use the standard "bind(c)", which we do support), so I am not surprised LFortran doesn't work out of the box. I am guessing the openmpi configure script passes all kinds of fancy Fortran code to probe LFortran, and LFortran doesn't behave as it expects, so it breaks. We will eventually fix it; right now we are focusing on compiling simpler standalone 3rd party codes.
If you are interested in isolating the bugs that we need to fix so that LFortran works with openmpi, that would definitely be very helpful. That is most of the work. Fixing or adding some feature is usually easy, once we know exactly what to fix and how to test it with a small, standalone test.
One way to make progress on your file is to create a hand-written mpi.f90 where you provide the interfaces that your code depends on (and make sure your file compiles with GFortran). If there are no other dependencies, then we can try to compile your code with LFortran, step by step. I went over your code and you don't seem to be using any fancy Fortran features; I think LFortran supports everything that you use. I am sure there will be bugs, but they should not be difficult for us to fix, so if you report them as isolated issues, we'll fix them quickly.
The way you do bisection is that you start by removing the outermost function that nobody else calls. Then another one. And so on.
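To find those outermost functions in a large file, a rough helper script can list the candidates. The following Python sketch is not an LFortran tool, and the name uncalled_procedures is hypothetical. It assumes free-form source in which every procedure is closed by "end subroutine" or "end function", and it uses plain word matching, so a name that appears in a string literal counts as a reference, interface blocks are not treated specially, and references from module-level or main-program code are ignored.

```python
import re

DEF_RE = re.compile(r"\b(subroutine|function)\s+(\w+)", re.IGNORECASE)
END_RE = re.compile(r"^\s*end\s*(subroutine|function)\b", re.IGNORECASE)
WORD_RE = re.compile(r"\w+")


def uncalled_procedures(source: str) -> list[str]:
    """Return procedures that no other procedure in `source` references."""
    # Drop comments naively: a '!' inside a string literal is also cut.
    lines = [line.split("!", 1)[0] for line in source.splitlines()]

    # Pass 1: collect the names of all defined procedures.
    defined = []
    for line in lines:
        if END_RE.match(line):
            continue
        m = DEF_RE.search(line)
        if m:
            defined.append(m.group(2).lower())

    # Pass 2: record which defined names appear inside other procedures.
    referenced = set()
    stack = []  # names of the procedures enclosing the current line
    for line in lines:
        if END_RE.match(line):
            if stack:
                stack.pop()
            continue
        m = DEF_RE.search(line)
        if m:
            stack.append(m.group(2).lower())
            continue
        if not stack:
            continue  # module-level or main-program code is ignored
        for word in WORD_RE.findall(line.lower()):
            if word in defined and word != stack[-1]:
                referenced.add(word)

    return [name for name in defined if name not in referenced]


example = """
module tools
contains
  subroutine helper()
    print *, "helper"
  end subroutine helper

  subroutine driver()
    call helper()
  end subroutine driver

  integer function unused(x)
    integer :: x
    unused = x + 1
  end function unused
end module tools
"""
print(uncalled_procedures(example))  # ['driver', 'unused']
```

Each reported name is a candidate to delete first; after removing one, check that the file still compiles with gfortran and rerun the script to get the next layer.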
Well, I am not in a hurry to compile my Tools.f90; I would rather solve the openmpi issues one by one first. The next problem I face is confusing behavior when passing multiple files on the "lfortran" command line. Details follow.
mod.f90:
subroutine show_()
print*, "Hello"
end subroutine show_
main.f90:
program hello
external show
call show
end program hello
Expected: I can compile multiple files at the same time, like gfortran main.f90 mod.f90.
Fact: Running LFORTRAN_CC=gfortran lfortran --link-with-gcc --generate-object-code mod.f90 main.f90 or LFORTRAN_CC=gfortran lfortran --link-with-gcc --generate-object-code main.f90 mod.f90 only recognizes the first .f90 file.
main.f90 only.
Expected: I can run LFORTRAN_CC=gfortran lfortran --link-with-gcc --generate-object-code mod.f90 -c and LFORTRAN_CC=gfortran lfortran --link-with-gcc --generate-object-code main.f90 -c individually to generate mod.o and main.o, just like gfortran mod.f90 -c and gfortran main.f90 -c.
Fact: The second command complains: semantic error: Function 'show' not found or not implemented yet (if it is intrinsic)
The mod.f90 and main.f90 above can be compiled with:
LFORTRAN_CC=gfortran lfortran --link-with-gcc --generate-object-code mod.f90 -c
LFORTRAN_CC=gfortran lfortran --link-with-gcc --generate-object-code mod.o main.f90
But the symbol name in mod.o is currently show_, which is what show becomes under the gfortran convention, so only that name can be used from main.f90 via external show. When I change subroutine show_() to subroutine show(), main.f90 says it cannot find show_.
Perfect, let's get MPI working first.
Excellent, thanks for the bug reports. I would recommend you use the latest "main" branch of LFortran, the release is quite old. We'll make a new release soon.
With the latest branch, first of all, name your function just "show", no underscores. Then everything works:
$ cat mod.f90
subroutine show()
print*, "Hello"
end subroutine show
$ cat main.f90
program hello
external show
call show
end program hello
$ lfortran --generate-object-code -c mod.f90
$ lfortran --generate-object-code -c main.f90 --implicit-interface
$ lfortran -o main main.o mod.o
$ ./main
Hello
Yes, compiling multiple files at once on the same command line is not supported yet; that would be done in the main driver, src/bin/lfortran.cpp.
Hello, thanks for your help and suggestions. I compiled lfortran from commit 74685b4d8be1e6f6985cafa939e62d5e9325ca61 just now. The current status is:
$ LFORTRAN_CC=gfortran ../local/bin/lfortran --generate-object-code --implicit-interface mod.f90 -c
$ LFORTRAN_CC=gfortran ../local/bin/lfortran --generate-object-code --implicit-interface main.f90 -c
$ LFORTRAN_CC=gfortran ../local/bin/lfortran --generate-object-code --implicit-interface main.o mod.o -o main
$ ./main
Hello
$ LFORTRAN_CC=gfortran ../local/bin/lfortran --generate-object-code --implicit-interface mod.f90 -c
$ LANG= LFORTRAN_CC=gfortran ../local/bin/lfortran --generate-object-code --implicit-interface main.f90 mod.o -o main
main.tmp.o: In function `main':
LFortran:(.text+0x7): undefined reference to `show'
collect2: error: ld returned 1 exit status
The command 'gfortran -o main main.tmp.o -L"/home/quaninfo/hzhangxyz/test/local/bin/../share/lfortran/lib" -Wl,-rpath,"/home/quaninfo/hzhangxyz/test/local/bin/../share/lfortran/lib" -llfortran_runtime -lm' failed.
It seems the .o file needs to be passed to the link command?
show_ again. This is not how the configure script invokes the compiler, but the behavior is strange.
$ LFORTRAN_CC=gfortran ../local/bin/lfortran --generate-object-code --implicit-interface mod.f90 -c
$ LANG= LFORTRAN_CC=gfortran ../local/bin/lfortran --generate-object-code --implicit-interface mod.o main.f90 -o main
/tmp/cc7PPEt2.o: In function `MAIN__':
main.f90:(.text+0xa): undefined reference to `show_'
collect2: error: ld returned 1 exit status
The command 'gfortran -o main mod.o main.f90 -L"/home/quaninfo/hzhangxyz/test/local/bin/../share/lfortran/lib" -Wl,-rpath,"/home/quaninfo/hzhangxyz/test/local/bin/../share/lfortran/lib" -llfortran_runtime -lm' failed.
I think the last failure is normal --- on the link line, you have to put the last dependency last. So main depends on "show", so you must put the "show" implementation last. If you switch it, you get an undefined symbol.
The second failure we need to fix. Can you please open up a dedicated issue for it?
I think the last failure is normal --- on the link line, you have to put the last dependency last. So main depends on "show", so you must put the "show" implementation last. If you switch it, you get an undefined symbol.
Yes, I agree with what you said, but the problem is that it complains that show_ is not found, rather than show. Something seems wrong here.
Ah I see. Then we need to investigate what is going on.
When I try to compile a single f90 file, lfortran uses too much memory and then exits with a memory allocation failure. I retested on a larger machine (503 GB of memory); it used hundreds of GB of memory and still kept growing. Details follow.
Installed via conda.
Here is a snapshot of top
A file, Tools.f90, with only 5426 lines, which compiles successfully with mpif90 -c Tools.f90.