Closed zbeekman closed 2 years ago
@everythingfunctional thanks so much for the test drive and review!!! I'm impressed and happy that you got so far without any adequate documentation. This is my first priority and on the top of my todo list and I requirement before I change this PR from draft status.
I used a Windows Command Prompt. I first ran the
setvars.bat
script provided by Intel to setup the environment. I createdopencoarrays-build
andopencoarrays-install
folders. From theopencoarrays-build
folder, I usedcmake-gui ../OpenCoarrays
to set the Generator to "Unix Makefiles", the compilers to those installed by the Equation.com gcc installer, and the install folder toopencoarrays-install
. I was then able to, from the same command prompt, runmake
and themake test
to compile and run the test suite. A few tests failed, specifically register_alloc_comp_1,2 and 3, test-installation-scripts.sh, caf, and cafrun. It also popped up a lot of prompts from Windows Defender, but they did not reappear after having accepted them all and running the suite again. The failure of the last 3 tests is probably expected, since a Windows command prompt doesn't understand how to execute a shell script.make install
then copied the expected things over toopencoarrays-install
. From a git-bash terminal I can then use the providedcaf
andcafrun
scripts to compile and execute a "Hello, World", although each image segfaults on program shutdown. Usingfpm
withcaf
andcafrun
does not work however, as it usesexecute_command_line
, which executes in acmd
environment, which can't execute the scripts.
These are known issues, some/most of which are documented in the todo list in the main PR comment/description
register_alloc_comp_{1..3}
fail for me as well, and, do so on macOS as well: They fail with MPICH too, but not OpenMPI, IIRC. So this issue isn't limited to windows. This appears to be a bug in OpenCoarrays most likely, or MPICH based MPI implementations, or in my introspection build logic.caf
and cafrun
scripts are the only thing preventing the build & testing to be able to take place entirely within a native CMD or powershell environment AKAICT. I'm brainstorming ways to create thin wrappers to call them from a CMD.exe shell or similar. Likewise, Find_MPI isn't working from the bash shell, likely because the MPI compiler wrapper scripts provided by Intel are batch files and can't be executed from a bash shell in a straightforward way. I'm still contemplating ways to improve general useability and cut out or abstract and upstream complexity.In all, I think this is a huge leap forward, but I would ask for at least one additional thing to be included in this PR. Update the README.md and/or INSTALL.md to provide better instructions for building and using on Windows. A future PR for use of
caf
andcafrun
in a native Windows environment would be a good idea, but it's not super urgent.
I completely agree. Lack of documentation is the number one blocker of this PR and an absolute requirement before merging. That's why it's at the top of the ToDo list. 😉
Thanks again for the feedback!
I've pasted the segfault during shutdown info from a debug build below in the event that you or @vehre have any insight into what's happening.
$ cafrun -np 2 ./hello_caf.exe
Hello from 2 of 2 !
Hello from 1 of 2 !
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0xadf38fdb
#1 0xadf2ef04
#2 0xadf0fef1
#3 0x78797ff7
#4 0x78ab20ce
#5 0x78a61453
#6 0x78ab0bfd
#7 0xadf27185
#8 0xadee3471
#9 0xadee34e7
#10 0xadee1683
#11 0xadee13c0
#12 0xadee14f5
#13 0x784f7033
#14 0x78a62650
#15 0xffffffff
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0xadf38fdb
#1 0xadf2ef04
#2 0xadf0fef1
#3 0x78797ff7
#4 0x78ab20ce
#5 0x78a61453
#6 0x78ab0bfd
#7 0xadf27185
#8 0xadee3471
#9 0xadee34e7
#10 0xadee1683
#11 0xadee13c0
#12 0xadee14f5
#13 0x784f7033
#14 0x78a62650
#15 0xffffffff
1/2: _gfortran_caf_init(957) The mpi memory model is: unified (0x2, 1).
1/2: finalize_internal(972) (status_code = 0)
1/2: finalize_internal(1026) In barrier for finalize...1/2: finalize_internal(1059) Freed all slave tokens.
2/2: finalize_internal(972) (status_code = 0)
2/2: finalize_internal(1026) In barrier for finalize...2/2: finalize_internal(1059) Freed all slave tokens.
Error: Command:
`C:/Program Files (x86)/Intel/oneAPI/mpi/latest/bin/mpiexec.exe -np 2 ./hello_caf.exe`
failed to run.
@everythingfunctional @rouson let me know if the updated build instructions in INSTALL.md make sense and you can follow them. WSL's bash can cause problems, and I know the instructions are somewhat long and involved but I tried to have a good mix of hand-holding and robustly doing things on the users end; more advanced users will probably see shortcuts and alternative routes to get to the same endpoint.
Also, I've added batch-script wrappers for caf
and cafrun
. If the bin
subdirectory of the opencoarrays installation directory is on your path, you should be able to call caf
and cafrun
natively, and hopefully (untested 🤞) this means you might be able to pass them as the compiler to fpm.
@rouson @everythingfunctional Thanks so much for the reviews! Many great suggestions and improvements.
@rouson Please see this comment (1) and this comment (2) about "Git-Bash" vs "Git BASH" and "a Git-Bash bash shell" vs "a Git BASH shell" respectively. Right now I think the original wording/phrasing should stand unless I'm misunderstanding something.
@everythingfunctional I would like my wording/explanation to be clearer too, but think that the repetitive phrasing should stay, as explained in this comment unless there is a better way to say it. (But I can't think of one.)
@zbeekman all looks great. Thanks again!
Summary of changes
try_compile()
statement. This was definitely breaking windows builds and seems like it should have been breaking linux & macOS builds too, since thetry_compile()
doesn't cause an actual build system generation so the angle bracket syntax text of the generator expression was just getting injected directly into a makefile or compile command.FindMPI
CMake module--deprecated variables and options were removed in favor of their modern counterparts, but a few instances of the old variables may have been overlooked.caf
andcafrun
scripts were made more robust which in turn lets them be called directly from a bash or bash-like shell on Windows or a native Windows cmd.exe shell (with Git BASH installed). A long standing typo in thecaf
script was also fixed 😄caf
wrapper script will need to ensure that the compiler/linker enables a pthread compatible threading solution. This should make the build more robust on other platforms too.Rationale for changes
These changes were made to support the ability to build and run OpenCoarrays and therefore coarray Fortran applications on Windows and to generally improve the robustness of the build system. WSL and other heavyweight *nix compatibility solutions are often unpalatable or unacceptable to some users and organizations. Requiring the now freely available Intel MPI implementation (part of Intel's OneAPI set of developer tools) is acceptable because 1) It's the only sufficiently modern and up-to-date MPI solution that I am aware that runs natively on windows, 2) it is now available free of cost and 3) many HPC computing centers, government agencies, etc. already have support contracts with Intel and use the Intel compilers/OneAPI. In addition to Intel OneAPI, the GFortran and GCC native windows compilers are required along with CMake and unix make/gmake, and a bash shell like the one that comes with git-bash for windows. Binaries produced with the OpenCoarrays
caf
wrapper script should be redistributable without all these other requirements--only requiring the intel MPI runtime--assuming that the user doesn't intentionally try to create dynamically linked executables.ToDo
caf
wrapper scriptcaf
wrapper script encapsulates all the required intel MPI environment information to be used outside of the intel configured CMD.exe shell.Additional info and certifications
This pull request (PR) is a:
I certify that
Code coverage data