mabarnes / moment_kinetics

Other
2 stars 4 forks source link

Trying to understand manual setup for new split package version of moment kinetics #177

Closed mrhardman closed 4 months ago

mrhardman commented 5 months ago

Using https://github.com/mabarnes/moment_kinetics/tree/geometry-upgrade-separate-packages with Julia 1.10 I am trying to understand the manual set up steps on a Linux machine to understand the machine setup scripts.

I fall down at the first hurdle https://mabarnes.github.io/moment_kinetics/previews/PR174/manual_setup/#Manual-setup

pkg> develop ./moment_kinetics/
ERROR: MethodError: no method matching get(::Pair{String,Any}, ::String, ::Nothing)
Closest candidates are:
  get(::Base.EnvDict, ::AbstractString, ::Any) at env.jl:77
  get(::REPL.Terminals.TTYTerminal, ::Any, ::Any) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/Terminals.jl:176
  get(::IdDict{K,V}, ::Any, ::Any) where {K, V} at abstractdict.jl:594

Is this a problem with Julia 1.1.0, or have I broken something locally in the repo?

johnomotani commented 5 months ago

Sorry, no idea what's happening there. Have you tried pointing julia at a fresh .julia directory? Packages left over from old versions might confuse things...

mrhardman commented 5 months ago

I already tried that, so I think a complete new git clone might be required.

mrhardman commented 5 months ago

Indeed, making a fresh new git clone of the repo seems to fix this, which seems to suggest that these install choices can be made once, and once only? So if I want to switch between working without post processing, to working with, I need to make a new repo?

mrhardman commented 5 months ago

Now we have this:

 pkg> develop ./moment_kinetics/
   Cloning default registries into `~/excalibur/moment_kinetics_newgeo/.julia`
   Cloning registry from "https://github.com/JuliaRegistries/General.git"
     Added registry `General` to `~/excalibur/moment_kinetics_newgeo/.julia/registries/General`
 Resolving package versions...
┌ Warning: julia version requirement for package moment_kinetics not satisfied
└ @ Pkg.Operations /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Pkg/src/Operations.jl:154
ERROR: Unsatisfiable requirements detected for package MPIPreferences [3da0fdf6]:
 MPIPreferences [3da0fdf6] log:
 ├─possible versions are: 0.1.0-0.1.10 or uninstalled
 ├─restricted to versions * by moment_kinetics [b5ff72cc], leaving only versions 0.1.0-0.1.10
 │ └─moment_kinetics [b5ff72cc] log:
 │   ├─possible versions are: 0.1.0 or uninstalled
 │   └─moment_kinetics [b5ff72cc] is fixed to version 0.1.0
 └─restricted by julia compatibility requirements to versions: uninstalled — no versions left
johnomotani commented 5 months ago

Indeed, making a fresh new git clone of the repo seems to fix this, which seems to suggest that these install choices can be made once, and once only? So if I want to switch between working without post processing, to working with, I need to make a new repo?

If you had an old Manifest.toml file in the directory, that could also be causing julia to try to use wrong/old versions of dependencies. You shouldn't ever need to make a new repo.

The options for setting up post-processing packages are described here: https://mabarnes.github.io/moment_kinetics/previews/PR174/developing/#Post-processing-packages

johnomotani commented 5 months ago

Not sure what's going on with your error. There is some explanation of this kind of error here https://discourse.julialang.org/t/a-guide-how-to-handle-error-unsatisfiable-requirements-detected/43406 but we don't pin any versions of MPIPreferences, so unless you have done something odd, that shouldn't be a problem.

In a fresh clone of the separate-postprocessing-packages-merge-geometry-upgrade, with an empty .julia, I can run

$ touch Project.toml
$ julia --project -O3
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.10.0 (2023-12-25)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> ]
(moment_kinetics-clean-test) pkg> dev ./moment_kinetics

and it completes successfully.

When you're in pkg> mode, you should see the name of the directory in brackets before the pkg> like I have above? If you have something like (@v1.10) pkg> you're in some global environment, which might have old packages installed.

mrhardman commented 5 months ago

Looking at these instructions https://mabarnes.github.io/moment_kinetics/previews/PR174/manual_setup/#Manual-setup

should there be a -- project in this call to julia if we have already emptied the Project.toml file by touch Project.toml?

$ julia julia> ] pkg> develop ./moment_kinetics/

mrhardman commented 5 months ago

More explicitly, after cloning the repo, I did this

hardman:~/excalibur/moment_kinetics_newgeo$ rm -rf .julia
hardman:~/excalibur/moment_kinetics_newgeo$ touch Project.toml
hardman:~/excalibur/moment_kinetics_newgeo$ julia --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.1.0 (2019-01-21)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(moment_kinetics_newgeo) pkg> develop ./moment_kinetics/
   Cloning default registries into `~/excalibur/moment_kinetics_newgeo/.julia`
   Cloning registry from "https://github.com/JuliaRegistries/General.git"
     Added registry `General` to `~/excalibur/moment_kinetics_newgeo/.julia/registries/General`
 Resolving package versions...
┌ Warning: julia version requirement for package moment_kinetics not satisfied
└ @ Pkg.Operations /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Pkg/src/Operations.jl:154
ERROR: Unsatisfiable requirements detected for package MPIPreferences [3da0fdf6]:
 MPIPreferences [3da0fdf6] log:
 ├─possible versions are: 0.1.0-0.1.10 or uninstalled
 ├─restricted to versions * by moment_kinetics [b5ff72cc], leaving only versions 0.1.0-0.1.10
 │ └─moment_kinetics [b5ff72cc] log:
 │   ├─possible versions are: 0.1.0 or uninstalled
 │   └─moment_kinetics [b5ff72cc] is fixed to version 0.1.0
 └─restricted by julia compatibility requirements to versions: uninstalled — no versions left

should the --project be there? why did we always using --project before if we should start dropping it now?

johnomotani commented 5 months ago

Sorry, yes absolutely the --project should be there.

mrhardman commented 5 months ago

Sorry, yes absolutely the --project should be there.

  • Ah, you've got julia version 1.1.0, not 1.10.0! That's probably the problem!

OK, that was a bad typo on my part.

Trying now with Julia 1.10.0 in https://github.com/mabarnes/moment_kinetics/tree/geometry-upgrade-separate-packages I find a problem trying to add Symbolics. Is this error message telling me that I have broken our source, or is something else going on? I am trying to work on this in my version of the merged branch because I have gone through by hand here in the manufactured solns modules already.

(moment_kinetics_newgeo) pkg> add Symbolics
...
Precompiling project...
  ✗ moment_kinetics → manufactured_solns_ext
  49 dependencies successfully precompiled in 55 seconds. 120 already precompiled.
  2 dependencies had output during precompilation:
┌ Symbolics → SymbolicsForwardDiffExt
│  ┌ Warning: Module SymbolicsForwardDiffExt with build ID ffffffff-ffff-ffff-0117-d79ae0d5d96a is missing from the cache.
│  │ This may mean SymbolicsForwardDiffExt [4a213a23-c09c-5cde-9712-b631ad2c72df] does not support precompilation but is imported by a module that does.
│  └ @ Base loading.jl:1942
│  ┌ Error: Error during loading of extension SymbolicsForwardDiffExt of Symbolics, use `Base.retry_load_extensions()` to retry.
│  │   exception =
│  │    1-element ExceptionStack:
│  │    Declaring __precompile__(false) is not allowed in files that are being precompiled.
│  │    Stacktrace:
│  │      [1] _require(pkg::Base.PkgId, env::Nothing)
│  │        @ Base ./loading.jl:1946
│  │      [2] __require_prelocked(uuidkey::Base.PkgId, env::Nothing)
│  │        @ Base ./loading.jl:1806
│  │      [3] #invoke_in_world#3
│  │        @ Base ./essentials.jl:921 [inlined]
│  │      [4] invoke_in_world
│  │        @ Base ./essentials.jl:918 [inlined]
│  │      [5] _require_prelocked
│  │        @ Base ./loading.jl:1797 [inlined]
│  │      [6] _require_prelocked
│  │        @ Base ./loading.jl:1796 [inlined]
│  │      [7] run_extension_callbacks(extid::Base.ExtensionId)
│  │        @ Base ./loading.jl:1289
│  │      [8] run_extension_callbacks(pkgid::Base.PkgId)
│  │        @ Base ./loading.jl:1324
│  │      [9] run_package_callbacks(modkey::Base.PkgId)
│  │        @ Base ./loading.jl:1158
│  │     [10] __require_prelocked(uuidkey::Base.PkgId, env::String)
│  │        @ Base ./loading.jl:1813
│  │     [11] #invoke_in_world#3
│  │        @ Base ./essentials.jl:921 [inlined]
│  │     [12] invoke_in_world
│  │        @ Base ./essentials.jl:918 [inlined]
│  │     [13] _require_prelocked(uuidkey::Base.PkgId, env::String)
│  │        @ Base ./loading.jl:1797
│  │     [14] macro expansion
│  │        @ Base ./loading.jl:1784 [inlined]
│  │     [15] macro expansion
│  │        @ Base ./lock.jl:267 [inlined]
│  │     [16] __require(into::Module, mod::Symbol)
│  │        @ Base ./loading.jl:1747
│  │     [17] #invoke_in_world#3
│  │        @ Base ./essentials.jl:921 [inlined]
│  │     [18] invoke_in_world
│  │        @ Base ./essentials.jl:918 [inlined]
│  │     [19] require(into::Module, mod::Symbol)
│  │        @ Base ./loading.jl:1740
│  │     [20] include
│  │        @ Base ./Base.jl:495 [inlined]
│  │     [21] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::String)
│  │        @ Base ./loading.jl:2216
│  │     [22] top-level scope
│  │        @ stdin:3
│  │     [23] eval
│  │        @ Core ./boot.jl:385 [inlined]
│  │     [24] include_string(mapexpr::typeof(identity), mod::Module, code::String, filename::String)
│  │        @ Base ./loading.jl:2070
│  │     [25] include_string
│  │        @ Base ./loading.jl:2080 [inlined]
│  │     [26] exec_options(opts::Base.JLOptions)
│  │        @ Base ./client.jl:316
│  │     [27] _start()
│  │        @ Base ./client.jl:552
│  └ @ Base loading.jl:1295
└
┌ Symbolics → SymbolicsPreallocationToolsExt
│  ┌ Warning: Module SymbolicsPreallocationToolsExt with build ID ffffffff-ffff-ffff-0117-d7994e827fe2 is missing from the cache.
│  │ This may mean SymbolicsPreallocationToolsExt [d479e226-fb54-5ebe-a75e-a7af7f39127f] does not support precompilation but is imported by a module that does.
│  └ @ Base loading.jl:1942
│  ┌ Error: Error during loading of extension SymbolicsPreallocationToolsExt of Symbolics, use `Base.retry_load_extensions()` to retry.
│  │   exception =
│  │    1-element ExceptionStack:
│  │    Declaring __precompile__(false) is not allowed in files that are being precompiled.
│  │    Stacktrace:
│  │      [1] _require(pkg::Base.PkgId, env::Nothing)
│  │        @ Base ./loading.jl:1946
│  │      [2] __require_prelocked(uuidkey::Base.PkgId, env::Nothing)
│  │        @ Base ./loading.jl:1806
│  │      [3] #invoke_in_world#3
│  │        @ Base ./essentials.jl:921 [inlined]
│  │      [4] invoke_in_world
│  │        @ Base ./essentials.jl:918 [inlined]
│  │      [5] _require_prelocked
│  │        @ Base ./loading.jl:1797 [inlined]
│  │      [6] _require_prelocked
│  │        @ Base ./loading.jl:1796 [inlined]
│  │      [7] run_extension_callbacks(extid::Base.ExtensionId)
│  │        @ Base ./loading.jl:1289
│  │      [8] run_extension_callbacks(pkgid::Base.PkgId)
│  │        @ Base ./loading.jl:1324
│  │      [9] run_package_callbacks(modkey::Base.PkgId)
│  │        @ Base ./loading.jl:1158
│  │     [10] __require_prelocked(uuidkey::Base.PkgId, env::String)
│  │        @ Base ./loading.jl:1813
│  │     [11] #invoke_in_world#3
│  │        @ Base ./essentials.jl:921 [inlined]
│  │     [12] invoke_in_world
│  │        @ Base ./essentials.jl:918 [inlined]
│  │     [13] _require_prelocked(uuidkey::Base.PkgId, env::String)
│  │        @ Base ./loading.jl:1797
│  │     [14] macro expansion
│  │        @ Base ./loading.jl:1784 [inlined]
│  │     [15] macro expansion
│  │        @ Base ./lock.jl:267 [inlined]
│  │     [16] __require(into::Module, mod::Symbol)
│  │        @ Base ./loading.jl:1747
│  │     [17] #invoke_in_world#3
│  │        @ Base ./essentials.jl:921 [inlined]
│  │     [18] invoke_in_world
│  │        @ Base ./essentials.jl:918 [inlined]
│  │     [19] require(into::Module, mod::Symbol)
│  │        @ Base ./loading.jl:1740
│  │     [20] include
│  │        @ Base ./Base.jl:495 [inlined]
│  │     [21] include_package_for_output(pkg::Base.PkgId, input::String, depot_path::Vector{String}, dl_load_path::Vector{String}, load_path::Vector{String}, concrete_deps::Vector{Pair{Base.PkgId, UInt128}}, source::String)
│  │        @ Base ./loading.jl:2216
│  │     [22] top-level scope
│  │        @ stdin:3
│  │     [23] eval
│  │        @ Core ./boot.jl:385 [inlined]
│  │     [24] include_string(mapexpr::typeof(identity), mod::Module, code::String, filename::String)
│  │        @ Base ./loading.jl:2070
│  │     [25] include_string
│  │        @ Base ./loading.jl:2080 [inlined]
│  │     [26] exec_options(opts::Base.JLOptions)
│  │        @ Base ./client.jl:316
│  │     [27] _start()
│  │        @ Base ./client.jl:552
│  └ @ Base loading.jl:1295
└
johnomotani commented 5 months ago

I seem to see errors like this fairly often at the moment. The package seems to work anyway - I don't know what's going on, but currently hoping that Symbolics (or something?) will update and stop this happening without us having to worry about it.

johnomotani commented 5 months ago

PS on the branch in #178, the 1D-mirror_MMS_ngrid_9_nel_r_1_z_4_vpa_4_vperp_2_diss.toml example does at least run...

mrhardman commented 5 months ago

Indeed, the last error was a result of unreported errors in the compilation of the external manufactured solutions library, as fortunately indicated in a comment by @johnomotani.

mrhardman commented 5 months ago

PS on the branch in #178, the 1D-mirror_MMS_ngrid_9_nel_r_1_z_4_vpa_4_vperp_2_diss.toml example does at least run...

Ah yes, but did you catch all my annoying geometry changes ? ; )

I'll compare our branches after mine runs, I think I am almost there.

mrhardman commented 5 months ago

The manual startup instructions (https://mabarnes.github.io/moment_kinetics/previews/PR174/manual_setup/#Manual-setup) seem to have a few missing details. For my manual setup of https://github.com/mabarnes/moment_kinetics/tree/geometry-upgrade-separate-packages, I had to change a few details.

1) MPI was not automatically installed, therefore we need a

pkg > add MPI

2) Symbolics has to be installed by choice with

pkg> add Symbolics

3) A typo in the MPIPreferences discussion (missing using)

pkg> add MPIPreferences
pkg> <press 'backspace'>
julia> using MPIPreferences; MPIPreferences.use_system_binary()

4) No statements about setting the ENV["PYTHON"] variable and rebuilding PyCall with the correct system python.

$ which python
/your/python/location
$ julia -O3 --project
julia> ENV["PYTHON"]="/your/python/location"
julia> using Pkg; Pkg.build("PyCall")
mrhardman commented 5 months ago

PS on the branch in #178, the 1D-mirror_MMS_ngrid_9_nel_r_1_z_4_vpa_4_vperp_2_diss.toml example does at least run...

I am now able to confirm that the simulations run on my merge branch, but it seems like there are merge mistakes which make the MMS tests fail. Just trying to check if this is a problem with the number of cores used in the simulation.

I suspect that if there are problems it will come from the splitting of the manufactured solutions module, which I am not sure is worth the pain, since we will almost always want to have that module enabled for future higher dimensional development.

mrhardman commented 5 months ago

I have now fixed the merge branch https://github.com/mabarnes/moment_kinetics/tree/geometry-upgrade-separate-packages and made a PR to the other merge branch by @johnomotani for comparison. I have verified that my merge branch behaves properly on one core for both simulations and diagnostics, but I still have to understand what is wrong for the multi-core jobs. I suspect it could be down to the compilation problems with MPI, but TBC. I may look in the creation of the .so files to fix this.

EDIT: It seems that the problem with MPI is either intermittent, or my last changes somehow fixed it.

johnomotani commented 5 months ago

Something to watch out for - the diff on the github.com PRs (or the .../compare page) seems to compare the 'new' branch to the last common commit in the base branch, ignoring any newer changes on the base branch. In other words, it's not the same as the output of git diff. I don't know how to do a git diff on github.com, it's probably easier to just run it in your local repo. I did a git diff, and it looks like the differences between our merged branches are whitespace, print statements, and a couple of very small fixes/improvements that I made to machines/shared/machine_setup.jl

mrhardman commented 5 months ago

Should I do a local merge then and re-run my tests, and let you know how it goes?

johnomotani commented 5 months ago

Should I do a local merge then and re-run my tests, and let you know how it goes?

Sounds good.

mrhardman commented 5 months ago

P.S. Thanks for taking a look!

General potentially unrelated question: I am still running with this new branch without making an .so precompiled file. I seem to notice that for large core counts (>32), precompile time (time before "starting setup") tends to scale linearly with number of cores. Is this expected? Does the "precompile time" scale with core count if the .so is created in advance?

johnomotani commented 5 months ago

General potentially unrelated question: I am still running with this new branch without making an .so precompiled file. I seem to notice that for large core counts (>32), precompile time (time before "starting setup") tends to scale linearly with number of cores. Is this expected?

Not sure. With julia-1.10.0 it should be better than it was before. I wouldn't have expected the precompile/compile time to depend on number of cores, but I guess it's possible that e.g. the disk accesses compiling functions, etc. cause competition between the processes.

Does the "precompile time" scale with core count if the .so is created in advance?

No, at least not as far as I've noticed on ARCHER2 - using a .so, the time to start the time advance is well under a minute.