yuwenchen95 opened this issue 1 year ago (status: Open)
@amontoison could you advise?
Hi @yuwenchen95! If you want to use SSIDS with GPU support, we must recompile it with CUDA. I cross-compiled SPRAL for Ipopt and GALAHAD, and those builds don't use the GPU version of SSIDS.
For using SPRAL in Julia, I started a SPRAL.jl package with all wrappers generated by Clang.jl. It's a private repository, but I can convert it into a public one. https://github.com/JSO-Experimental/Spral.jl
Hi @amontoison, thank you so much for making the repo public.
I tried to use the wrapper in your repo:
using Spral
options = Spral.spral_ssids_options(Cint(0), Cint(0), Cint(6),Cint(6),Cint(6),
Cint(1),
Cint(8),
Cint(true),
Cint(true),
Clong(5e7),
Cfloat(1.2),
Cfloat(1.0),
Cint(0),
Clong(4e6),
Cint(256),
Cint(true),
Cint(1),
Cdouble(1e-20),
Cdouble(0.01),
ntuple(i -> UInt8(0),80)
)
ptr_options = Ptr{Spral.spral_ssids_options}(pointer_from_objref(options))
inform = Spral.spral_ssids_inform(
Cint(0),
Cint(0),
Cint(0),
Cint(0),
Cint(0),
Cint(0),
Cint(0),
Cint(0),
Clong(0),
Clong(0),
Cint(0),
Cint(0),
Cint(0),
Cint(0),
Cint(0),
Cint(0),
ntuple(i -> UInt8(0),80)
)
# ptr_inform = Ptr{Spral.spral_ssids_inform}()
ptr_inform = Ptr{Spral.spral_ssids_inform}(pointer_from_objref(inform))
Spral.spral_ssids_default_options(ptr_options)
check = Cint(true)
posdef = Cint(false)
n = Cint(5);
ptr = Clong.([1, 3, 6, 8, 9, 10 ]);
ptr = Ptr{Clong}(pointer_from_objref(ptr))
row = Cint.([1, 2, 2, 3, 5, 3, 4, 4, 5 ]);
row = Ptr{Cint}(pointer_from_objref(row))
val = Cdouble.([2.0, 1.0, 4.0, 1.0, 1.0, 3.0, 2.0, -1.0, 2.0]);
val = Ptr{Cdouble}(pointer_from_objref(val))
akeep = Ptr{Ptr{Cvoid}}(C_NULL)
fkeep = Ptr{Ptr{Cvoid}}(C_NULL)
order = Ptr{Cint}(C_NULL)
Spral.spral_ssids_analyse(check, n, order, ptr, row, Ptr{Cdouble}(C_NULL), akeep, ptr_options, ptr_inform)
but I still get the error:
Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x5e785e89 -- spral_ssids_analyse at C:\Users\ddt00\.julia\artifacts\1b57748e686ea33dca48835d73f5c7cdd13607ac\bin\libspral.dll (unknown line)
in expression starting at REPL[9]:1
spral_ssids_analyse at C:\Users\ddt00\.julia\artifacts\1b57748e686ea33dca48835d73f5c7cdd13607ac\bin\libspral.dll (unknown line)
spral_ssids_analyse at C:\Users\ddt00\.julia\dev\Spral\src\wrapper\spral_api.jl:159
unknown function (ip: 000000005d7f713c)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
do_call at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:126
eval_value at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:215
eval_stmt_value at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:166 [inlined]
eval_body at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:594
jl_interpret_toplevel_thunk at /cygdrive/c/buildbot/worker/package_win64/build/src\interpreter.c:750
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:906
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:850
ijl_toplevel_eval at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:915 [inlined]
ijl_toplevel_eval_in at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:965
eval at .\boot.jl:368 [inlined]
eval_user_input at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.8\REPL\src\REPL.jl:151
repl_backend_loop at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.8\REPL\src\REPL.jl:247
start_repl_backend at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.8\REPL\src\REPL.jl:232
#run_repl#47 at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.8\REPL\src\REPL.jl:369
run_repl at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.8\REPL\src\REPL.jl:355
jfptr_run_repl_67836.clone_1 at C:\Users\ddt00\AppData\Local\Programs\julia-1.8.1\lib\julia\sys.dll (unknown line)
#967 at .\client.jl:419
jfptr_YY.967_41697.clone_1 at C:\Users\ddt00\AppData\Local\Programs\julia-1.8.1\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
jl_f__call_latest at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:774
#invokelatest#2 at .\essentials.jl:729 [inlined]
invokelatest at .\essentials.jl:726 [inlined]
run_main_repl at .\client.jl:404
exec_options at .\client.jl:318
_start at .\client.jl:522
jfptr__start_40826.clone_1 at C:\Users\ddt00\AppData\Local\Programs\julia-1.8.1\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1838 [inlined]
true_main at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:575
jl_repl_entrypoint at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:719
mainCRTStartup at /cygdrive/c/buildbot/worker/package_win64/build/cli\loader_exe.c:59
BaseThreadInitThunk at C:\WINDOWS\System32\KERNEL32.DLL (unknown line)
RtlUserThreadStart at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
Allocations: 785732 (Pool: 785318; Big: 414); GC: 1
It seems that I may have some type errors in the input parameters of spral_ssids_analyse, but I can't figure out which ones. I checked all the datatypes, and all my inputs are of the types required by the wrapper.
@yuwenchen95
I will have a look tonight.
We should not use pointer_from_objref. I just need to update the wrappers to replace Ptr{...} with Ref{...} for the C structures.
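A minimal sketch of why pointer_from_objref breaks here, using only Base Julia. For a Vector, pointer_from_objref returns the address of the Julia array object itself rather than of its first element, so a C routine expecting int64_t * reads the wrong memory. Passing the array (or a Ref for a struct) directly lets ccall perform the conversion. Assumptions: memmove from the C runtime stands in for a SPRAL routine, and Pair64 is a hypothetical stand-in for a C struct such as spral_ssids_options.

```julia
# Sketch: pass arrays and structs to C through ccall's own conversions,
# not through pointer_from_objref. memmove stands in for a SPRAL routine.
ptr = Int64[1, 3, 6, 8, 9, 10]

obj_addr  = pointer_from_objref(ptr)  # address of the array object itself
data_addr = pointer(ptr)              # address of ptr[1]: what int64_t* expects

# Passing the Vector for a Ptr{Int64} argument gives C the data pointer,
# and ccall keeps `ptr` alive for the duration of the call.
dest = zeros(Int64, length(ptr))
ccall(:memmove, Ptr{Cvoid}, (Ptr{Int64}, Ptr{Int64}, Csize_t),
      dest, ptr, sizeof(ptr))

# For C structs, declare the argument as Ref{T} and pass a Ref;
# C then reads and writes the struct contents in place.
struct Pair64            # hypothetical stand-in for a C struct
    a::Int64
    b::Int64
end
src = Ref(Pair64(7, 8))
dst = Ref(Pair64(0, 0))
ccall(:memmove, Ptr{Cvoid}, (Ref{Pair64}, Ref{Pair64}, Csize_t),
      dst, src, sizeof(Pair64))
```

After the calls, dest holds a copy of ptr and dst[] equals Pair64(7, 8), while obj_addr and data_addr are different addresses.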
I updated Spral.jl. For your example, I fixed a few things:
using Spral
options = Spral.spral_ssids_options()
inform = Spral.spral_ssids_inform()
Spral.spral_ssids_default_options(options)
check = Cint(true)
posdef = Cint(false)
n = 5
ptr = [1, 3, 6, 8, 9, 10]
row = [1, 2, 2, 3, 5, 3, 4, 4, 5]
val = [2.0, 1.0, 4.0, 1.0, 1.0, 3.0, 2.0, -1.0, 2.0]
akeep = [Ptr{Ptr{Cvoid}}()]
fkeep = [Ptr{Ptr{Cvoid}}()]
order = Cint[]
Spral.spral_ssids_analyse(check, n, order, ptr, row, Cdouble[], akeep, options, inform)
Thanks for the update!
Now I get the same error information as in native C, where it returns the error:
Error return from ssids_analyse. Error flag = -4
All entries in a column out-of-range (ssids_analyse) or all entries out-of-range (ssids_analyse_coord)
I'm very confused since I am using the example from https://www.numerical.rl.ac.uk/spral/doc/v2016.09.23/C/ssids.html#c.spral_ssids_analyse. Did you happen to have this issue before?
Do you get the same error if you run the C example directly? @jfowkes Is the documentation still relevant after 7 years?
@amontoison I have no idea as I did not write the documentation.
The example in the documentation is just a reference to the actual example file, so it should always be the relevant, up-to-date version: https://github.com/ralna/spral/blob/58f8a4aca41f427df0b56f19d5c21a2121c4f610/doc/C/ssids.rst?plain=1#L764 However, it looks like it does not automatically update on the website, since it does not have the fix https://github.com/ralna/spral/commit/21c41f73. That's what causes the out-of-range error when using that code there. Other changes like https://github.com/ralna/spral/commit/40e6c78f are also not there. Is there some script that has to be run to refresh the documentation for the website? Or did some script break that is supposed to update the documentation on push?
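A minimal sketch, assuming the -4 error ("All entries in a column out-of-range") comes from an index-base mismatch: the example data are 1-based (ptr[1] = 1), while spral_ssids_default_options sets 0-based C indexing unless options.array_base = 1 is set, which is what the later snippets in this thread do. The csc_problems helper below is hypothetical and is not SPRAL's own validation; it only flags inconsistencies between ptr, row and the chosen base before calling spral_ssids_analyse.

```julia
# Hypothetical pre-flight check for SSIDS-style input: lower triangle in
# CSC form, `ptr` with n+1 entries, indices offset by `base` (array_base).
# This is not SPRAL's own validation, only a consistency sketch.
function csc_problems(n::Integer, ptr::AbstractVector{<:Integer},
                      row::AbstractVector{<:Integer}; base::Integer = 0)
    problems = String[]
    if length(ptr) != n + 1
        push!(problems, "ptr has $(length(ptr)) entries, expected n+1 = $(n + 1)")
        return problems
    end
    ptr[1] == base ||
        push!(problems, "ptr[1] = $(ptr[1]) but array_base = $base")
    ptr[end] - base == length(row) ||
        push!(problems, "ptr[end] - base = $(ptr[end] - base) but length(row) = $(length(row))")
    for (k, r) in enumerate(row)
        base <= r <= n - 1 + base ||
            push!(problems, "row[$k] = $r is outside [$base, $(n - 1 + base)]")
    end
    return problems
end

# Data from the documentation example (1-based indices).
n   = 5
ptr = [1, 3, 6, 8, 9, 10]
row = Cint[1, 2, 2, 3, 5, 3, 4, 4, 5]

csc_problems(n, ptr, row; base = 1)  # consistent with options.array_base = 1
csc_problems(n, ptr, row; base = 0)  # the default C base: several problems
```

With base = 1 the helper returns no problems; with base = 0 it reports the ptr offset mismatch and the row indices equal to 5, which fall outside 0..4.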
Well spotted @mjacobse, I think the issue is that the documentation is HTML that is built using Sphinx from the *.rst sources and it has not been rebuilt for 7 years. Part of the issue is that the Fortran docs use the poorly maintained Sphinx extension https://github.com/VACUMM/sphinx-fortran so we are not even sure if the docs will build without error.
I will have a go at rebuilding the Sphinx documentation and seeing if it is still possible to build it at all.
@yuwenchen95 @mjacobse I have rebuilt the documentation for v2023.08.02; please see here:
https://www.numerical.rl.ac.uk/spral/doc/
It includes the updated examples/C/ssids.c, which should now run. Apologies.
Thanks for the updated documentation @jfowkes @mjacobse, it works for spral_ssids_analyse now.
@amontoison Should we also import the akeep and fkeep data structures for the implementation? Also, the implementation needs access to both **akeep and *akeep, as shown in examples/C/ssids.c. How can we achieve this in Julia?
In Julia, you just need to define akeep and fkeep like this:
akeep = [Ptr{Ptr{Cvoid}}()]
fkeep = [Ptr{Ptr{Cvoid}}()]
The pointers to the Fortran structures (akeep and fkeep) are updated when we call the C routines. We need to store them so that we can deallocate them at the end.
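A minimal sketch of the two calling conventions used in examples/C/ssids.c, with pure-Julia mocks created by @cfunction instead of libspral (mock_analyse, mock_factor and the fake handle values are hypothetical). The analyse step receives void **akeep and writes a handle through it; the factor step receives the handle itself as void *akeep and writes fkeep through void **fkeep. A Ref{Ptr{Cvoid}} covers both cases: pass the Ref where C expects void **, and its contents akeep[] where C expects void *.

```julia
# Pure-Julia mocks following the SSIDS handle conventions (hypothetical,
# not libspral): analyse takes void **akeep, factor takes void *akeep.
const FAKE_AKEEP = Ptr{Cvoid}(UInt(0xA0))
const FAKE_FKEEP = Ptr{Cvoid}(UInt(0xF0))

function mock_analyse(akeep::Ptr{Ptr{Cvoid}})::Cvoid
    unsafe_store!(akeep, FAKE_AKEEP)        # like *akeep = <new handle> in C
    return nothing
end

function mock_factor(akeep::Ptr{Cvoid}, fkeep::Ptr{Ptr{Cvoid}})::Cint
    akeep == FAKE_AKEEP || return Cint(-1)  # handle must arrive by value
    unsafe_store!(fkeep, FAKE_FKEEP)        # like *fkeep = <new handle> in C
    return Cint(0)
end

const analyse_fptr = @cfunction(mock_analyse, Cvoid, (Ptr{Ptr{Cvoid}},))
const factor_fptr  = @cfunction(mock_factor, Cint, (Ptr{Cvoid}, Ptr{Ptr{Cvoid}}))

akeep = Ref{Ptr{Cvoid}}(C_NULL)  # C: void *akeep = NULL;
fkeep = Ref{Ptr{Cvoid}}(C_NULL)  # C: void *fkeep = NULL;

# C: analyse(..., &akeep, ...)
ccall(analyse_fptr, Cvoid, (Ptr{Ptr{Cvoid}},), akeep)
# C: factor(..., akeep, &fkeep, ...)
flag = ccall(factor_fptr, Cint, (Ptr{Cvoid}, Ptr{Ptr{Cvoid}}), akeep[], fkeep)
```

Passing the Ref itself to a void * parameter would hand C the address of the storage rather than the handle, which is the kind of mismatch that can end in an access violation.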
We could create a new structure that is more friendly for the users like:
mutable struct SsidsSolver
akeep::...
fkeep::...
control::...
inform::...
end
and add high-level Julia functions that call the wrappers under the hood. It's what we do for CUDA.jl or HSL.jl.
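A minimal sketch of such a structure, under stated assumptions: the handles are stored as Ref{Ptr{Cvoid}} so they can be passed as void ** or dereferenced with [], the options and inform types are left generic because the exact Spral.jl wrapper types may differ, and free_handles is a hypothetical hook that would wrap Spral.spral_ssids_free in practice. A finalizer releases the handles if the user forgets, and release! is safe to call twice.

```julia
# Hypothetical high-level container. `free_handles` stands in for a call
# to Spral.spral_ssids_free; it is an assumption, not part of Spral.jl.
mutable struct SsidsSolver{O,I}
    akeep::Base.RefValue{Ptr{Cvoid}}
    fkeep::Base.RefValue{Ptr{Cvoid}}
    options::O
    inform::I
    free_handles::Function
    function SsidsSolver(options::O, inform::I, free_handles::Function) where {O,I}
        s = new{O,I}(Ref{Ptr{Cvoid}}(C_NULL), Ref{Ptr{Cvoid}}(C_NULL),
                     options, inform, free_handles)
        finalizer(release!, s)  # free the C-side data when `s` is collected
        return s
    end
end

# Free the handles once; later calls are no-ops because both are C_NULL.
function release!(s::SsidsSolver)
    if s.akeep[] != C_NULL || s.fkeep[] != C_NULL
        s.free_handles(s.akeep, s.fkeep)
        s.akeep[] = C_NULL
        s.fkeep[] = C_NULL
    end
    return nothing
end
```

High-level analyse/factor/solve methods could then take an SsidsSolver and forward solver.akeep, solver.akeep[] or solver.fkeep to the low-level wrappers as each C signature requires.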
I asked because when I run
using Spral
options = Spral.spral_ssids_options()
inform = Spral.spral_ssids_inform()
Spral.spral_ssids_default_options(options)
options.array_base = 1
check = Cint(true)
posdef = Cint(false)
n = 5
ptr = [1, 3, 6, 8, 9, 10]
row = [1, 2, 2, 3, 5, 3, 4, 4, 5]
val = [2.0, 1.0, 4.0, 1.0, 1.0, 3.0, 2.0, -1.0, 2.0]
ptr = [1, 2, 3, 4, 5, 6]
# ptr .-= 1
row = Cint.([1, 2, 3, 4, 5])
# row .-= 1
val = [1.0, 2.0, 3.0, 4.0, 5.0]
akeep = [Ptr{Ptr{Cvoid}}()]
fkeep = [Ptr{Ptr{Cvoid}}()]
order = Cint[]
Spral.spral_ssids_analyse(check, n, order, ptr, row, Cdouble[], akeep, options, inform)
if(inform.flag<0)
Spral.spral_ssids_free(akeep, fkeep);
exit(1);
end
Spral.spral_ssids_factor(posdef, Int64[], Cint[], val, Cdouble[], akeep, fkeep, options,
inform);
it returns the error:
Error return from ssids_factor. Error flag = -53
SSIDS CPU code requires OMP cancellation to be enabled
I'm not sure whether my use of akeep is correct in spral_ssids_factor(posdef, Int64[], Cint[], val, Cdouble[], akeep, fkeep, options, inform), since this function needs akeep to be dereferenced.
Meanwhile, I'm confused by the error above: it looks CPU-related, although we have already set options.use_gpu = true. Why do we still get a CPU error when using the GPU? Also, this error seems to have been commented out in the Fortran code: https://github.com/ralna/spral/blob/7e44d5a7039eed86603ea7a9b1875a410738e30f/src/ssids/datatypes.f90#L45?
@yuwenchen95 before using SSIDS you need to:
export OMP_CANCELLATION=TRUE
export OMP_PROC_BIND=TRUE
otherwise you get OMP cancellation errors as above. Unfortunately the only way to enable this is via setting environment variables, see #124. @amontoison we should add this to the Julia docs.
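A minimal sketch of a pre-flight check from Julia, assuming these variables are read when the OpenMP runtime initializes, which typically happens when libspral and its OpenMP library are loaded; setting them in the shell before starting Julia is therefore the safest route. The check_omp_env helper is hypothetical. Note that export is POSIX shell syntax: on Windows, as in the traces above, the equivalents are set OMP_CANCELLATION=TRUE in cmd or $env:OMP_CANCELLATION = "TRUE" in PowerShell.

```julia
# Hypothetical helper: list the OpenMP variables needed by SSIDS that are
# not set to TRUE in the current process environment.
function check_omp_env(vars = ("OMP_CANCELLATION", "OMP_PROC_BIND"))
    return [v for v in vars if uppercase(get(ENV, v, "")) != "TRUE"]
end

# Setting ENV from Julia can only help if it happens before the OpenMP
# runtime starts, i.e. before `using Spral` in a fresh session.
ENV["OMP_CANCELLATION"] = "TRUE"
ENV["OMP_PROC_BIND"] = "TRUE"

missing_vars = check_omp_env()
isempty(missing_vars) || @warn "Set these before loading Spral" missing_vars
```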
Jari, should we move the Julia interface Spral.jl into this GitHub repository, like GALAHAD? It could make the documentation easier.
Thanks! I then got another error when calling Spral.spral_ssids_factor(posdef, Int64[], Cint[], val, Base.Cdouble[], akeep, fkeep, options, inform)
(same example as before):
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ffb1f2176ef -- jl_object_id__cold at C:/workdir/src\builtins.c:417
in expression starting at C:\Users\ddt00\.julia\dev\QPtest\spral.jl:162
jl_object_id__cold at C:/workdir/src\builtins.c:417
ijl_object_id_ at C:/workdir/src\builtins.c:434 [inlined]
ijl_object_id_ at C:/workdir/src\builtins.c:422 [inlined]
jl_table_peek_bp at C:/workdir/src\iddict.c:119 [inlined]
ijl_eqtable_get at C:/workdir/src\iddict.c:158
lookup_leafcache at C:/workdir/src\gf.c:1125 [inlined]
jl_lookup_generic_ at C:/workdir/src\gf.c:2876
ijl_apply_generic at C:/workdir/src\gf.c:2936
jl_apply at C:/workdir/src\julia.h:1879 [inlined]
do_call at C:/workdir/src\interpreter.c:126
eval_value at C:/workdir/src\interpreter.c:226
eval_stmt_value at C:/workdir/src\interpreter.c:177 [inlined]
eval_body at C:/workdir/src\interpreter.c:624
jl_interpret_toplevel_thunk at C:/workdir/src\interpreter.c:762
jl_toplevel_eval_flex at C:/workdir/src\toplevel.c:912
jl_toplevel_eval_flex at C:/workdir/src\toplevel.c:856
ijl_toplevel_eval at C:/workdir/src\toplevel.c:921 [inlined]
ijl_toplevel_eval_in at C:/workdir/src\toplevel.c:971
eval at .\boot.jl:370 [inlined]
include_string at .\loading.jl:1903
_include at .\loading.jl:1963
include at .\client.jl:478
unknown function (ip: 00000205538d0566)
jl_apply at C:/workdir/src\julia.h:1879 [inlined]
do_call at C:/workdir/src\interpreter.c:126
eval_value at C:/workdir/src\interpreter.c:226
eval_stmt_value at C:/workdir/src\interpreter.c:177 [inlined]
eval_body at C:/workdir/src\interpreter.c:624
jl_interpret_toplevel_thunk at C:/workdir/src\interpreter.c:762
jl_toplevel_eval_flex at C:/workdir/src\toplevel.c:912
jl_toplevel_eval_flex at C:/workdir/src\toplevel.c:856
ijl_toplevel_eval at C:/workdir/src\toplevel.c:921 [inlined]
ijl_toplevel_eval_in at C:/workdir/src\toplevel.c:971
eval at .\boot.jl:370 [inlined]
eval_user_input at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\REPL.jl:153
repl_backend_loop at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\REPL.jl:249
#start_repl_backend#46 at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\REPL.jl:234
start_repl_backend at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\REPL.jl:231
#run_repl#59 at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\REPL.jl:379
run_repl at C:\workdir\usr\share\julia\stdlib\v1.9\REPL\src\REPL.jl:365
jfptr_run_repl_61185.clone_1 at C:\Users\ddt00\AppData\Local\Programs\julia-1.9.2\lib\julia\sys.dll (unknown line)
#1017 at .\client.jl:421
jfptr_YY.1017_34710.clone_1 at C:\Users\ddt00\AppData\Local\Programs\julia-1.9.2\lib\julia\sys.dll (unknown line)
jl_apply at C:/workdir/src\julia.h:1879 [inlined]
jl_f__call_latest at C:/workdir/src\builtins.c:774
#invokelatest#2 at .\essentials.jl:816 [inlined]
invokelatest at .\essentials.jl:813 [inlined]
run_main_repl at .\client.jl:405
exec_options at .\client.jl:322
_start at .\client.jl:522
jfptr__start_47602.clone_1 at C:\Users\ddt00\AppData\Local\Programs\julia-1.9.2\lib\julia\sys.dll (unknown line)
jl_apply at C:/workdir/src\julia.h:1879 [inlined]
true_main at C:/workdir/src\jlapi.c:573
jl_repl_entrypoint at C:/workdir/src\jlapi.c:717
mainCRTStartup at C:/workdir/cli\loader_exe.c:59
BaseThreadInitThunk at C:\WINDOWS\System32\KERNEL32.DLL (unknown line)
RtlUserThreadStart at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
Allocations: 2998 (Pool: 2987; Big: 11); GC: 0
Is it due to how akeep is passed? Should it be a pointer rather than a double pointer here?
@yuwenchen95 Can you provide all the Julia code?
Sure. Here is the code I translated from the documentation, https://www.numerical.rl.ac.uk/spral/doc/latest/C/ssids.html#c.spral_ssids_factor
using Spral
options = Spral.spral_ssids_options()
inform = Spral.spral_ssids_inform()
Spral.spral_ssids_default_options(options)
options.array_base = 1
check = Cint(true)
posdef = Cint(false)
n = 5
ptr = [1, 3, 6, 8, 9, 10]
row = Cint.([1, 2, 2, 3, 5, 3, 4, 4, 5])
val = [2.0, 1.0, 4.0, 1.0, 1.0, 3.0, 2.0, -1.0, 2.0]
akeep = [Ptr{Ptr{Cvoid}}()]
fkeep = [Ptr{Ptr{Cvoid}}()]
Spral.spral_ssids_analyse(check, n, Cint[], ptr, row, Base.Cdouble[], akeep, options, inform)
if(inform.flag<0)
Spral.spral_ssids_free(akeep, fkeep);
exit(1);
end
Spral.spral_ssids_factor(posdef, Int64[], Cint[], val, Base.Cdouble[], akeep, fkeep, options, inform)
if(inform.flag<0)
Spral.spral_ssids_free(akeep, fkeep);
exit(1);
end
# Solve
x = [4.0, 17.0, 19.0, 2.0, 12.0];
Spral.spral_ssids_solve1(0, x, akeep, fkeep, options, inform);
if(inform.flag<0)
Spral.spral_ssids_free(akeep, fkeep);
exit(1);
end
It crashes when calling the spral_ssids_factor function.
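To rule out the input data itself, a minimal sketch that checks the example system with plain Julia, independent of SPRAL. The hypothetical lower_csc_to_dense helper expands the 1-based lower-triangular CSC arrays into a dense symmetric matrix, which should map (1, 2, 3, 4, 5) to the right-hand side x = [4.0, 17.0, 19.0, 2.0, 12.0] used above. If this holds, the crash comes from how the handles or arrays are passed, not from the matrix.

```julia
using LinearAlgebra

# Expand the lower triangle stored in 1-based CSC form (ptr, row, val)
# into a dense symmetric matrix. Illustrative helper, not part of Spral.jl.
function lower_csc_to_dense(n, ptr, row, val)
    A = zeros(Float64, n, n)
    for j in 1:n, k in ptr[j]:(ptr[j+1] - 1)
        i = row[k]
        A[i, j] = val[k]
        A[j, i] = val[k]   # mirror into the upper triangle
    end
    return A
end

n   = 5
ptr = [1, 3, 6, 8, 9, 10]
row = Cint[1, 2, 2, 3, 5, 3, 4, 4, 5]
val = [2.0, 1.0, 4.0, 1.0, 1.0, 3.0, 2.0, -1.0, 2.0]
b   = [4.0, 17.0, 19.0, 2.0, 12.0]

A = lower_csc_to_dense(n, ptr, row, val)
x = A \ b   # dense reference solution to compare with spral_ssids_solve1
```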
I am trying to use SPRAL's SSIDS GPU direct solver in Julia. When I was running it,
I got the following error
Does anyone have a clue how to solve it?
By the way, is there a better way to use SPRAL in Julia, similar to HSL, instead of SPRAL_jll?