starpu-runtime / starpu

This is a mirror of https://gitlab.inria.fr/starpu/starpu where our development happens, but contributions are welcome here too!
https://starpu.gitlabpages.inria.fr/
GNU Lesser General Public License v2.1

How to run SimGrid simulations if my code just provides Python wrappers for `starpu_task_insert` functions and has no `main` function? #42

Open Muxas opened 5 months ago

Muxas commented 5 months ago

Hi! I am implementing a Python program that uses StarPU under the hood. The Python program simply calls Python/C++ wrappers, which pass execution to C++ routines that in turn call StarPU task-related functions. Reading the documentation section on SimGrid, I found that the main() function has to be substituted by including starpu_simgrid_wrap.h. However, that is not possible with the Python interpreter, as it would require recompiling the entire Python interpreter from source. Is there a way to perform a SimGrid simulation in such a case? I did not find an example in the StarPU documentation.

Thank you in advance!

sthibaul commented 5 months ago

Hello, what problem do you actually get? Apparently there was a segfault, which I'm currently fixing, but apart from that, things should in principle just work (though more slowly, because of the use of the thread factory). Samuel

Muxas commented 5 months ago

To be honest, I just did not try it. I read that I have to compile the executable so that main() gets substituted by a surrogate, and decided to ask before trying. I saw you added an example for starpupy. I will follow it. Thank you!

sthibaul commented 5 months ago

I have pushed a fix for the simple, newly added starpu_py_perfmodel.py case. In general, combining Python and SimGrid will be difficult: Python has no idea that it should use SimGrid's pthread functions to synchronize threads. Notably, the asyncio mode will probably not work, because SimGrid is not notified that the main thread is blocked waiting for tasks to execute. Only explicit StarPU waiting through task/data waits will work.

Muxas commented 5 months ago

As of now, our Python interface does not use the asyncio module; it moves data to and from numpy arrays through the acquire/release mechanism in the underlying C++ code. Waiting is done through a Python-wrapped starpu_task_wait_for_all(). It seems SimGrid should work on top of that. I will try it some time later (maybe next week) and give you feedback.

Muxas commented 5 months ago

I tried it on my MacBook Pro and it failed. I installed SimGrid 3.35 and compiled StarPU (latest master branch), and as soon as I call starpu_init() through my Python wrapper, I get the following segfault:

[starpu][_starpu_simgrid_init_early] Warning: In simgrid mode, the file containing the main() function of this application should be compiled with starpu.h or starpu_simgrid_wrap.h included, to properly rename it into starpu_main to avoid having to use --cfg=contexts/factory:thread which reduces performance
[0.000000] [platf_parse/INFO] You're using a v4.0 XML file (/Users/muxas/Code/nntile/check/perfmodel/bus/zhores.ais-gpu.starpu-1.5.platform.v4.xml) while the current standard is v4.1 That's fine, the new version is backward compatible. 

Use simgrid_update_xml to update your file automatically to get rid of this warning. This program is installed automatically with SimGrid, or available in the tools/ directory of the source archive.
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'network/TCP-gamma' to '-1'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'network/bandwidth-factor' to '1'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'network/crosstraffic' to '0'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'network/latency-factor' to '1'
[0.000000] [xbt_cfg/INFO] Configuration change: Set 'network/weight-S' to '0.0'
Access violation or Bus error detected.
This probably comes from a programming error in your code, or from a stack
overflow. If you are certain of your code, try increasing the stack size
   --cfg=contexts/stack-size:XXX (current size is 8176 KiB).

If it does not help, this may have one of the following causes:
a bug in SimGrid, a bug in the OS or a bug in a third-party libraries.
Failing hardware can sometimes generate such errors too.

If you think you've found a bug in SimGrid, please report it along with a
Minimal Working Example (MWE) reproducing your problem and a full backtrace
of the fault captured with gdb or valgrind.
zsh: segmentation fault  ipython

It is advised to enable option --cfg=contexts/factory:thread, but neither Python nor my executable reads this argument. Under the hood I simply call starpu_init(NULL). Do I have to fake an argv to pass to starpu_init()?

Muxas commented 5 months ago

Here is the full trace:

(lldb) bt all
* thread #2, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x11)
    frame #0: 0x0000000101046fc8 libstarpu-1.4.1.dylib`::_starpu_simgrid_cpp_init() at stl_tree.h:790:53
  * frame #1: 0x00000001010173a8 libstarpu-1.4.1.dylib`starpu_initialize(user_conf=0x000000012859c000, argc=<unavailable>, argv=<unavailable>) at workers.c:1590:2
    frame #2: 0x0000000100325074 nntile_core.so`void pybind11::cpp_function::initialize<void pybind11::detail::initimpl::constructor<int, int, int>::execute<pybind11::class_<nntile::starpu::Config>, 0>(pybind11::class_<nntile::starpu::Config>&)::'lambda'(pybind11::detail::value_and_holder&, int, int, int), void, pybind11::detail::value_and_holder&, int, int, int, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::detail::is_new_style_constructor>(pybind11::class_<nntile::starpu::Config>&&, void (*)(pybind11::detail::value_and_holder&, int, int, int), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::detail::is_new_style_constructor const&)::'lambda'(pybind11::detail::function_call&)::_FUN((null)=0x000000012859c000) at config.hh:82:26
    frame #3: 0x0000000100317030 nntile_core.so`pybind11::cpp_function::dispatcher(self=<unavailable>, args_in=0x000000012a032610, kwargs_in=0x0000000000000000) at pybind11.h:946:35
    frame #4: 0x00000001009a69e4 Python`cfunction_call + 60
    frame #5: 0x000000010095ef68 Python`_PyObject_MakeTpCall + 128
    frame #6: 0x00000001009624d4 Python`method_vectorcall + 536
    frame #7: 0x00000001009c4dd0 Python`slot_tp_init + 464
    frame #8: 0x00000001009bd634 Python`type_call + 144
    frame #9: 0x0000000100312ae4 nntile_core.so`pybind11_meta_call(type=<unavailable>, args=<unavailable>, kwargs=<unavailable>) at class.h:187:41
    frame #10: 0x000000010095ef68 Python`_PyObject_MakeTpCall + 128
    frame #11: 0x0000000100a37238 Python`_PyEval_EvalFrameDefault + 40652
    frame #12: 0x0000000100a2c828 Python`PyEval_EvalCode + 168
    frame #13: 0x0000000100a7e044 Python`run_eval_code_obj + 84
    frame #14: 0x0000000100a7dfa8 Python`run_mod + 112
    frame #15: 0x0000000100a803d0 Python`PyRun_StringFlags + 112
    frame #16: 0x0000000100a80318 Python`PyRun_SimpleStringFlags + 64
    frame #17: 0x0000000100a985e0 Python`pymain_run_command + 144
    frame #18: 0x0000000100a980b4 Python`Py_RunMain + 228
    frame #19: 0x0000000100a993c8 Python`Py_BytesMain + 40
    frame #20: 0x00000001819a10e0 dyld`start + 2360
  thread #3
    frame #0: 0x0000000181ce506c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d225fc libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x0000000103ce2b58 libopenblas64_.0.dylib`blas_thread_server + 360
    frame #3: 0x0000000181d22034 libsystem_pthread.dylib`_pthread_start + 136
  thread #4
    frame #0: 0x0000000181ce506c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d225fc libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x0000000103ce2b58 libopenblas64_.0.dylib`blas_thread_server + 360
    frame #3: 0x0000000181d22034 libsystem_pthread.dylib`_pthread_start + 136
  thread #5
    frame #0: 0x0000000181ce506c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d225fc libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x0000000103ce2b58 libopenblas64_.0.dylib`blas_thread_server + 360
    frame #3: 0x0000000181d22034 libsystem_pthread.dylib`_pthread_start + 136
  thread #6
    frame #0: 0x0000000181ce506c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d225fc libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x0000000103ce2b58 libopenblas64_.0.dylib`blas_thread_server + 360
    frame #3: 0x0000000181d22034 libsystem_pthread.dylib`_pthread_start + 136
  thread #7
    frame #0: 0x0000000181ce506c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d225fc libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x0000000103ce2b58 libopenblas64_.0.dylib`blas_thread_server + 360
    frame #3: 0x0000000181d22034 libsystem_pthread.dylib`_pthread_start + 136
  thread #8
    frame #0: 0x0000000181ce506c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d225fc libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x0000000103ce2b58 libopenblas64_.0.dylib`blas_thread_server + 360
    frame #3: 0x0000000181d22034 libsystem_pthread.dylib`_pthread_start + 136
  thread #9
    frame #0: 0x0000000181ce506c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d225fc libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x0000000103ce2b58 libopenblas64_.0.dylib`blas_thread_server + 360
    frame #3: 0x0000000181d22034 libsystem_pthread.dylib`_pthread_start + 136
  thread #10
    frame #0: 0x0000000181ce506c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d225fc libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x0000000103ce2b58 libopenblas64_.0.dylib`blas_thread_server + 360
    frame #3: 0x0000000181d22034 libsystem_pthread.dylib`_pthread_start + 136
  thread #11
    frame #0: 0x0000000181ce506c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d225fc libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x0000000103ce2b58 libopenblas64_.0.dylib`blas_thread_server + 360
    frame #3: 0x0000000181d22034 libsystem_pthread.dylib`_pthread_start + 136
  thread #12
    frame #0: 0x0000000181ce506c libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000181d225fc libsystem_pthread.dylib`_pthread_cond_wait + 1228
    frame #2: 0x0000000181c494dc libc++.1.dylib`std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&) + 28
    frame #3: 0x0000000101fca34c libsimgrid.3.35.dylib`simgrid::kernel::context::ThreadContext::wait() + 80
    frame #4: 0x0000000101fc9d34 libsimgrid.3.35.dylib`simgrid::kernel::context::SerialThreadContext::run_all(std::__1::vector<simgrid::kernel::actor::ActorImpl*, std::__1::allocator<simgrid::kernel::actor::ActorImpl*>> const&) + 52
    frame #5: 0x0000000101fad28c libsimgrid.3.35.dylib`simgrid::kernel::EngineImpl::run_all_actors() + 48
    frame #6: 0x0000000101faf67c libsimgrid.3.35.dylib`simgrid::kernel::EngineImpl::run(double) + 300
    frame #7: 0x0000000101f68180 libsimgrid.3.35.dylib`simgrid_run + 40
    frame #8: 0x0000000101fc9f38 libsimgrid.3.35.dylib`simgrid::kernel::context::ThreadContext::wrapper(simgrid::kernel::context::ThreadContext*) + 104
    frame #9: 0x0000000101fca87c libsimgrid.3.35.dylib`void* std::__1::__thread_proxy[abi:v160006]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (*)(simgrid::kernel::context::ThreadContext*), simgrid::kernel::context::ThreadContext*>>(void*) + 52
    frame #10: 0x0000000181d22034 libsystem_pthread.dylib`_pthread_start + 136

Notably, line 1590 of workers.c (inside starpu_initialize()) just initializes SimGrid:

frame #1: 0x00000001010173a8 libstarpu-1.4.1.dylib`starpu_initialize(user_conf=0x000000012859c000, argc=<unavailable>, argv=<unavailable>) at workers.c:1590:2
   1587     STARPU_HG_DISABLE_CHECKING(_starpu_worker_parallel_blocks);
   1588 #ifdef STARPU_SIMGRID
   1589     /* This initializes the simgrid thread library, thus needs to be early */
-> 1590     _starpu_simgrid_init_early(argc, argv);
   1591 #endif
   1592 
   1593     STARPU_PTHREAD_MUTEX_LOCK(&init_mutex);

I believe running SimGrid through Python on MacOS is impossible...

sthibaul commented 5 months ago

> It is advised to enable option --cfg=contexts/factory:thread

No, StarPU already does it for you. It just warns that it is not the most efficient way of using SimGrid.

> I believe running SimGrid through Python on MacOS is impossible...

I don't know if simgrid on macos was tested, for a start... Let alone with python

sthibaul commented 5 months ago

> I believe running SimGrid through Python on MacOS is impossible...

> I don't know if simgrid on macos was tested, for a start...

You can run make check in your starpu-simgrid build tree to check whether at least that part works.

Muxas commented 5 months ago

I have not tried a plain make check yet, but at least starpu_init with SimGrid enabled no longer segfaults. Now I get an error with the dmdasd scheduler:

/usr/local/lib/libstarpu-1.4.so.1(+0x8ff4a)[0x7f20cb01df4a]
/usr/local/lib/libstarpu-1.4.so.1(+0x90326)[0x7f20cb01e326]
/usr/local/lib/libstarpu-1.4.so.1(+0x76ec3)[0x7f20cb004ec3]
/usr/local/lib/libstarpu-1.4.so.1(+0x77799)[0x7f20cb005799]
/usr/local/lib/libstarpu-1.4.so.1(_starpu_task_submit+0x217)[0x7f20cafd1357]
/usr/local/lib/libstarpu-1.4.so.1(_starpu_task_insert_v+0x2a)[0x7f20cb066e4a]
/usr/local/lib/libstarpu-1.4.so.1(starpu_task_insert+0x9d)[0x7f20cb06703d]
/home/al.mikhalev/Code/nntile_muxas/build/libnntile.so(_ZN6nntile6starpu4copy6submitENS0_6HandleES2_+0x5c)[0x7f20cb238cf7]
/home/al.mikhalev/Code/nntile_muxas/build/libnntile.so(_ZN6nntile6tensor13scatter_asyncIlEEvRKNS0_6TensorIT_EES6_+0x216)[0x7f20cb29191a]
/home/al.mikhalev/Code/nntile_muxas/build/libnntile.so(_ZN6nntile6tensor7scatterIlEEvRKNS0_6TensorIT_EES6_+0x27)[0x7f20cb292916]
/home/al.mikhalev/Code/nntile_muxas/build/wrappers/python/nntile/nntile_core.so(+0x636cd)[0x7f20cb3996cd]
/home/al.mikhalev/Code/nntile_muxas/build/wrappers/python/nntile/nntile_core.so(+0xc5cdd)[0x7f20cb3fbcdd]
/home/al.mikhalev/Code/nntile_muxas/build/wrappers/python/nntile/nntile_core.so(+0xaae1e)[0x7f20cb3e0e1e]
/home/al.mikhalev/Code/nntile_muxas/build/wrappers/python/nntile/nntile_core.so(+0x99a8b)[0x7f20cb3cfa8b]
/home/al.mikhalev/Code/nntile_muxas/build/wrappers/python/nntile/nntile_core.so(+0x99c55)[0x7f20cb3cfc55]
/home/al.mikhalev/Code/nntile_muxas/build/wrappers/python/nntile/nntile_core.so(+0x3e616)[0x7f20cb374616]
/usr/bin/python3(+0x15a10e)[0x55805f42f10e]
/usr/bin/python3(_PyObject_MakeTpCall+0x25b)[0x55805f425a7b]
/usr/bin/python3(+0x168acb)[0x55805f43dacb]
/usr/bin/python3(_PyEval_EvalFrameDefault+0x614a)[0x55805f41dcfa]
/usr/bin/python3(_PyFunction_Vectorcall+0x7c)[0x55805f42f9fc]
/usr/bin/python3(_PyEval_EvalFrameDefault+0x614a)[0x55805f41dcfa]
/usr/bin/python3(+0x13f9c6)[0x55805f4149c6]
/usr/bin/python3(PyEval_EvalCode+0x86)[0x55805f50a256]
/usr/bin/python3(+0x260108)[0x55805f535108]
/usr/bin/python3(+0x2599cb)[0x55805f52e9cb]
/usr/bin/python3(+0x25fe55)[0x55805f534e55]
/usr/bin/python3(_PyRun_SimpleFileObject+0x1a8)[0x55805f534338]
/usr/bin/python3(_PyRun_AnyFileObject+0x43)[0x55805f533f83]
/usr/bin/python3(Py_RunMain+0x2be)[0x55805f526a5e]
/usr/bin/python3(Py_BytesMain+0x2d)[0x55805f4fd02d]
/usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f217bf9dd90]
python3: sched_policies/deque_modeling_policy_data_aware.c:720: _dmda_push_task: Assertion `0 && "forced_best != -1 || best != -1"' failed.

Thread 1 "python3" received signal SIGABRT, Aborted.
0x00007f217c00a9fc in pthread_kill () from /usr/lib/x86_64-linux-gnu/libc.so.6
(gdb) bt full
#0  0x00007f217c00a9fc in pthread_kill () from /usr/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x00007f217bfb6476 in raise () from /usr/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#2  0x00007f217bf9c7f3 in abort () from /usr/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#3  0x00007f217bf9c71b in ?? () from /usr/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#4  0x00007f217bfade96 in __assert_fail () from /usr/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#5  0x00007f20cb01df78 in _dmda_push_task (task=<optimized out>, prio=prio@entry=1, sched_ctx_id=<optimized out>, da=da@entry=1, 
    simulate=simulate@entry=0, sorted_decision=sorted_decision@entry=1) at sched_policies/deque_modeling_policy_data_aware.c:720
        best = <optimized out>
        best_in_ctx = <optimized out>
        selected_impl = <optimized out>
        model_best = 0
        transfer_model_best = 0
        forced_best = <optimized out>
        forced_impl = -1
        dt = <optimized out>
        workers = <optimized out>
        nworkers = <optimized out>
        local_task_length = <optimized out>
        local_data_penalty = <optimized out>
        local_energy = <optimized out>
        exp_end = <optimized out>
        min_exp_end_of_task = <optimized out>
        max_exp_end_of_workers = <optimized out>
        fitness = 0x7ffcc8955950
        __PRETTY_FUNCTION__ = "_dmda_push_task"
#6  0x00007f20cb01e326 in dmda_push_sorted_decision_task (task=<optimized out>)
    at sched_policies/deque_modeling_policy_data_aware.c:764
No locals.
#7  0x00007f20cb004ec3 in _starpu_push_task_to_workers (task=0x5580675a9a20) at core/sched_policy.c:824
        __timer = 0x0
        worker = <optimized out>
        nworkers = <optimized out>
        config = <optimized out>
        sched_ctx = <optimized out>
        ret = 0
        __func__ = "_starpu_push_task_to_workers"
        __PRETTY_FUNCTION__ = "_starpu_push_task_to_workers"
#8  0x00007f20cb005799 in _starpu_repush_task (j=0x5580675aa8d0) at core/sched_policy.c:696
        task = 0x5580675a9a20
        sched_ctx = 0x7f20cb1936e8 <_starpu_config+569704>
        ret = <optimized out>
        can_push = 1
        __PRETTY_FUNCTION__ = "_starpu_repush_task"
        continuation = 0
#9  0x00007f20cb005b7d in _starpu_push_task (j=<optimized out>) at core/sched_policy.c:584
        __PRETTY_FUNCTION__ = "_starpu_push_task"
#10 0x00007f20cafd1357 in _starpu_task_submit (task=0x5580675a9a20, nodeps=<optimized out>) at core/task.c:1106
        __PRETTY_FUNCTION__ = "_starpu_task_submit"
        __func__ = "_starpu_task_submit"
        ret = <optimized out>
        is_sync = 0
        bundle = <optimized out>
        j = 0x5580675aa8d0
        continuation = 0
        info = <optimized out>
        profiling = 1
        __ptrs = <optimized out>
        __n = <optimized out>
#11 0x00007f20cb066e4a in _starpu_task_insert_v (cl=0x7f20cb2f25c0 <nntile::starpu::copy::codelet>, 
    varg_list=varg_list@entry=0x7ffcc8956b00) at util/starpu_task_insert.c:160
        task = 0x5580675a9a20
        ret = <optimized out>
        __func__ = "_starpu_task_insert_v"
#12 0x00007f20cb06703d in starpu_task_insert (cl=<optimized out>) at util/starpu_task_insert.c:194
        varg_list = {{gp_offset = 8, fp_offset = 48, overflow_arg_area = 0x7ffcc8956be0, reg_save_area = 0x7ffcc8956b20}}
        ret = <optimized out>
#13 0x00007f20cb238cf7 in nntile::starpu::copy::submit(nntile::starpu::Handle, nntile::starpu::Handle) ()
   from /home/al.mikhalev/Code/nntile_muxas/build/libnntile.so
No symbol table info available.
#14 0x00007f20cb29191a in void nntile::tensor::scatter_async<long>(nntile::tensor::Tensor<long> const&, nntile::tensor::Tensor<long> const&) () from /home/al.mikhalev/Code/nntile_muxas/build/libnntile.so
No symbol table info available.
#15 0x00007f20cb292916 in void nntile::tensor::scatter<long>(nntile::tensor::Tensor<long> const&, nntile::tensor::Tensor<long> const&) () from /home/al.mikhalev/Code/nntile_muxas/build/libnntile.so
No symbol table info available.
#16 0x00007f20cb3996cd in ?? () from /home/al.mikhalev/Code/nntile_muxas/build/wrappers/python/nntile/nntile_core.so
No symbol table info available.
#17 0x00007f20cb3fbcdd in ?? () from /home/al.mikhalev/Code/nntile_muxas/build/wrappers/python/nntile/nntile_core.so
No symbol table info available.
#18 0x00007f20cb3e0e1e in ?? () from /home/al.mikhalev/Code/nntile_muxas/build/wrappers/python/nntile/nntile_core.so
No symbol table info available.
#19 0x00007f20cb3cfa8b in ?? () from /home/al.mikhalev/Code/nntile_muxas/build/wrappers/python/nntile/nntile_core.so
No symbol table info available.
#20 0x00007f20cb3cfc55 in ?? () from /home/al.mikhalev/Code/nntile_muxas/build/wrappers/python/nntile/nntile_core.so
No symbol table info available.
#21 0x00007f20cb374616 in ?? () from /home/al.mikhalev/Code/nntile_muxas/build/wrappers/python/nntile/nntile_core.so
No symbol table info available.
#22 0x000055805f42f10e in ?? ()
No symbol table info available.
#23 0x000055805f425a7b in _PyObject_MakeTpCall ()
No symbol table info available.
#24 0x000055805f43dacb in ?? ()
No symbol table info available.
#25 0x000055805f41dcfa in _PyEval_EvalFrameDefault ()
No symbol table info available.
#26 0x000055805f42f9fc in _PyFunction_Vectorcall ()
No symbol table info available.
#27 0x000055805f41dcfa in _PyEval_EvalFrameDefault ()
No symbol table info available.
#28 0x000055805f4149c6 in ?? ()
No symbol table info available.
#29 0x000055805f50a256 in PyEval_EvalCode ()
No symbol table info available.
#30 0x000055805f535108 in ?? ()
No symbol table info available.
#31 0x000055805f52e9cb in ?? ()
No symbol table info available.
#32 0x000055805f534e55 in ?? ()
No symbol table info available.
#33 0x000055805f534338 in _PyRun_SimpleFileObject ()
No symbol table info available.
#34 0x000055805f533f83 in _PyRun_AnyFileObject ()
No symbol table info available.
#35 0x000055805f526a5e in Py_RunMain ()
No symbol table info available.
#36 0x000055805f4fd02d in Py_BytesMain ()
No symbol table info available.
#37 0x00007f217bf9dd90 in ?? () from /usr/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#38 0x00007f217bf9de40 in __libc_start_main () from /usr/lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#39 0x000055805f4fcf25 in _start ()
No symbol table info available.

Almost everything is optimized out, but at least it works up to this point.

Muxas commented 5 months ago

The error above seems to have been caused by manually setting cpu_funcs={0}. Removing this manual zero-initialization solved the issue. It seems I am on the right track! I will keep posting my progress here. The current problem is an uncalibrated perfmodel:

Codelet nntile_transpose_fp32 does not have a perfmodel, or is not calibrated enough, please re-run in non-simgrid mode until it is calibrated, or fix the STARPU_HOSTNAME and STARPU_PERF_MODEL_DIR environment variables

Muxas commented 5 months ago

Sorry, I forgot to mention that this works inside Docker on a Linux distribution. I have stopped trying to make it work on macOS. However, either the simulation hits a deadlock, or it is much slower than the actual computations.

sthibaul commented 5 months ago

> The error above seems to have been caused by manually setting cpu_funcs={0}. Removing this manual zero-initialization solved the issue.

This is very odd and needs to be checked. What is the situation when you set this? In principle it should either

> It seems I am on the right track! I will keep posting my progress here. The current problem is an uncalibrated perfmodel:

Yes, you need to calibrate your performance models so that SimGrid knows how much time to account for each task.
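The calibration step described above can be scripted. Below is a minimal Python sketch, assuming the calibration run uses a regular (non-SimGrid) StarPU build, the simulation run uses the SimGrid build, and each run is started as a fresh interpreter so that the variables are already set when starpu_init() is called. STARPU_HOSTNAME and STARPU_PERF_MODEL_DIR are the variables named in the error message; STARPU_CALIBRATE and STARPU_SCHED are standard StarPU variables. The helper names starpu_env and run, as well as any script name or path passed to them, are hypothetical.

```python
import os
import subprocess
import sys


def starpu_env(hostname, perfmodel_dir, calibrate=False, sched="dmdasd"):
    """Return a copy of the current environment with StarPU settings applied.

    Use the same hostname and perfmodel_dir for the calibration run
    (regular StarPU build) and the simulation run (SimGrid build), so that
    the simulation finds the models recorded by the real run.
    """
    env = dict(os.environ)
    env["STARPU_HOSTNAME"] = hostname
    env["STARPU_PERF_MODEL_DIR"] = perfmodel_dir
    env["STARPU_SCHED"] = sched
    if calibrate:
        # Ask StarPU to calibrate the performance models during this run.
        env["STARPU_CALIBRATE"] = "1"
    else:
        # Do not inherit a calibration setting from the parent shell.
        env.pop("STARPU_CALIBRATE", None)
    return env


def run(script, env):
    """Start script in a fresh interpreter so StarPU sees env at starpu_init()."""
    return subprocess.run([sys.executable, script], env=env, check=True)
```

For example, call run("my_script.py", starpu_env("my-node", "/path/to/perfmodels", calibrate=True)) with the regular build until the models are calibrated, then the same call without calibrate=True with the SimGrid build. Keeping hostname and directory identical across both runs is what lets the simulation pick up the recorded models.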

> either the simulation hits a deadlock,

Deadlocks are not supposed to happen: we do pass the StarPU testsuite with SimGrid, except for the Python part, which has not been worked on so far.

> or it is much slower than the actual computations

For Python+SimGrid, that's expected: for now we have not implemented skipping the call to the actual computation function. You can try to do this by hand in your code.
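One way to do this by hand for Python task functions is to wrap each kernel so that its body is skipped whenever the application knows it is running under simulation, leaving the simulated duration to the calibrated performance model as explained above. This is a minimal sketch, not a StarPU feature: the decorator skip_if_simulating and the MYAPP_SIMGRID environment variable are hypothetical names the application would define itself. It only makes sense when the control flow does not depend on computed values, since skipped kernels produce no results.

```python
import functools
import os


def skip_if_simulating(is_simulating):
    """Decorator factory: call the wrapped kernel only when is_simulating() is False.

    Assumption: in a SimGrid run the simulated task duration comes from the
    calibrated performance model, so the real computation can be skipped.
    A skipped kernel returns None and leaves its outputs untouched.
    """
    def decorator(kernel):
        @functools.wraps(kernel)
        def wrapper(*args, **kwargs):
            # The flag is checked at call time, not at decoration time.
            if is_simulating():
                return None
            return kernel(*args, **kwargs)
        return wrapper
    return decorator


def simulating_from_env():
    # MYAPP_SIMGRID is a hypothetical variable set by the application's own
    # launcher; it is not defined or read by StarPU.
    return os.environ.get("MYAPP_SIMGRID") == "1"


@skip_if_simulating(simulating_from_env)
def axpy(alpha, x, y):
    # Stand-in for an expensive kernel body.
    return [alpha * xi + yi for xi, yi in zip(x, y)]
```

Passing a callable rather than a boolean keeps the decision at call time, so the same module can be imported by both the real and the simulated runs.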

Muxas commented 4 months ago

The simulation just never finishes. The following output, printed when I press CTRL-C, shows that almost every actor is waiting for some synchronization activity. If the code is executed in normal (non-SimGrid) mode, it just works.

[MAIN:main:(1) 2.064330] [ker_engine/INFO] CTRL-C pressed. The current status will be displayed before exit (disable that behavior with option 'debug/verbose-exit').
[MAIN:main:(1) 2.064330] [ker_engine/INFO] 37 actors are still running, waiting for something.
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Legend of the following listing: "Actor <pid> (<name>@<host>): <status>"
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 1 (main@MAIN) simcall Simcall::NONE
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 2 (worker 0 runner@CUDA0): waiting for synchronization activity 0x7fe98c0b6a50 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 3 (worker 1 runner@CUDA1): waiting for synchronization activity 0x7fe98c0b59d0 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 4 (worker 2 runner@CUDA2): waiting for synchronization activity 0x7fe98c0b6330 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 5 (worker 3 runner@CUDA3): waiting for synchronization activity 0x7fe98c0b73c0 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 6 (worker 4 runner@CUDA4): waiting for synchronization activity 0x7fe98c157680 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 7 (worker 5 runner@CUDA5): waiting for synchronization activity 0x7fe98c15bde0 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 8 (worker 6 runner@CUDA6): waiting for synchronization activity 0x7fe98c15a860 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 9 (worker 7 runner@CUDA7): waiting for synchronization activity 0x7fe98c157860 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 10 (worker 8 runner@CPU0): waiting for synchronization activity 0x7fe98c0ab350 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 11 (worker 9 runner@CPU1): waiting for synchronization activity 0x7fe98c0abce0 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 12 (worker 10 runner@CPU2): waiting for synchronization activity 0x7fe98c0abc10 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 13 (worker 11 runner@CPU3): waiting for synchronization activity 0x7fe98c0ac5a0 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 14 (worker 12 runner@CPU4): waiting for synchronization activity 0x7fe98c0ac4d0 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 15 (worker 13 runner@CPU5): waiting for synchronization activity 0x7fe98c0ace60 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 16 (worker 14 runner@CPU6): waiting for synchronization activity 0x7fe98c0acd90 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 17 (worker 15 runner@CPU7): waiting for synchronization activity 0x7fe98c0ad680 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 18 (CUDA@CUDA0): waiting for synchronization activity 0x7fe98c15b4b0 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 19 (CUDA@CUDA1): waiting for synchronization activity 0x7fe98c15b0e0 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 20 (CUDA@CUDA2): waiting for synchronization activity 0x7fe98c15a730 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 21 (CUDA@CUDA3): waiting for sleeping activity 0x7fe98c0b1c50 (sleep) in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 22 (CUDA@CUDA4): waiting for synchronization activity 0x7fe98c158ba0 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 23 (CUDA@CUDA5): waiting for synchronization activity 0x7fe98c15b6f0 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 24 (CUDA@CUDA6): waiting for synchronization activity 0x7fe98c0b6960 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 25 (CUDA@CUDA7): waiting for synchronization activity 0x7fe98c15a130 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 26 (CPU@CPU0): waiting for synchronization activity 0x7fe98c15c360 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 27 (CPU@CPU1): waiting for synchronization activity 0x7fe98c0a6410 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 28 (CPU@CPU2): waiting for synchronization activity 0x7fe98c0aa190 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 29 (CPU@CPU3): waiting for synchronization activity 0x7fe98c15ba50 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 30 (CPU@CPU4): waiting for synchronization activity 0x7fe98c15abf0 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 31 (CPU@CPU5): waiting for synchronization activity 0x7fe98c1584a0 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 32 (CPU@CPU6): waiting for synchronization activity 0x7fe98c1585d0 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 33 (CPU@CPU7): waiting for synchronization activity 0x7fe98c159ac0 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 34 (transfer 0-8 runner@RAM): waiting for synchronization activity 0x7fe98c1573b0 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 35 (transfer 0-1 runner@RAM): waiting for synchronization activity 0x7fe98c1583d0 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 36 (transfer 0-5 runner@RAM): waiting for synchronization activity 0x7fe98c159350 () in state WAITING to finish
[MAIN:main:(1) 2.064330] [ker_engine/INFO] Actor 37 (transfer 0-6 runner@RAM): waiting for synchronization activity 0x7fe98c15a360 () in state WAITING to finish
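With 37 actors, a listing like the one above is tedious to scan by eye. A minimal Python sketch that condenses it follows, assuming the log lines have exactly the format shown; the regular expression and the helper names classify, summarize and count_kinds are illustrative and not part of SimGrid or StarPU. It keeps the first listing of each actor and reduces its status to a short kind.

```python
import re
from collections import Counter

# One actor line from SimGrid's CTRL-C status dump, e.g.
#   "... Actor 2 (worker 0 runner@CUDA0): waiting for synchronization activity ..."
#   "... Actor 1 (main@MAIN) simcall Simcall::NONE"
# The colon after the closing parenthesis is absent for the main actor.
ACTOR_RE = re.compile(r"Actor (\d+) \((.+?)@([^)]+)\):? (.*)$")


def classify(status):
    """Reduce a free-text actor status to a short kind."""
    if status.startswith("waiting for synchronization activity"):
        return "sync"
    if status.startswith("waiting for sleeping activity"):
        return "sleep"
    if status.startswith("simcall"):
        return status  # e.g. "simcall Simcall::NONE"
    return "other"


def summarize(log_text):
    """Map actor pid -> (name, host, kind), keeping the first listing of each actor."""
    actors = {}
    for line in log_text.splitlines():
        match = ACTOR_RE.search(line)
        if match:
            pid = int(match.group(1))
            name, host, status = match.group(2), match.group(3), match.group(4)
            actors.setdefault(pid, (name, host, classify(status)))
    return actors


def count_kinds(actors):
    """Count how many actors are in each kind of state."""
    return Counter(kind for _, _, kind in actors.values())
```

Applied to the dump above, count_kinds(summarize(text)) reports 35 actors in "sync", one in "sleep" (actor 21, CUDA@CUDA3) and one in "simcall Simcall::NONE" (the main actor), which makes the main actor stand out as the only one not waiting on a SimGrid activity.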
[2.064330] [ker_engine/INFO] Actor 26 (CPU@CPU0): waiting for synchronization activity 0x7fe98c15c360 () in state WAITING to finish
[2.064330] [ker_engine/INFO] Actor 27 (CPU@CPU1): waiting for synchronization activity 0x7fe98c0a6410 () in state WAITING to finish
[2.064330] [ker_engine/INFO] Actor 28 (CPU@CPU2): waiting for synchronization activity 0x7fe98c0aa190 () in state WAITING to finish
[2.064330] [ker_engine/INFO] Actor 29 (CPU@CPU3): waiting for synchronization activity 0x7fe98c15ba50 () in state WAITING to finish
[2.064330] [ker_engine/INFO] Actor 30 (CPU@CPU4): waiting for synchronization activity 0x7fe98c15abf0 () in state WAITING to finish
[2.064330] [ker_engine/INFO] Actor 31 (CPU@CPU5): waiting for synchronization activity 0x7fe98c1584a0 () in state WAITING to finish
[2.064330] [ker_engine/INFO] Actor 32 (CPU@CPU6): waiting for synchronization activity 0x7fe98c1585d0 () in state WAITING to finish
[2.064330] [ker_engine/INFO] Actor 33 (CPU@CPU7): waiting for synchronization activity 0x7fe98c159ac0 () in state WAITING to finish
[2.064330] [ker_engine/INFO] Actor 34 (transfer 0-8 runner@RAM): waiting for synchronization activity 0x7fe98c1573b0 () in state WAITING to finish
[2.064330] [ker_engine/INFO] Actor 35 (transfer 0-1 runner@RAM): waiting for synchronization activity 0x7fe98c1583d0 () in state WAITING to finish
[2.064330] [ker_engine/INFO] Actor 36 (transfer 0-5 runner@RAM): waiting for synchronization activity 0x7fe98c159350 () in state WAITING to finish
[2.064330] [ker_engine/INFO] Actor 37 (transfer 0-6 runner@RAM): waiting for synchronization activity 0x7fe98c15a360 () in state WAITING to finish
[2.064330] ./src/xbt/exception.cpp:50: [xbt_exception/CRITICAL] Uncaught exception std::system_error: Resource deadlock avoided
[2.064330] ./src/xbt/exception.cpp:77: [xbt_exception/CRITICAL] Current backtrace:
  ->  0# 0x00007FEAAB7853A8 in /usr/lib/x86_64-linux-gnu/libsimgrid.so.3.30
  ->  1# 0x00007FEB6108420C in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
  ->  2# 0x00007FEB610831E9 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
  ->  3# __gxx_personality_v0 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
  ->  4# 0x00007FEB65D69884 in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
  ->  5# _Unwind_RaiseException in /usr/lib/x86_64-linux-gnu/libgcc_s.so.1
  ->  6# __cxa_throw in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
  ->  7# std::__throw_system_error(int) in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
  ->  8# std::thread::detach() in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
  ->  9# simgrid::kernel::context::ThreadContext::~ThreadContext() in /usr/lib/x86_64-linux-gnu/libsimgrid.so.3.30
  ->  10# 0x00007FEAAB7DC047 in /usr/lib/x86_64-linux-gnu/libsimgrid.so.3.30
  ->  11# simgrid::kernel::actor::ActorImpl::~ActorImpl() in /usr/lib/x86_64-linux-gnu/libsimgrid.so.3.30
  ->  12# simgrid::s4u::intrusive_ptr_release(simgrid::s4u::Actor const*) in /usr/lib/x86_64-linux-gnu/libsimgrid.so.3.30
  ->  13# 0x00007FEAAB9107F0 in /usr/lib/x86_64-linux-gnu/libsimgrid.so.3.30
  ->  14# simgrid::kernel::EngineImpl::shutdown() in /usr/lib/x86_64-linux-gnu/libsimgrid.so.3.30
  ->  15# 0x00007FEB694F6495 in /usr/lib/x86_64-linux-gnu/libc.so.6
  ->  16# on_exit in /usr/lib/x86_64-linux-gnu/libc.so.6
  ->  17# 0x00007FEAAB846A14 in /usr/lib/x86_64-linux-gnu/libsimgrid.so.3.30
  ->  18# 0x00007FEB694F3520 in /usr/lib/x86_64-linux-gnu/libc.so.6
  ->  19# 0x00007FEB69542115 in /usr/lib/x86_64-linux-gnu/libc.so.6
  ->  20# pthread_cond_wait in /usr/lib/x86_64-linux-gnu/libc.so.6
  ->  21# std::condition_variable::wait(std::unique_lock<std::mutex>&) in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
  ->  22# simgrid::kernel::context::ThreadContext::wait() in /usr/lib/x86_64-linux-gnu/libsimgrid.so.3.30
  ->  23# simgrid::kernel::context::SerialThreadContext::run_all() in /usr/lib/x86_64-linux-gnu/libsimgrid.so.3.30
  ->  24# simgrid::kernel::EngineImpl::run_all_actors() in /usr/lib/x86_64-linux-gnu/libsimgrid.so.3.30
  ->  25# simgrid::kernel::EngineImpl::run(double) in /usr/lib/x86_64-linux-gnu/libsimgrid.so.3.30
  ->  26# simgrid::kernel::context::ThreadContext::wrapper(simgrid::kernel::context::ThreadContext*) in /usr/lib/x86_64-linux-gnu/libsimgrid.so.3.30
  ->  27# 0x00007FEB610B2253 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6
  ->  28# 0x00007FEB69545AC3 in /usr/lib/x86_64-linux-gnu/libc.so.6
  ->  29# __clone in /usr/lib/x86_64-linux-gnu/libc.so.6
sthibaul commented 4 months ago

Not all actors are waiting for synchronization: the CUDA@CUDA3 actor is sleeping, i.e. it is most probably running a task that the others depend on.
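With dozens of workers, it is easy to miss the one actor that is doing something other than waiting on a synchronization. Below is a small hypothetical helper (not part of StarPU or SimGrid) that scans a saved log and lists the actors whose status differs. It assumes the "Actor <pid> (<name>@<host>): <status>" line format printed above by SimGrid 3.30 on CTRL-C; other SimGrid versions may format these lines differently.

```python
import re
from typing import Dict, List, NamedTuple

# Matches SimGrid's verbose-exit actor lines, e.g.
#   "... Actor 21 (CUDA@CUDA3): waiting for sleeping activity ..."
#   "... Actor 1 (main@MAIN) simcall Simcall::NONE"   (no colon on this one)
# The format is taken from the log in this thread; it is an assumption for other versions.
ACTOR_RE = re.compile(r"Actor (\d+) \(([^@]*)@([^)]*)\):? (.*)$")


class ActorStatus(NamedTuple):
    pid: int
    name: str
    host: str
    status: str


def parse_actors(log_text: str) -> List[ActorStatus]:
    """Collect one entry per actor pid; a later listing overwrites an earlier one."""
    actors: Dict[int, ActorStatus] = {}
    for line in log_text.splitlines():
        m = ACTOR_RE.search(line)
        if m:
            pid = int(m.group(1))
            actors[pid] = ActorStatus(pid, m.group(2), m.group(3), m.group(4))
    return sorted(actors.values())  # pids are unique, so this sorts by pid


def not_waiting_on_sync(actors: List[ActorStatus]) -> List[ActorStatus]:
    """Keep only actors whose status is something other than a synchronization wait."""
    return [a for a in actors
            if not a.status.startswith("waiting for synchronization activity")]


if __name__ == "__main__":
    sample = """\
[2.064330] [ker_engine/INFO] Actor 1 (main@MAIN) simcall Simcall::NONE
[2.064330] [ker_engine/INFO] Actor 20 (CUDA@CUDA2): waiting for synchronization activity 0x7fe98c15a730 () in state WAITING to finish
[2.064330] [ker_engine/INFO] Actor 21 (CUDA@CUDA3): waiting for sleeping activity 0x7fe98c0b1c50 (sleep) in state WAITING to finish
"""
    for a in not_waiting_on_sync(parse_actors(sample)):
        print(a.pid, a.name, a.host, a.status)
```

On this excerpt the helper reports actor 1 (the main actor, whose line carries a simcall state rather than a wait) and actor 21 (CUDA@CUDA3, sleeping), which is the actor pointed out above.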