multiscale / muscle3

The third major version of the MUltiScale Coupling Library and Environment

init shown as running the whole time ... #254

Closed. DavidPCoster closed this issue 9 months ago.

DavidPCoster commented 1 year ago

Why does init show as active the whole time?

(attached: timeline plot)

It basically has

    from libmuscle import Instance
    from ymmsl import Operator

    instance = Instance(
        {
            Operator.O_F: [
                "final_equilibrium_out",
                "final_core_profiles_out",
                "final_core_transport_out",
                "final_core_sources_out",
                "final_transport_solver_numerics_out",
            ],
        }
    )

as its pattern.

It reads some data, processes it, sends the results to the main workflow, and then goes back to reuse_instance() ...

    # enter re-use loop
    while instance.reuse_instance():

        # F_INIT

            # O_I

            # S

        # O_F

        ...

Or should I find a way of shutting it down once it has finished sending?

LourensVeen commented 1 year ago

Ah, no, this is a plotting artifact. I just didn't consider this case.

What's happening is that the processing isn't done by libmuscle, so I don't know that it's happening. All I have is data about communication and waiting. So the plot is drawn by first drawing a dark grey bar covering the whole run time, and then drawing communication and waiting events on top. I'm drawing all of the grey bars from the start of the whole simulation to the end, rather than from the start to the end of the run of that specific instance.
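
For illustration, a minimal sketch of that drawing order, with made-up instance names and timings (this is not MUSCLE3's actual plotting code):

    import matplotlib.pyplot as plt

    sim_start, sim_end = 0.0, 100.0
    instances = ['init', 'workflow']
    # hypothetical (start, duration) communication/waiting spans per instance
    events = {'init': [(2.0, 3.0)], 'workflow': [(10.0, 80.0)]}

    fig, ax = plt.subplots()
    for i, name in enumerate(instances):
        # background bar covers the whole simulation, not just this
        # instance's own run: exactly the artifact seen in the timeline
        ax.broken_barh([(sim_start, sim_end - sim_start)], (i - 0.4, 0.8),
                       facecolors='darkgrey')
        # communication and waiting events drawn on top
        ax.broken_barh(events[name], (i - 0.4, 0.8), facecolors='tab:blue')
    ax.set_yticks(range(len(instances)))
    ax.set_yticklabels(instances)
    ax.set_xlabel('time (s)')
    plt.show()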

So what actually happens here is that init starts up, registers, runs, sends its result, and then deregisters and shuts down (assuming it has no incoming F_INIT ports), while the rest of the simulation continues. It just gets plotted incorrectly. Will fix, thanks!

LourensVeen commented 1 year ago

Hold on, I'm wrong. It's not a plotting issue.

I've been looking into this a bit, and it's not a plotting problem; it's due to how the shutdown algorithm of MUSCLE3 works. What happens is that init starts up, sends its output, and then shuts down, closing its outgoing ports as part of the shutdown. Closing a port entails sending a special ClosePort notification on it, signalling that the port will be closed and that no more messages will be coming. This lets the instances connected to those ports know that they won't be getting any more inputs, so they can shut down too.

However, this notification is really just a message, and messages are sent by putting them in a local queue and waiting for the recipient to pick them up. So that's what we do here too. The recipient however is busy calculating things, and won't be receiving that notification until next time it calls reuse_instance(), which is all the way at the end of the simulation. And so init continues to be active until the instances it sends to have completed their current reuse iteration.
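
As a toy model of that interaction (plain Python queues and threads, not libmuscle's internals): the sender below cannot finish until the busy recipient has picked up everything, including the final ClosePort message.

    import queue
    import threading
    import time

    outbox = queue.Queue()

    def sender():
        outbox.put('result')
        outbox.put('ClosePort')  # the shutdown notification is just a message
        outbox.join()            # blocks until every message has been picked up
        print('sender: outbox empty, really shutting down')

    def recipient():
        print('recipient got:', outbox.get())
        outbox.task_done()
        time.sleep(2.0)          # "busy calculating" until the next reuse_instance()
        print('recipient got:', outbox.get())
        outbox.task_done()

    t = threading.Thread(target=sender)
    t.start()
    recipient()
    t.join()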

Shutting down a MUSCLE3 simulation cleanly is nontrivial, and I don't think the algorithm needs to be changed, but it may be possible to deregister from the manager before closing the ports, which would fix the statistics. Or we could add a separate profiling event type to record the moment we start shutting down, and use that in the statistics; that would also fix it.

DavidPCoster commented 1 year ago

If it helps, this is what I find in muscle3_manager.log

    <g2dpc@s53 ~/GIT/ets_paf>egrep 'instance init|^init' $RUN/muscle3_manager.log
    muscle_manager 2023-07-06 22:39:10,374 INFO    libmuscle.manager.mmp_server: Registered instance init
    init           2023-07-06 22:39:15,695 DEBUG   init: Sending message on final_equilibrium_out
    init           2023-07-06 22:39:19,669 DEBUG   init: Sending message on final_core_profiles_out
    init           2023-07-06 22:39:36,996 DEBUG   init: Sending message on final_core_transport_out
    init           2023-07-06 22:39:43,832 DEBUG   init: Sending message on final_core_sources_out
    init           2023-07-06 22:39:47,220 DEBUG   init: Sending message on final_transport_solver_numerics_out
    init           2023-07-06 22:39:47,222 DEBUG   init: Closing port final_equilibrium_out
    init           2023-07-06 22:39:47,222 DEBUG   init: Sending message on final_equilibrium_out
    init           2023-07-06 22:39:47,222 DEBUG   init: Closing port final_core_profiles_out
    init           2023-07-06 22:39:47,223 DEBUG   init: Sending message on final_core_profiles_out
    init           2023-07-06 22:39:47,223 DEBUG   init: Closing port final_core_transport_out
    init           2023-07-06 22:39:47,224 DEBUG   init: Sending message on final_core_transport_out
    init           2023-07-06 22:39:47,224 DEBUG   init: Closing port final_core_sources_out
    init           2023-07-06 22:39:47,224 DEBUG   init: Sending message on final_core_sources_out
    init           2023-07-06 22:39:47,225 DEBUG   init: Closing port final_transport_solver_numerics_out
    init           2023-07-06 22:39:47,225 DEBUG   init: Sending message on final_transport_solver_numerics_out
    muscle_manager 2023-07-07 00:40:54,757 INFO    libmuscle.manager.mmp_server: Deregistered instance init
    <g2dpc@s53 ~/GIT/ets_paf>

LourensVeen commented 1 year ago

Yeah, that makes sense: it's done at 22:39:47,222, then it sends all the ClosePort messages, and at that point it's really done; it just needs to wait for those messages to be retrieved by their recipients. So I think I'll add a DISCONNECT_WAIT event that measures from the point where it starts shutting down at 22:39:47,222 to the point where all the outboxes are empty and we can really shut down at 00:40:54. Then I can take the start timestamp of that event as the end of the run for profiling purposes, and everything will make sense.
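
A sketch of how the statistics side of that might work; the DISCONNECT_WAIT name is from the comment above, but the event record layout and the timings (seconds since registration, estimated from the log) are illustrative:

    def instance_run_end(events):
        """events: list of (event_type, start_time, stop_time) tuples."""
        for event_type, start, stop in events:
            if event_type == 'DISCONNECT_WAIT':
                # the run effectively ended when shutdown waiting began
                return start
        return max(stop for _, _, stop in events)

    # init's timeline from the log above, roughly in seconds since 22:39:10
    events = [
        ('SEND', 5.3, 37.0),
        ('DISCONNECT_WAIT', 37.0, 7304.0),  # ~2h waiting for ClosePort pickup
    ]
    print(instance_run_end(events))  # 37.0, not 7304.0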

LourensVeen commented 9 months ago

Fix released with version 0.7.1, please reopen if you encounter further issues.