org-arl / fjage

Framework for Java and Groovy Agents
https://fjage.readthedocs.io/en/latest/

fjage and Fjage.jl startup stages don't agree #275

Open ettersi opened 1 year ago

ettersi commented 1 year ago

The Fjåge MasterContainer sends out an "{\"alive\": true}" message once initialisation has completed.

https://github.com/org-arl/fjage/blob/c53ff1df94ef159e355aaf50c6984df3c33c3aa3/src/main/java/org/arl/fjage/remote/MasterContainer.java#L347-L352

https://github.com/org-arl/fjage/blob/c53ff1df94ef159e355aaf50c6984df3c33c3aa3/src/main/java/org/arl/fjage/remote/ConnectionHandler.java#L75-L98

However, Container.send() only forwards messages after the container has started.

https://github.com/org-arl/fjage/blob/c53ff1df94ef159e355aaf50c6984df3c33c3aa3/src/main/java/org/arl/fjage/remote/MasterContainer.java#L220-L221

This is a problem because Fjage.jl interprets the "{\"alive\": true}" message as a signal that the master container is now ready to accept messages.

https://github.com/org-arl/Fjage.jl/blob/b109126034cbdc517d8003e783df890677657002/src/gw.jl#L145-L148 https://github.com/org-arl/Fjage.jl/blob/b109126034cbdc517d8003e783df890677657002/src/container.jl#L468-L476

The result is that messages sent by slave-container agents may not reach their destination if these messages happen to fall in the gap between init() and start() on the master container.
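To make the gap concrete, here is a minimal toy model in Java. It is not fjage code: `ToyMaster`, its `outbox` and `delivered` lists, and the message strings are hypothetical stand-ins that only mimic the behaviour described above, namely that the alive notification goes out at the end of initialisation while `send()` forwards nothing until the container has started.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a master container; NOT the fjage implementation.
class ToyMaster {
    private boolean running = false;                    // becomes true in start()
    final List<String> outbox = new ArrayList<>();      // what a connected slave would see
    final List<String> delivered = new ArrayList<>();   // messages that reached local agents

    // Stage 1: initialisation completes and the alive notification is sent.
    void init() {
        outbox.add("{\"alive\": true}");
    }

    // Stage 2: the container starts and begins forwarding messages.
    void start() {
        running = true;
    }

    // Forwards a message only once the container has started;
    // anything sent earlier is silently dropped.
    boolean send(String msg) {
        if (!running) return false;
        delivered.add(msg);
        return true;
    }
}

class StartupGapDemo {
    public static void main(String[] args) {
        ToyMaster master = new ToyMaster();

        master.init();
        // A slave that treats "alive" as "ready" sends immediately.
        boolean sawAlive = master.outbox.contains("{\"alive\": true}");
        boolean early = sawAlive && master.send("request-1");

        master.start();
        boolean late = master.send("request-2");

        System.out.println("sent in gap delivered: " + early);
        System.out.println("sent after start delivered: " + late);
    }
}
```

In this model the first request is lost even though the slave waited for the alive message, which is the same ordering problem the demonstration below runs into.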

Demonstration:

using Fjage

@agent struct Dummy; end
function Fjage.startup(agent::Dummy)
    node = agentforservice(agent, "org.arl.unet.Services.NODE_INFO")
    # Shouldn't trigger, but does
    @assert !isnothing(node.address)
end

simulator = run(`bin/unet samples/2-node-network.groovy`, wait = false)
try
    container = SlaveContainer("localhost", 1101, reconnect = false)
    add(container, Dummy())
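    # Retry until start() succeeds, so the slave connects as early as possible
    while true
        try
            start(container)
            break
        catch e
            sleep(0.01)
        end
    end
    # Give the agent time to run its startup() before shutting down
    sleep(3.0)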
finally
    kill(simulator)
end