Closed FWuellhorst closed 2 years ago
The behavior you describe is actually intended. Deleting an agent or an entire MAS terminates the agents but does not remove the data. However, the status field should show a different status code (this is not well documented yet). The AMS does not delete the data since this might cause problems with other modules. For example, the user might still want to access log messages from agents or MASs after they have been terminated, e.g. for debugging. If the AMS deleted the data and reused the ID, there would be no way to tell which MAS actually created the logs. New logs would simply be appended to the old ones under the same MAS ID. That is why the data is still available, the status code is changed to show that the MAS is terminated, and the next MAS to be started gets a new ID. Similar considerations apply to single agents.
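The ID and data handling described in this comment can be pictured with a small toy model. This is only a sketch and not cloneMAP's actual code: the names `MasRecord` and `MasRegistry`, their methods, and the status strings are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class MasRecord:
    mas_id: int
    status: str = "active"  # hypothetical status value, not cloneMAP's real code
    logs: list = field(default_factory=list)


class MasRegistry:
    """Toy model: terminating keeps the record, and IDs are never reused."""

    def __init__(self):
        self._next_id = count()  # monotonically increasing ID counter
        self.records = {}

    def start(self):
        record = MasRecord(mas_id=next(self._next_id))
        self.records[record.mas_id] = record
        return record.mas_id

    def terminate(self, mas_id):
        # Only the status changes; logs and config stay readable.
        self.records[mas_id].status = "terminated"


registry = MasRegistry()
first = registry.start()
registry.records[first].logs.append("agent 0 started")
registry.terminate(first)
second = registry.start()  # receives a fresh ID instead of reusing the old one
```

Because the terminated record is kept, its logs remain attributable to exactly one MAS, which is the debugging argument made in the comment.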
I would suggest the following:
What do you think about this?
Thanks for the quick response! That makes sense, I did not think about the logging and debugging side. But terminating/deleting a single agent does not work. Is that intended as well?
I approve of your points, especially point 3. One note, however: I would use different naming. Reset suggests "start the MAS/agent with the exact same config", whereas a user may want to debug an agent, update the config and then restart it. From your definition, a reset would mean "delete agent and config data", right? Two options to get clear naming, so that the docs don't need to be that explicit:
mas\0\config
and mas\0\agents\1\config
to delete the data as well.
I will look into the problem with deleting single agents. Maybe this is a problem with clonemapy, not cloneMAP itself.
I can use the naming you suggest for the documentation, but we have to be careful not to confuse it with the HTTP methods, since there are no methods TERMINATE or RESET. Regarding the reset, I meant that a reset would be applied to the entire platform, not only one MAS. Resetting cloneMAP would terminate all MASs and delete all their data (in all modules), so that a user can continue as if the platform had been started fresh.
I am still not sure if terminating a single MAS and deleting its data is a good idea. For example: Let's say we have four MASs: 0, 1, 2 and 3. Now we terminate and delete MAS 1. After that we start a new MAS. Should it have ID 1 or ID 4? Either way it might be unexpected behavior for the user. Keeping the data until the platform is stopped or reset makes the behavior clearer, I think.
I will start with the implementation of a platform reset function as described. If this is not sufficient for your use case, we can discuss again.
I agree regarding the reset. Maybe we then need to look at the name and ID coupling with the agentlib. If I restart an agent, it's not possible as the name already exists. Regarding clonemapy: I will look into it and create a corresponding issue. Debugging Python will be no problem.
After fixing the underlying bug in clonemapy (https://github.com/RWTH-ACS/clonemapy/issues/1), the agent stops. However, the docker container with the agency is still running. To match my requirements (delete and restart single agents for monitoring), and looking at the discussion, I would need only one feature: Delete an agent (works) and stop the agency if no agents are present (feature in clonemapy? Only optional, to prevent empty docker containers). Keep the data but enable posting new agents with the same name. If this is not possible due to design, deleting the existing data would be necessary.
Regarding resetting the whole MAS: This is not required from my side. This can already be achieved by composing down and up again.
I have merged your fix in clonemapy into the develop branch. Thanks :)
Regarding your feature requests:
Delete an agent and stop agency: Deleting agents should work now. Empty agencies are currently not stopped. This is actually not easy to implement because, when deployed with k8s, cloneMAP uses StatefulSets for the agencies. The agency ID of cloneMAP then matches the ID of the container in the StatefulSet. Container IDs in k8s StatefulSets are generated from a counter. Thus the naming of the containers: e.g. mas-0-im-0-agency-0:n.mas0agencies.
In a StatefulSet it is not possible to stop specific containers. We can only reduce the number of containers, i.e. scale down. If we scale down by one container, k8s will remove the container with the highest ID. After that we have the following containers: mas-0-im-0-agency-0:n-1.mas0agencies. If we have an empty agency in the middle of the StatefulSet, it is not possible to stop this specific agency container.
You are using cloneMAP with docker-compose. Here it would be possible to implement the stopping of agencies. However, I would like to keep the behavior of cloneMAP the same, independent of the deployment method. Hence, I would prefer not to implement this.
However, I addressed a bug in #5. Now, if you delete an agent from an agency and afterwards start a new agent, this new agent is scheduled to the existing agency. So, empty or partially empty agencies are reused by new agents. For example, let's say we have four agents and a maximum of two agents per agency. Then agents 0 and 1 are in agency 0, and agents 2 and 3 are in agency 1. Now we delete agent 1. After that we create a new agent, which gets the ID 4. This agent will then be scheduled to agency 0. Before the fix, a new agency would have been created for the agent. It is not yet merged into develop, but you can test it using the latest tag.
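The four-agent example in this comment can be traced with a few lines of Python. This is a minimal sketch of the described outcome, not the scheduler implemented in #5: the function `schedule`, the list-of-lists representation, and the "first agency with a free slot" policy are assumptions made for illustration.

```python
MAX_AGENTS_PER_AGENCY = 2  # value taken from the example in the comment


def schedule(agencies, agent_id, capacity=MAX_AGENTS_PER_AGENCY):
    """Place agent_id in the first agency with a free slot, else open a new one.

    Returns the index of the agency the agent was placed in.
    """
    for index, members in enumerate(agencies):
        if len(members) < capacity:
            members.append(agent_id)
            return index
    agencies.append([agent_id])
    return len(agencies) - 1


agencies = []
for agent_id in range(4):          # agents 0..3 fill agencies 0 and 1
    schedule(agencies, agent_id)

agencies[0].remove(1)              # delete agent 1; agency 0 keeps running
placed_in = schedule(agencies, 4)  # the new agent gets ID 4
```

In this model the new agent lands in agency 0, so no third agency is created, which matches the behavior described after the fix.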
Keep the data but enable posting new agents with the same name: This should already be possible. cloneMAP does not care about uniqueness of the agent name. I successfully tried to start several agents with the same name. If it is not possible, please let me know.
Resetting the whole MAS: Since you do not require this feature, I will not implement it yet, but I will keep it in mind as a potential feature for the future.
Keep the data but enable posting new agents with the same name: This works for me.
Resetting the whole MAS: Sounds good to me.
Delete an agent and stop agency: The solution in #5 seems to be more than sufficient. My only goal is to be able to delete and restart agents without spawning an unbounded number of agencies.
I just merged #6 and closed #5
@FWuellhorst Can I close this issue?
Yes, thanks for the quick fix!
@s-daehling @kwe712 Coming back to this issue one more time with a follow-up question: If it is not possible to delete a MAS, can I at least check whether the MAS is alive? I did not find a specific endpoint in the open_api.yml. Is this even possible?
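One way to investigate this is to fetch the MAS resource and inspect the status field mentioned earlier in the thread. The sketch below uses only the standard library instead of `requests`; the helper names `mas_url` and `fetch_mas` are hypothetical, and it assumes the endpoint returns a JSON body. The URL path, host and port are the ones used in this issue.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:30009"  # host and port as used in this issue


def mas_url(mas_id, base_url=BASE_URL):
    """Build the MAS endpoint URL that appears in this issue."""
    return f"{base_url}/api/clonemap/mas/{mas_id}"


def fetch_mas(mas_id, base_url=BASE_URL, timeout=5.0):
    """Return the decoded JSON body for a MAS, or None on an HTTP error status.

    Assumes the response body is JSON; this is not confirmed in the thread.
    """
    try:
        with urllib.request.urlopen(mas_url(mas_id, base_url), timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError:
        return None
```

Calling `fetch_mas(0)` against a running platform and looking through the returned data should reveal the status field. Its exact name and the code used for a terminated MAS are not documented in this thread, so they are deliberately not hard-coded here.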
While trying to terminate single agents or full MAS, I found that deleting something does not work.
Two examples:
1. Deleting an agent.
Current behaviour
I send the request
requests.delete("http://localhost:30009/api/clonemap/mas/2/agents/4")
and get a response 200. The corresponding agent received the request:
2022-01-11 07:40:01,106 - [INFO] - Agency: Received Request: DELETE /api/agency/agents/4
Afterwards, I can still access
http://localhost:30009/api/clonemap/mas/2/agents/4
and get all data. Also, the docker container is not stopped.
Expected behaviour
The docker container is stopped and the agent with id 4 is deleted.
2. Deleting an MAS.
Current behaviour
I send the request
requests.delete("http://localhost:30009/api/clonemap/mas/0")
and get a response 200. Before this request, running
docker ps -a
yields
The MAS logs the following (I also added two GET requests):
docker ps -a
yields the correct statement, i.e. all agent containers are removed:
However, the GET request for MAS 0 still gives me all the data. Also, if I post a new MAS, the ID 0 is not used but instead, ID 1 is created. I think this results from the fact that GET does not return "MAS does not exist".
Expected behaviour
The MAS is fully deleted and I can re-use ID 0
@kwe712 @s-daehling : Could you comment on this error and tell me if it's a bug or actually a feature?
As you can see, I am running it locally and using the following docker-compose:
with .env