FWuellhorst commented 2 years ago

While trying to terminate single agents or full MAS, I found that deleting something does not work.

Two examples:

1. Deleting an agent.

Current behaviour

I put the request requests.delete("localhost:30009/api/clonemap/mas/2/agents/4") and get a response 200. The corresponding agent received the request: 2022-01-11 07:40:01,106 - [INFO] - Agency: Received Request: DELETE /api/agency/agents/4 Afterwards, I can still access http://localhost:30009/api/clonemap/mas/2/agents/4 and get all data. Also, the docker container is not stopped.

Expected behaviour

The docker container is stopped and the agent with id 4 is deleted.

2. Deleting an MAS.

Current behaviour

I put the request requests.delete("localhost:30009/api/clonemap/mas/0") and get a response 200. Before this request, running docker ps -a yields

CONTAINER ID   IMAGE                                                                      COMMAND                  CREATED              STATUS              PORTS                                         NAMES
d609977c113c   testdizwionclonemap                                                        "python -u agentlib/…"   9 seconds ago        Up 8 seconds                                                      mas-0-im-0-agency-5.mas0agencies
1357070d0d84   testdizwionclonemap                                                        "python -u agentlib/…"   10 seconds ago       Up 9 seconds                                                      mas-0-im-0-agency-4.mas0agencies
19410bb7b002   testdizwionclonemap                                                        "python -u agentlib/…"   10 seconds ago       Up 10 seconds                                                     mas-0-im-0-agency-3.mas0agencies
2c8b3ebd44d9   testdizwionclonemap                                                        "python -u agentlib/…"   11 seconds ago       Up 11 seconds                                                     mas-0-im-0-agency-2.mas0agencies
147a352a9b38   testdizwionclonemap                                                        "python -u agentlib/…"   14 seconds ago       Up 13 seconds                                                     mas-0-im-0-agency-1.mas0agencies
d08e9659a79c   testdizwionclonemap                                                        "python -u agentlib/…"   15 seconds ago       Up 14 seconds                                                     mas-0-im-0-agency-0.mas0agencies
3335807b57f4   eclipse-mosquitto:1.6.13                                                   "/docker-entrypoint.…"   About a minute ago   Up About a minute   0.0.0.0:30883->1883/tcp, :::30883->1883/tcp   clonemap_mqtt_1
2d25e956e033   registry.git.rwth-aachen.de/acs/public/cloud/mas/clonemap/ams              "./ams"                  About a minute ago   Up About a minute   0.0.0.0:30009->9000/tcp, :::30009->9000/tcp   clonemap_ams_1
9d106a3448d4   registry.git.rwth-aachen.de/acs/public/cloud/mas/clonemap/clonemap_local   "docker-entrypoint.s…"   About a minute ago   Up About a minute   0.0.0.0:8000->8000/tcp, :::8000->8000/tcp     kubestub

The MAS logs the following (I also added two GET requests):

kubestub    | Received Request: DELETE /api/container/0
mqtt_1      | 1641887899: Socket error on client auto-8C05EB97-C8E6-E789-1157-8A4A1424425B, disconnecting.
mqtt_1      | 1641887902: Socket error on client auto-B97DC532-C70A-32B5-5673-5915E5B0AFBB, disconnecting.
mqtt_1      | 1641887904: Socket error on client auto-DFE8AD9A-6AEE-7665-8904-77E3931471C9, disconnecting.
mqtt_1      | 1641887907: Socket error on client auto-3A860F08-4444-28F1-5EF2-A13277D1F795, disconnecting.
ams_1       | [INFO] 2022/01/11 07:59:57 Received Request:  GET   /api/clonemap/mas/1
ams_1       | [ERROR] 2022/01/11 07:59:57 /api/clonemap/mas/1 MAS does not exist
ams_1       | [INFO] 2022/01/11 08:00:00 Received Request:  GET   /api/clonemap/mas/0

docker ps -a yields the correct statement, i.e. all agent containers are removed:

CONTAINER ID   IMAGE                                                                      COMMAND                  CREATED         STATUS         PORTS                                         NAMES
3335807b57f4   eclipse-mosquitto:1.6.13                                                   "/docker-entrypoint.…"   2 minutes ago   Up 2 minutes   0.0.0.0:30883->1883/tcp, :::30883->1883/tcp   clonemap_mqtt_1
2d25e956e033   registry.git.rwth-aachen.de/acs/public/cloud/mas/clonemap/ams              "./ams"                  2 minutes ago   Up 2 minutes   0.0.0.0:30009->9000/tcp, :::30009->9000/tcp   clonemap_ams_1
9d106a3448d4   registry.git.rwth-aachen.de/acs/public/cloud/mas/clonemap/clonemap_local   "docker-entrypoint.s…"   2 minutes ago   Up 2 minutes   0.0.0.0:8000->8000/tcp, :::8000->8000/tcp     kubestub

However, the GET request for MAS 0 still gives me all the data. Also, if I post a new MAS, the ID 0 is not used but instead, ID 1 is created. I think this results from the fact that GET does not return MAS does not exist.

Expected behaviour

The MAS is fully deleted and I can re-use ID 0

@kwe712 @s-daehling : Could you comment on this error and tell me if it's a bug or actually a feature?

As you can see, I am running it locally and using the following docker-compose:

version: "3.7"

services:
  kubestub:
    container_name: kubestub
    hostname: kubestub
    image: registry.git.rwth-aachen.de/acs/public/cloud/mas/clonemap/clonemap_local
    environment:
      CLONEMAP_LOG_LEVEL: ${CLONEMAP_LOG_LEVEL}
    volumes:
     - /var/run/docker.sock:/var/run/docker.sock
    ports:
     - 8000:8000
    networks:
     - clonemap-net
    stop_grace_period: 30s

  ams:
    image: registry.git.rwth-aachen.de/acs/public/cloud/mas/clonemap/ams
    environment:
      CLONEMAP_DEPLOYMENT_TYPE: ${CLONEMAP_DEPLOYMENT_TYPE}
      CLONEMAP_STORAGE_TYPE: ${CLONEMAP_STORAGE_TYPE}
      CLONEMAP_LOG_LEVEL: ${CLONEMAP_LOG_LEVEL}
      CLONEMAP_STUB_HOSTNAME: ${CLONEMAP_STUB_HOSTNAME}
    ports:
     - 30009:9000
    depends_on:
      - kubestub
    networks:
      - clonemap-net

  mqtt:
    image: eclipse-mosquitto:1.6.13
    ports:
      - 30883:1883
    depends_on:
      - kubestub
      - ams
    networks:
      - clonemap-net

networks:
  clonemap-net:
    name: clonemap-net
    attachable: true

with .env

CLONEMAP_STORAGE_TYPE=local
CLONEMAP_DEPLOYMENT_TYPE=local
# mqtt

CLONEMAP_LOG_LEVEL=info
# fe kubestub

CLONEMAP_STUB_HOSTNAME=kubestub

CLONEMAP_MODULE_MQTT=true
CLONEMAP_MODULE_FRONTEND=true

s-daehling commented 2 years ago

The behavior you describe is actually intended. Deleting an agent or entire MAS terminates the agents but it does not remove the data. However, the status field should show a different status code (this is not well documented yet). The AMS does not delete the data since this might cause problems with other modules. For example, the user might still want to access log messages from agents or MASs after they have been terminated, e.g. for debugging. If the AMS would delete the data and reuse the ID, there is no way to tell which MAS actually created the logs. New logs would simply be appended to the old ones under the same MAS ID. That is why the data is still available, the status code is changed to show that the MAS is terminated and the next MAS to be started gets a new ID. Similar considerations are true for single agents.

I would suggest the following:

Improve documentation to clarify that data is not deleted when terminating agents and MAS
Add documentation for status codes
Add a new API endpoint to reset clonemap, i.e. really delete all data and start from ID 0 again. This would correspond to the behavior you expected.

What do you think about this?

FWuellhorst commented 2 years ago

Thanks for the quick response! That makes sense, I did not thought about the logging and debugging side. But: Terminating/deleting one agent alone does not work. Is that intended as well?

I approve of your points, especially point 3. One note, however: I would use different naming. Reset indicates "start the MAS/Agent with the exact same config", whereas a user maybe wants to debug an agent, update the config and then restart it. From your definition, a reset would mean "delete agent and config data", or? Two options to get a clear naming, this way the docs don't need to be that explicit:

Name the current DELETE -> TERMINATE and your mentioned RESET -> DELETE
Just keep DELETE, but enable a curl DELETE mas\0\config and mas\0\agents\1\config to delete the data as well.

s-daehling commented 2 years ago

I will look into the problem with deleting single agents. Maybe this is a problem with clonemapy not cloneMAP itself.

I can use the naming you suggest for the docu, but we have to be careful not to confuse it with the http methods, since there are no methds TERMINATE or RESET. Regarding the reset, I meant that a reset would be applied to the entire platform, not only one MAS. Reseting cloneMAP would terminate all MASs and delete all their data (in all modules), so that a user can continue as if the platform is started fresh.

I am still not sure if terminating and deleting data of a single MAS is a good idea. For example: Lets say we have four MASs: 0, 1, 2 and 3. Now we terminated and delete MAS 1. After that we start a new MAS. Should it have ID 1 or ID 4? Either way it might be unexpected behavior to the user. Keeping the data until the platform is stopped or reset, makes the behavior more clear, I think.

I will start with the implementation of a platform reset function as described. If this is not sufficient for your usecase, we can discuss again.

FWuellhorst commented 2 years ago

I agree regarding the reset. Maybe we then need to look at the name and Id coupling with the agentlib. If I restart an agent, it's not possible as the name already exists. Regarding clonemapy: I will look into it and create an according issue. Debugging python will be no problem.

FWuellhorst commented 2 years ago

After fixing the underlying bug in clonemapy (https://github.com/RWTH-ACS/clonemapy/issues/1), the agent stops. However, the docker container with the agency is still running. To match my requirements (delete and re-start single agents for monitoring) and looking at the discussion, I would need only one feature: Delete an agent (works) and stop the agency if no agents are present (feature in clonemapy?, only optional to prevent empty docker containers). Keep the data but enable posting new agents with the same name. If this is not possible due to design, deleting the existing data would be necessary.

Regarding resetting the whole MAS: This is not required from my side. This can be already achieved by composing down and up again.

s-daehling commented 2 years ago

I have merged your fix in clonemapy into the develop branch. Thanks :)

Regarding your feature requests:

Delete an agent and stop agency: Deleting agents should work now. Empty agencies are currently not stopped. This is actually not easy to implement because, when deployed with k8s, cloneMAP uses StatefulSets for the agencies. The agency ID of cloneMAP then matches the ID of the container in the StatefulSet. Container IDs in k8s StatefulSets are generated from a counter. Thus the naming of the containers: e.g. mas-0-im-0-agency-0:n.mas0agencies In a StatefulSet it is not possible to stop specific containers. We can only reduce the number of containers, i.e. scale down. If we scale down by one container, k8s will remove the container with the highest ID. After that we have the following containers: mas-0-im-0-agency-0:n-1.mas0agencies. If we have an empty agency in the middle of the StatefulSet, it is not possible to stop this specific agency container. You are using cloneMAP with docker-compose. Here it would be possible to implement the stopping of agencies. However, I would like to keep the behavior of cloneMAP the same, independent from the deployment method. Hence, I would prefer not to implement this. However, I addressed a bug in #5 . Now, if you delete an agent from an agency, and afterwards start a new agent, this new agent is scheduled to the existing agency. So, empty or partially-empty agencies are reused by new agents. For example, let's say we have four agents and a maximum of two agents per agency. Then agents 0 and 1 are in agency 0. And agent 2 and 3 are in agency 1. Now we delete agent 1. After that we create a new agent, which gets the ID 4. This agent will then be scheduled to agency 0. Before the fix, a new agency would have been created for the agent. It is not yet merged into develop, but you can test it using the latest tag.

Keep the data but enable posting new agents with the same name: This should already be possible. cloneMAP does not care about uniqueness of the agent name. I successfully tried to start several agents with the same name. If it is not possible, please let me know.

resetting the whole MAS: Since you do not require this feature, I will not implement it yet, but keep it in mind as potential feature in the future.

FWuellhorst commented 2 years ago

Keep the data but enable posting new agents with the same name: This works for me. resetting the whole MAS: Sound good to me. Delete an agent and stop agency: The solution in #5 seems to be more than sufficient. My only goal is to be able to delete and restart agents without spawning inf agencies.

s-daehling commented 2 years ago

I just merged #6 and closed #5

@FWuellhorst Can I close this issue?

FWuellhorst commented 2 years ago

Yes, thanks for the quick fix!

FWuellhorst commented 1 year ago

@s-daehling @kwe712 Coming to this issue one more time for a follow up question: If it is not possible to delete an MAS, can I at least check if the MAS is alive? I did not find a specific endpoint in the open_api.yml. Is this even possible?

sogno-platform / clonemap

curl DELETE does not delete anything #4

1. Deleting an agent.

Current behaviour

Expected behaviour

2. Deleting an MAS.

Current behaviour

Expected behaviour