kytos-ng / kytos

Kytos SDN Platform. Kytos is designed to be easy to install, use, develop and share Network Apps (NApps).
https://kytos-ng.github.io/
MIT License

Upgraded Mongo to 7.0 from 5.0 #470

Closed: Alopalao closed this 4 months ago

Alopalao commented 5 months ago

Closes #451

Summary

Updated MongoDB to 7.0.

Local Tests

Updated MongoDB and ran some tests. No noticeable changes between the versions in data generation.

Update steps

Before following the update procedure, be sure to download the latest Mongo package. To check on the active database, mongosh can be installed (for Debian 12, follow the Ubuntu 22.04 process). To update, follow these instructions:

  1. Stop kytos

  2. From mongosh, enter db.adminCommand( { setFeatureCompatibilityVersion: "5.0" } ). This change can be checked with db.adminCommand( { getParameter: 1, featureCompatibilityVersion: 1 } )

  3. Stop the containers that kytos creates: mongo-rs-init, mongo1, mongo2 and mongo3. To check their status, run docker ps -a. Example:

    CONTAINER ID   IMAGE       COMMAND                  CREATED              STATUS                          PORTS                                                      NAMES
    493d530e4104   mongo:6.0   "/scripts/rs-init.sh"    About a minute ago   Exited (0) About a minute ago                                                              mongo-rs-init
    ad7a9149a38c   mongo:6.0   "/usr/bin/mongod --b…"   2 minutes ago        Up About a minute               0.0.0.0:27017->27017/tcp, :::27017->27017/tcp              mongo1
    fbec48992a5d   mongo:6.0   "/usr/bin/mongod --b…"   2 minutes ago        Up About a minute               27017/tcp, 0.0.0.0:27018->27018/tcp, :::27018->27018/tcp   mongo2
    607ee9359347   mongo:6.0   "/usr/bin/mongod --b…"   2 minutes ago        Up About a minute               27017/tcp, 0.0.0.0:27019->27019/tcp, :::27019->27019/tcp   mongo3

    Stop each container with docker stop $CONTAINER_ID. E.g. docker stop 493d530e4104.

  4. Change the mongo image tags in kytos/docker-compose.yml to 6.0 (there are 4).

  5. Run Kytos and then close it, or just run docker compose up -d.

  6. From mongosh, enter db.adminCommand( { setFeatureCompatibilityVersion: "6.0" } ).

  7. Stop the containers that kytos creates

  8. Change the mongo image tags in kytos/docker-compose.yml to 7.0.

  9. Run Kytos or docker compose up -d. The result can be verified with mongosh using the command mongosh mongo1:27017,mongo2:27018,mongo3:27019 (a short verification sketch follows the log below).

    Current Mongosh Log ID: 6632a69a07964f2a047b2da8
    Connecting to:          mongodb://127.0.0.1:27017/mongo1%3A27017%2Cmongo2%3A27018%2Cmongo3%3A27019?directConnection=true&serverSelectionTimeoutMS=2000&appName=mongosh+2.2.2
    Using MongoDB:          7.0.8  # <-- HERE
    Using Mongosh:          2.2.2
    mongosh 2.2.5 is available for download: https://www.mongodb.com/try/download/shell
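
For completeness, a minimal verification sketch from mongosh after the final restart; the output values in the comments are only examples for this setup:

    // Server binary version
    db.version()
    // e.g. "7.0.8"

    // Feature compatibility version currently set on the cluster
    db.adminCommand( { getParameter: 1, featureCompatibilityVersion: 1 } )
    // e.g. { featureCompatibilityVersion: { version: "6.0" }, ok: 1 } (plus replica-set metadata)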

Discrepancies

Kytos does not pass the "hello" command on MongoDB

There may be some problems with version compatibility. To check this, inspect the mongo1 container: run docker logs $CONTAINER_ID | grep "Invalid feature compatibility version" to check the settings. In the results, look for:

Invalid feature compatibility version value '5.0'; expected '6.0' or '6.3' or '7.0'.

The compatibility version is set to 5.0, which means the image version in kytos/docker-compose.yml needs to change to either 5.0 or 6.0 (in the case of 6.0, the containers are compatible with both 6.0 and 7.0). Change docker-compose.yml accordingly, then stop and delete the containers. Finally, run kytos or execute docker compose up -d and continue with the update process.
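
A rough sketch of that recovery path, run from the directory that holds kytos/docker-compose.yml (the container name mongo1 follows the defaults shown above; adjust to your environment):

    # Confirm which compatibility versions the 7.0 image would accept
    docker logs mongo1 2>&1 | grep "Invalid feature compatibility version"

    # After editing the mongo image tags in docker-compose.yml back to a
    # compatible version (e.g. 6.0), stop, delete and recreate the containers
    docker compose down
    docker compose up -d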

APM not working

Stop and delete the containers created from docker-compose.es.yml, then create them again.
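
A minimal sketch of that, assuming docker-compose.es.yml sits in the current directory:

    # Stop and remove the APM/Elasticsearch containers, then recreate them
    docker compose -f docker-compose.es.yml down
    docker compose -f docker-compose.es.yml up -d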

Problems with creating containers after docker compose up -d

These problems could be errors while recreating a container, or a container ID already in use. Stop and delete the containers created from docker-compose.yml.

Port 27017 or 8181 is already in use
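
A quick way to see which process is currently holding those ports, assuming a Linux host with iproute2 available:

    # List listening TCP sockets bound to 27017 (MongoDB) or 8181 (kytos API)
    ss -ltnp | grep -E ':(27017|8181)'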

Downgrade

Downgrading can be done depending on the compatibility version set on the database:

Each time before starting the containers with a different version, stop the mongo1, mongo2 and mongo3 containers running the previous version.
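
For example, before switching the images from 7.0 back to 6.0, the compatibility version can be checked and lowered from mongosh (a sketch only; recent MongoDB releases may additionally require a confirm: true field when lowering the FCV):

    // Check the compatibility version currently set
    db.adminCommand( { getParameter: 1, featureCompatibilityVersion: 1 } )

    // Lower it before moving the images back (add confirm: true if the server requires it)
    db.adminCommand( { setFeatureCompatibilityVersion: "6.0" } )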

Save and restore database

Command to export:

mongodump -d <database_name> -o <directory_backup>

It will create a folder with the database name; by default, the kytos database name is napps.

Command to restore:

mongorestore -d <database_name> <directory_backup>
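
A concrete sketch using the default kytos database name; the backup path is illustrative, and depending on the deployment you may also need connection or authentication options such as --uri:

    # Dump the "napps" database into ./backup/napps/
    mongodump -d napps -o ./backup

    # Restore it into a database named "napps"
    mongorestore -d napps ./backup/napps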

End-to-End Tests

The e2e tests depend on this PR

+ python3 -m pytest tests/ --reruns 2 -r fEr
============================= test session starts ==============================
platform linux -- Python 3.11.2, pytest-8.1.1, pluggy-1.5.0
rootdir: /tests
plugins: rerunfailures-13.0, timeout-2.2.0, anyio-4.3.0
collected 263 items

tests/test_e2e_01_kytos_startup.py ..                                    [  0%]
tests/test_e2e_05_topology.py ..................                         [  7%]
tests/test_e2e_06_topology.py ....                                       [  9%]
tests/test_e2e_10_mef_eline.py ..........ss.....x.....x................  [ 24%]
tests/test_e2e_11_mef_eline.py ......                                    [ 26%]
tests/test_e2e_12_mef_eline.py .....Xx.                                  [ 29%]
tests/test_e2e_13_mef_eline.py ....Xs.s.....Xs.s.XXxX.xxxx..X........... [ 45%]
.                                                                        [ 45%]
tests/test_e2e_14_mef_eline.py x                                         [ 46%]
tests/test_e2e_15_mef_eline.py .....                                     [ 47%]
tests/test_e2e_16_mef_eline.py .                                         [ 48%]
tests/test_e2e_20_flow_manager.py .....................                  [ 56%]
tests/test_e2e_21_flow_manager.py ...                                    [ 57%]
tests/test_e2e_22_flow_manager.py ...............                        [ 63%]
tests/test_e2e_23_flow_manager.py ..............                         [ 68%]
tests/test_e2e_30_of_lldp.py ....                                        [ 69%]
tests/test_e2e_31_of_lldp.py ...                                         [ 71%]
tests/test_e2e_32_of_lldp.py ...                                         [ 72%]
tests/test_e2e_40_sdntrace.py ..............                             [ 77%]
tests/test_e2e_41_kytos_auth.py ........                                 [ 80%]
tests/test_e2e_42_sdntrace.py ..                                         [ 81%]
tests/test_e2e_50_maintenance.py ............................            [ 92%]
tests/test_e2e_60_of_multi_table.py .....                                [ 93%]
tests/test_e2e_70_kytos_stats.py ........                                [ 96%]
tests/test_e2e_80_pathfinder.py ss......                                 [100%]
italovalcy commented 4 months ago

Hi team,

Great work, Aldo! Very nicely done!

  • Excellent to see e2e passing. On GitLab, https://gitlab.ampath.net/kytos/kytos-end-to-end-tester/-/blob/master/.gitlab-ci.yml needs to be updated too; check out the services there, which are on 5.0. We also need to decide on a strategy: if we upgrade there to 7.0, we would then upgrade amlight/kytos-end-to-end-tester to 7.0 to be able to run e2e on demand. But maybe we also create a new repo copy of amlight/kytos-end-to-end-tester that stays on 5.0, so that if any patch on 2024.1 still needs to ship we can also test it on 5.0 in the CI, and we maintain that 5.0 copy until prod upgrades to 7.0 when they deploy the future version 2024.2, probably by the end of the year. Let me know if you have any other suggestions. In practice, no issues are expected when also testing future patches of 2024.1 with 7.0, but for completeness let's be ready for that case too. This also needs to include @italovalcy in the conversation.

Another approach would be having both tests in the end-to-end daily execution (5.0 and 7.0), just to make sure there are no surprises.

viniarck commented 4 months ago


Good idea having two parallel nightly pipelines in the end-to-end project, on 7.0 and on 5.0. Let's also consider whether the available GitLab workers can pick up the work in parallel, and then we maintain it until prod gets upgraded to 7.0.

Alopalao commented 4 months ago

Test description:

Overall Results

Timing measured for bulk_write cumulative_times.

Database created and tests performed in mongo 5.0:

(Screenshots: Overview, Trace, request POST flow_manager install flows, request POST flow_manager delete flows)

Database created and tests performed in mongo 7.0:

(Screenshots: Overview, Trace, request POST flow_manager install flows, request POST flow_manager delete flows)

Database created in mongo 5.0 and tests performed in mongo 7.0:

(Screenshots: Overview, Trace, request POST flow_manager install flows, request POST flow_manager delete flows)

viniarck commented 4 months ago

Great to see that overall the DB ops latencies are similar, as we'd expect; that's a good start. On flow_manager delete, if you could also include in the screenshot how long the DB operation was taking (on 5.0 you included that correctly, but it was truncated in the 7.0 screenshot):

(screenshot)

I'm just waiting now for the rest of the points on this comment to be addressed to approve it.

@Alopalao if you could also post the output of vegeta for one of the endpoint stress tests, for the record, with both Mongo 5 and 7 results, that'd be great. vegeta output is very helpful since it includes many stats, including some percentiles. Of course, that is only from a client point of view, but if the overall latencies are similar, that's already a great sign.

Alopalao commented 4 months ago

Vegeta report results from the following command:

    jq -ncM '{method: "POST", url: "http://localhost:8181/api/kytos/flow_manager/v2/flows/00:00:00:00:00:00:00:01", body: { "force": true, "flows": [ { "priority": 10, "match": { "in_port": 1, "dl_vlan": 100 }, "actions": [ { "action_type": "output", "port": 1 } ] } ] } | @base64, header: {"Content-Type": ["application/json"]}}' | vegeta attack -format=json -rate=200/1s -duration=60s -timeout=120s | tee results.bin | vegeta report

Mongo 7.0

Requests      [total, rate, throughput]         12000, 200.02, 148.80
Duration      [total, attack, wait]             1m1s, 59.994s, 603.263ms
Latencies     [min, mean, 50, 90, 95, 99, max]  10.2ms, 507.339ms, 519.413ms, 773.648ms, 848.971ms, 996.941ms, 1.239s
Bytes In      [total, mean]                     381289, 31.77
Bytes Out     [total, mean]                     1464000, 122.00
Success       [ratio]                           75.14%
Status Codes  [code:count]                      202:9017  503:2983
Error Set:
503 Service Unavailable

(APM screenshot)

Mongo 5.0

Requests      [total, rate, throughput]         12000, 200.02, 143.44
Duration      [total, attack, wait]             1m1s, 59.995s, 692.008ms
Latencies     [min, mean, 50, 90, 95, 99, max]  7.965ms, 509.928ms, 483.188ms, 831.703ms, 906.246ms, 1.101s, 1.267s
Bytes In      [total, mean]                     375985, 31.33
Bytes Out     [total, mean]                     1464000, 122.00
Success       [ratio]                           72.54%
Status Codes  [code:count]                      202:8705  503:3295
Error Set:
503 Service Unavailable

(screenshot)

The error in both tests comes from the concurrency limit: 2024-05-13 12:38:44,102 - WARNING [uvicorn.error] (MainThread) Exceeded concurrency limit.
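
Since the attack output was tee'd into results.bin, the same results can be re-rendered later without re-running the attack, for example:

    # Re-print the text report from the recorded results
    vegeta report results.bin

    # Or dump the aggregated metrics as JSON
    vegeta report -type=json results.bin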