xenserver / status-report

Program that gathers data for xenserver host diagnostics
GNU Lesser General Public License v2.1
1 stars 9 forks source link

CP-41238: To save RRDs before capturing files, run commands before collecting directories #51

Closed bernhardkaindl closed 9 months ago

bernhardkaindl commented 10 months ago

RRDs are only synced to disk periodically by Xapi to avoid too much wear, and in most cases, bug report zip archives don't contain anything useful if taking a status report immediately after an issue occurs.

This PR allows syncing the current RRDs to disk by running the rrd-cli save_rrds command during the bug tool execution and then collect the updated RRD files within the same status report call.

For this, xen-api added the command rrd-cli save_rrds to its bugtool plugin in /etc/xensource/bugtool/xapi/stuff.xml:

<collect>
...
<command label="save_rrd">rrd-cli save_rrds</command>
</collect>

The assumption likely was:

  1. Bugtool will run the command rrd-cli save_rrds as defined in the xen-api bugtool xml file (see the code above).
  2. After this, bugtool might then read the created and/or updated rrd files.

But the implementation of bugtool is the other way around. Currently, it does this:

  1. It looks for which files it can find and records the paths to the files it has found to exist.
  2. It reads the files which it found in pass 1 and stores them in memory
  3. In pass 3, it executes commands defined by xapi in the xapi bugtool file/plugin.

The bugtool Requirements for the fix may be:

  1. Execute rrd-cli save_rrds (or the commands from the plugins such as the xapi plugins at least)
  2. This must happen before the directory tree(s) (in this case at least /var/xapi/blobs) are recursively traversed to look for files.
  3. This is the pre-condition for rrd-cli to run before the rrd directory ist listed. Only after that the tree /var/xapi/blobs may be collected.
github-actions[bot] commented 10 months ago

Coverage

Pytest Code coverage comment for Python 2.7
FileStmtsMissCoverMissing
xen-bugtool154829281%77–79, 84, 95, 100–101, 410, 487, 492, 512, 518, 541, 676, 694, 759–760, 774, 778, 793–794, 800–801, 806, 816, 825, 828–831, 837, 842–844, 847–848, 851–852, 855, 896–898, 1033–1047, 1049–1058, 1119, 1127, 1248, 1250, 1254–1255, 1278, 1296, 1299–1301, 1304–1308, 1311–1312, 1316–1318, 1324–1330, 1333–1337, 1339–1355, 1358–1366, 1369–1370, 1372, 1374–1375, 1377–1385, 1388, 1390–1393, 1395, 1397–1398, 1432, 1460, 1475, 1483–1484, 1488–1490, 1492, 1499, 1501–1505, 1507, 1537–1539, 1541–1546, 1549–1553, 1557–1558, 1560, 1577–1578, 1580, 1582–1583, 1585–1587, 1589–1596, 1598–1603, 1607–1608, 1610, 1612, 1614, 1630–1631, 1633, 1635–1637, 1640–1641, 1643, 1645–1647, 1650–1651, 1653, 1655–1658, 1661–1671, 1674–1678, 1687, 1700, 1710, 1714, 1749–1751, 1826, 1861, 1900, 1960, 2003, 2006–2007, 2045, 2052, 2058, 2060–2063, 2066–2076, 2167–2169, 2172–2174, 2186, 2205–2207, 2247–2251, 2263, 2281
integration
   __init__.py00100% 
   conftest.py180100% 
   namespace_container.py35197%85
   test_system_load.py180100% 
   test_xenserverconfig.py130100% 
   utils.py63592%18, 31, 67, 70, 108
mocks/xen
   __init__.py00100% 
mocks/xen/lowlevel
   __init__.py00100% 
mocks/xen/lowlevel/xc
   __init__.py50100% 
unit
   __init_\.py00100% 
   conftest.py28967%34, 36–40, 42–44
   test_dir_list.py110100% 
   test_fs_funcs.py100100% 
   test_load_plugins.py90100% 
   test_main.py115893%187, 264–266, 268–271
   test_output.py650100% 
   test_process_output.py110100% 
   test_xapidb_filter.py190100% 
TOTAL196831583% 

Tests Skipped Failures Errors Time
20 1 :zzz: 0 :x: 0 :fire: 2.199s :stopwatch:
codecov-commenter commented 10 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (b0cd013) 85.28% compared to head (aa47bef) 85.46%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #51 +/- ## ========================================== + Coverage 85.28% 85.46% +0.17% ========================================== Files 15 15 Lines 1950 1974 +24 ========================================== + Hits 1663 1687 +24 Misses 287 287 ``` | [Flag](https://app.codecov.io/gh/xenserver/status-report/pull/51/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=xenserver) | Coverage Δ | | |---|---|---| | [python2.7](https://app.codecov.io/gh/xenserver/status-report/pull/51/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=xenserver) | `83.99% <100.00%> (+0.19%)` | :arrow_up: | | [python3.10.13](https://app.codecov.io/gh/xenserver/status-report/pull/51/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=xenserver) | `84.41% <100.00%> (+0.20%)` | :arrow_up: | | [unittest](https://app.codecov.io/gh/xenserver/status-report/pull/51/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=xenserver) | `?` | | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=xenserver#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

github-actions[bot] commented 10 months ago

Coverage

Python3 coverage comment from https://github.com/marketplace/actions/pytest-coverage-comment
FileStmtsMissCoverMissing
__init__.py00100% 
conftest.py320100% 
test_dir_list.py110100% 
test_fs_funcs.py100100% 
test_load_plugins.py90100% 
test_main.py1170100% 
test_output.py650100% 
test_process_output.py110100% 
test_xapidb_filter.py190100% 
xen-bugtool154828481%95, 100–101, 410, 492, 512, 518, 541, 676, 694, 759–760, 774, 778, 793–794, 800–801, 806, 816, 825, 837, 842–844, 847–848, 851–852, 855, 896–898, 917, 1033–1047, 1049–1058, 1119, 1127, 1248, 1250, 1254–1255, 1278, 1296, 1299–1301, 1304–1308, 1311–1312, 1316–1318, 1324–1330, 1333–1337, 1339–1355, 1358–1366, 1369–1370, 1372, 1374–1375, 1377–1385, 1388, 1390–1393, 1395, 1397–1398, 1432, 1460, 1475, 1483–1484, 1488–1490, 1492, 1499, 1501–1505, 1507, 1537–1539, 1541–1546, 1549–1553, 1557–1558, 1560, 1577–1578, 1580, 1582–1583, 1585–1587, 1589–1596, 1598–1603, 1607–1608, 1610, 1612, 1614, 1630–1631, 1633, 1635–1637, 1640–1641, 1643, 1645–1647, 1650–1651, 1653, 1655–1658, 1661–1671, 1674–1678, 1688, 1700, 1710, 1714, 1749–1751, 1861, 1885, 1900, 1960, 2003, 2006–2007, 2045, 2052, 2058, 2060–2063, 2066–2076, 2167–2169, 2172–2174, 2186, 2205–2207, 2247–2251, 2263, 2281
TOTAL182228484% 

Tests Skipped Failures Errors Time
16 0 :zzz: 0 :x: 0 :fire: 1.043s :stopwatch:
edwintorok commented 9 months ago

Unfortunately the codecov upload keeps failing with:

X-Reduced-Redundancy: 'false'
[2024-01-15T15:01:14.875Z] ['error'] There was an error running the uploader: Error uploading to [https://codecov.io:](https://codecov.io/) Error: There was an error fetching the storage URL during POST: 503 - upstream connect error or disconnect/reset before headers. reset reason: connection failure
[2024-01-15T15:01:14.875Z] ['verbose'] The error stack is: Error: Error uploading to [https://codecov.io:](https://codecov.io/) Error: There was an error fetching the storage URL during POST: 503 - upstream connect error or disconnect/reset before headers. reset reason: connection failure
    at main (/snapshot/repo/dist/src/index.js)

Could we move this codecov upload into a separate step and make the failure non-fatal? It is an external service that apparently is not always working correctly, and shouldn't really block PRs.

bernhardkaindl commented 9 months ago

Thanks for the reviews and approves! I added the missing end of a sentence in a comment found by @psafont!

And, I applied one part of @edwintorok's suggestion to not fail CI if the Codecov upload (the one after pre-commit only) fails, because in pre-commit we have a diff-cover job which fails if diff-coverage is missing (tested from triggering it before in this repo).

The other changes in the comparison stem from the merges of the other PRs in the new master onto which I rebased it now too.