netenglabs / suzieq

Using network observability to operate and design healthier networks
https://www.stardustsystems.net/
Apache License 2.0

sq-poller crash #103

Closed jopietsch closed 4 years ago

jopietsch commented 4 years ago

I don't know if I can reproduce this yet, and I don't know how transient it is. I was updating basic_dual_bgp, which is dual-attach BGP, and got this error from my script:

Traceback (most recent call last):
  File "/tmp/pycharm_project_304/suzieq/suzieq/poller/sq-poller", line 193, in <module>
    asyncio.run(start_poller(userargs, cfg))
  File "/usr/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/tmp/pycharm_project_304/suzieq/suzieq/poller/sq-poller", line 133, in start_poller
    await asyncio.gather(*tasks)
  File "/tmp/pycharm_project_304/suzieq/suzieq/poller/writer.py", line 65, in run_output_worker
    worker.write_data(data)
  File "/tmp/pycharm_project_304/suzieq/suzieq/poller/writer.py", line 48, in write_data
    preserve_index=False)
  File "pyarrow/table.pxi", line 1177, in pyarrow.lib.Table.from_pandas
  File "/home/jpiet/.local/share/virtualenvs/suzieq-zI29E9ll/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 575, in dataframe_to_arrays
    for c, f in zip(columns_to_convert, convert_fields)]
  File "/home/jpiet/.local/share/virtualenvs/suzieq-zI29E9ll/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 575, in <listcomp>
    for c, f in zip(columns_to_convert, convert_fields)]
  File "/home/jpiet/.local/share/virtualenvs/suzieq-zI29E9ll/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 566, in convert_column
    raise e
  File "/home/jpiet/.local/share/virtualenvs/suzieq-zI29E9ll/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 560, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 265, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 80, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ('Could not convert false with type str: tried to convert to boolean', 'Conversion failed for column backupActive with type object')
Traceback (most recent call last):
  File "/home/jpiet/.local/share/virtualenvs/suzieq-zI29E9ll/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2646, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'timestamp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jpiet/.local/share/virtualenvs/suzieq-zI29E9ll/lib/python3.7/site-packages/nubia/internal/cmdbase.py", line 448, in run_cli
    return fn(**kwargs)
  File "/tmp/pycharm_project_304/suzieq/suzieq/cli/sqcmds/TableCmd.py", line 41, in show
    df = self.sqobj.get(hostname=self.hostname, namespace=self.namespace)
  File "/tmp/pycharm_project_304/suzieq/suzieq/sqobjects/tables.py", line 31, in get
    info.update(table_obj.get_table_info(table, **kwargs))
  File "/tmp/pycharm_project_304/suzieq/suzieq/engines/pandas/engineobj.py", line 84, in get_table_info    times = all_time_df['timestamp'].unique()
  File "/home/jpiet/.local/share/virtualenvs/suzieq-zI29E9ll/lib/python3.7/site-packages/pandas/core/frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/jpiet/.local/share/virtualenvs/suzieq-zI29E9ll/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'timestamp'
Error running command: 'timestamp'
------------------------------------------------------------
jopietsch commented 4 years ago

I think that might be two exceptions squished together. My script calls sq-poller, then kills it, and then runs a table show. So I think there is a problem from both the poller and the CLI.
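
For reference, the CLI-side KeyError is consistent with get_table_info indexing a DataFrame that has no 'timestamp' column at all, for example if nothing was ever written for the table. A minimal sketch (hypothetical, not taken from the issue):

import pandas as pd

# Hypothetical: if the poller crashed before writing any rows for a table,
# the engine could end up with an empty DataFrame that has no columns
all_time_df = pd.DataFrame()

# This mirrors the line in engines/pandas/engineobj.py shown in the traceback;
# on such a frame it raises KeyError: 'timestamp'
times = all_time_df["timestamp"].unique()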

jopietsch commented 4 years ago

The problem is with mlag's backupActive column:

pyarrow.lib.ArrowInvalid: ('Could not convert false with type str: tried to convert to boolean', 'Conversion failed for column backupActive with type object')
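
A minimal reproduction of the Arrow error (hypothetical, not taken from the poller code): a pandas column holding the string "false" cannot be coerced into an Arrow boolean field.

import pandas as pd
import pyarrow as pa

# Hypothetical data: the device returned the string "false" rather than a bool
df = pd.DataFrame({"backupActive": ["false"]})

# The mlag schema declares backupActive as a boolean
schema = pa.schema([pa.field("backupActive", pa.bool_())])

# Raises pyarrow.lib.ArrowInvalid: ('Could not convert false with type str:
# tried to convert to boolean', ...), same as the traceback from writer.py
table = pa.Table.from_pandas(df, schema=schema, preserve_index=False)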

jopietsch commented 4 years ago

I got it to work again by making these changes:

PS C:\Users\jpiet\code\suzieq> git diff .\config\mlag.yml .\config\schema\mlag.avsc
diff --git a/config/mlag.yml b/config/mlag.yml
index e8cdec8c6..3c40bbacd 100644
--- a/config/mlag.yml
+++ b/config/mlag.yml
@@ -21,7 +21,7 @@ apply:
     "status/peerId: peerMacAddress",
     "status/peerIf: peerLink",
     "peerLinkStatus: peerLinkStatus?|NA",
-    "status/backupActive: backupActive?|false",
+    "status/backupActive: backupActive?True=true|false",
     "status/backupIp: backupIP",
     "status/backupReason: backupReason?|",
     "status/linklocal: usesLinkLocal?|False",
diff --git a/config/schema/mlag.avsc b/config/schema/mlag.avsc
index 74a796d70..d35d4ba57 100644
--- a/config/schema/mlag.avsc
+++ b/config/schema/mlag.avsc
@@ -47,7 +47,7 @@
         },
         {
             "name": "backupActive",
-            "type": "boolean"
+            "type": "string"^M
         },
         {
             "name": "mlagSinglePortsCnt",

ddutt commented 4 years ago

That change is wrong. Why didn't I run into any issue yesterday? Let me see