thatmattlove / hyperglass

hyperglass is the network looking glass that tries to make the internet better.
https://hyperglass.dev
BSD 3-Clause Clear License

traceroute failing #274

Open astlaurent opened 1 month ago

astlaurent commented 1 month ago

Deployment Type

Docker

Version

v2.0.4

Steps to Reproduce

I am seeing this with the built-in XR directive, with a custom directive, and with Juniper devices. Traceroutes to quite a few internet destinations fail, and the debug log shows "Pattern not detected". This seems to happen when a hop along the path times out.
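A rough back-of-the-envelope sketch of why silent hops could matter here. It assumes a traceroute that sends 3 probes per hop and waits up to 3 seconds for each unanswered probe; both numbers are illustrative assumptions, since real defaults vary by platform. The estimate is compared against a 10-second read timeout, which is the default for `read_timeout` in Netmiko 4.x.

```python
# Illustrative estimate only: the probe count and per-probe wait are
# assumptions, not values taken from any specific platform.
def estimated_wait_seconds(silent_hops: int, probes_per_hop: int = 3,
                           wait_per_probe: float = 3.0) -> float:
    """Worst-case time spent waiting on hops that never answer."""
    return silent_hops * probes_per_hop * wait_per_probe


def exceeds_read_timeout(silent_hops: int, read_timeout: float = 10.0) -> bool:
    """True if waiting on silent hops alone would outlast the read timeout."""
    return estimated_wait_seconds(silent_hops) > read_timeout


for hops in (0, 1, 2, 5):
    print(hops, estimated_wait_seconds(hops), exceeds_read_timeout(hops))
```

Under these assumptions, two unresponsive hops are already enough to outlast a 10-second wait for the device prompt, which would be consistent with the log below failing about 11 seconds after the connection starts.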

Expected Behavior

Traceroute output is displayed properly.

Observed Behavior

An error is displayed instead of the traceroute output.

Configuration

No response

Devices

No response

Logs

hyperglass-1  | [DEBUG] 20240715 20:11:46 |51 | collect → Connecting to device {'device': 'BEL - Bellevue, NE', 'address': 'None:None', 'proxy': None}
hyperglass-1  | [CRITICAL] 20240715 20:11:57 |48 | default_handler → Error {'method': 'POST', 'path': '/api/query', 'detail': "\nPattern not detected: 'RP/0/RSP0/CPU0:DEVICE\\\\#' in output.\n\nThings you might try to fix this:\n1. Explicitly set your pattern using the expect_string argument.\n2. Increase the read_timeout to a larger value.\n\nYou can also look at the Netmiko session_log or debug log for more information.\n\n"}
hyperglass-1  | ERROR - 2024-07-15 20:11:57,971 - litestar - config - Uncaught exception (connection_type=http, path=/api/query):
hyperglass-1  | Traceback (most recent call last):
hyperglass-1  |   File "/usr/local/lib/python3.12/site-packages/litestar/middleware/_internal/exceptions/middleware.py", line 159, in __call__
hyperglass-1  |     await self.app(scope, receive, capture_response_started)
hyperglass-1  |   File "/usr/local/lib/python3.12/site-packages/litestar/routes/http.py", line 80, in handle
hyperglass-1  |     response = await self._get_response_for_request(
hyperglass-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
hyperglass-1  |   File "/usr/local/lib/python3.12/site-packages/litestar/routes/http.py", line 132, in _get_response_for_request
hyperglass-1  |     return await self._call_handler_function(
hyperglass-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
hyperglass-1  |   File "/usr/local/lib/python3.12/site-packages/litestar/routes/http.py", line 152, in _call_handler_function
hyperglass-1  |     response_data, cleanup_group = await self._get_response_data(
hyperglass-1  |                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
hyperglass-1  |   File "/usr/local/lib/python3.12/site-packages/litestar/routes/http.py", line 200, in _get_response_data
hyperglass-1  |     else await route_handler.fn(**parsed_kwargs)
hyperglass-1  |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
hyperglass-1  |   File "/opt/hyperglass/hyperglass/api/routes.py", line 111, in query
hyperglass-1  |     output = await execute(data)
hyperglass-1  |              ^^^^^^^^^^^^^^^^^^^
hyperglass-1  |   File "/opt/hyperglass/hyperglass/execution/main.py", line 67, in execute
hyperglass-1  |     response = await driver.collect()
hyperglass-1  |                ^^^^^^^^^^^^^^^^^^^^^^
hyperglass-1  |   File "/opt/hyperglass/hyperglass/execution/drivers/ssh_netmiko.py", line 92, in collect
hyperglass-1  |     raw = nm_connect_direct.send_command(query, **send_args)
hyperglass-1  |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
hyperglass-1  |   File "/usr/local/lib/python3.12/site-packages/netmiko/utilities.py", line 592, in wrapper_decorator
hyperglass-1  |     return func(self, *args, **kwargs)
hyperglass-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
hyperglass-1  |   File "/usr/local/lib/python3.12/site-packages/netmiko/base_connection.py", line 1721, in send_command
hyperglass-1  |     raise ReadTimeout(msg)
hyperglass-1  | netmiko.exceptions.ReadTimeout: 
hyperglass-1  | Pattern not detected: 'RP/0/RSP0/CPU0:DEVICE\\#' in output.
hyperglass-1  | 
hyperglass-1  | Things you might try to fix this:
hyperglass-1  | 1. Explicitly set your pattern using the expect_string argument.
hyperglass-1  | 2. Increase the read_timeout to a larger value.
hyperglass-1  | 
hyperglass-1  | You can also look at the Netmiko session_log or debug log for more information.
hyperglass-1  | 
hyperglass-1  | 
hyperglass-1  | [INFO] 20240715 20:11:57 |1762 | callHandlers → 172.19.0.1:57876 - "POST /api/query HTTP/1.0" 500 {}
hyperglass-1  | [CRITICAL] 20240715 20:12:15 |34 | __init__ → Request timed out. (Connection timed out) {}
hyperglass-1  | ERROR - 2024-07-15 20:12:15,382 - asyncio - runners - Exception in callback Loop._read_from_self
hyperglass-1  | handle: <Handle Loop._read_from_self>
hyperglass-1  | Traceback (most recent call last):
hyperglass-1  |   File "uvloop/cbhandles.pyx", line 66, in uvloop.loop.Handle._run
hyperglass-1  |   File "uvloop/loop.pyx", line 397, in uvloop.loop.Loop._read_from_self
hyperglass-1  |   File "uvloop/loop.pyx", line 402, in uvloop.loop.Loop._invoke_signals
hyperglass-1  |   File "uvloop/loop.pyx", line 377, in uvloop.loop.Loop._ceval_process_signals
hyperglass-1  |   File "/opt/hyperglass/hyperglass/execution/main.py", line 41, in handler
hyperglass-1  |     raise DeviceTimeout(**exc_args)
hyperglass-1  | hyperglass.exceptions.public.DeviceTimeout: Request timed out. (Connection timed out)
NaumanNahian commented 1 month ago

I'm experiencing the same issue with the same Docker image for Arista devices, not just with traceroute but also with other commands that take a bit longer to finish.

I resolved the issue by increasing Netmiko's read_timeout: I added `send_args['read_timeout'] = 120` in ssh_netmiko.py.

astlaurent commented 1 month ago

send_args['read_timeout'] = 120

Thanks. This worked well. I needed to set both that value and the timeout in config.yaml to the same value. The ssh_netmiko.py file is embedded in the Docker image, so I had to modify the file inside the container and re-commit the image.

@thatmattlove you should make this value configurable, or always have it match the timeout value in the config file. The value simply tells Netmiko the maximum time to wait for the prompt to come back. The default is about 10 seconds, which is not long enough for traceroutes.
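A minimal sketch of what that suggestion might look like, not hyperglass's actual code: derive Netmiko's `read_timeout` from the configured request timeout so the two values cannot drift apart. `build_send_args`, `configured_timeout`, and `margin` are hypothetical names introduced for illustration; `read_timeout` is the keyword named in the Netmiko error message above.

```python
def build_send_args(configured_timeout: float, margin: float = 0.0) -> dict:
    """Return keyword arguments intended for Netmiko's send_command().

    configured_timeout: the request timeout from the application's config
    (hypothetical parameter; how hyperglass exposes it is not shown here).
    margin: optional seconds to subtract so the device read gives up
    slightly before the overall request does.
    """
    if configured_timeout <= 0:
        raise ValueError("configured_timeout must be positive")
    # Never go below one second, even with a large margin.
    read_timeout = max(configured_timeout - margin, 1.0)
    return {"read_timeout": read_timeout}


send_args = build_send_args(120)
# With a live Netmiko connection this would be passed along as:
#   raw = connection.send_command(query, **send_args)
print(send_args)
```

Keeping the derivation in one place means a change to the configured timeout would carry over to the device read without a second manual edit.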

Dinokinni commented 1 month ago

Hey guys I tried this solution but it doesn't work for me. Can you show me where you place this argument? Thanks

astlaurent commented 1 month ago

If it is running on Docker, you need to change the file inside the Docker container, not in the app directory on the host OS. If you are not running Docker, it is enough to add the value in the app directory. Here are the instructions I documented to assist.

Dinokinni commented 1 month ago

Thank you very much. Now it works perfectly. I wasted all last week working on this.

umiseaz commented 1 month ago

If it is running on Docker, you need to change the file inside the Docker container, not in the app directory on the host OS. If you are not running Docker, it is enough to add the value in the app directory. Here are the instructions I documented to assist:

* Make sure the service is started

* Enter the Docker container shell
  `sudo docker exec -it hyperglass-hyperglass-1 sh`

* Edit the Netmiko driver file
  `vi /opt/hyperglass/hyperglass/execution/drivers/ssh_netmiko.py`

* Add the following line at line 56, then save and exit the file
  `send_args['read_timeout'] = 120`

* Type `exit` to leave the container

* Get the Docker container ID
  `sudo docker ps -a`

* Copy the container ID for hyperglass-hyperglass

* Commit the Docker changes
  `sudo docker commit <ID> hyperglass-hyperglass`

* Restart the service
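Because line numbers can shift between releases, the manual edit in the steps above could also be sketched as a small script that locates the `.send_command(` call (the one shown in the traceback) and inserts the override just before it, matching its indentation. This is a hypothetical helper, not part of hyperglass. It patches only the first such call, and it would need to run inside the container against the path used in the steps above.

```python
from pathlib import Path

# The override line and its value come from the steps above.
OVERRIDE = "send_args['read_timeout'] = 120"


def add_read_timeout(source: str, override: str = OVERRIDE) -> str:
    """Insert `override` before the first line that calls .send_command(.

    Returns the source unchanged if the override is already present or if
    no send_command call is found.
    """
    if override in source:
        return source
    lines = source.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if ".send_command(" in line:
            # Reuse the call's leading whitespace so the new line sits in the same block.
            indent = line[: len(line) - len(line.lstrip())]
            lines.insert(i, f"{indent}{override}\n")
            return "".join(lines)
    return source


def patch_file(path: str) -> bool:
    """Patch the file in place; return True if it was modified."""
    target = Path(path)
    original = target.read_text()
    patched = add_read_timeout(original)
    if patched != original:
        target.write_text(patched)
        return True
    return False

# Example call (path taken from the steps above):
#   patch_file("/opt/hyperglass/hyperglass/execution/drivers/ssh_netmiko.py")
```

The check for an existing override makes the helper safe to run more than once. The `docker commit` and restart steps would still be needed afterwards.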

Previously, IPv6 traceroutes to 2600:: and 2a11:: were not working (on Juniper). I followed this guide and the tips provided, and now it is working fine. Thanks