thatmattlove / hyperglass

hyperglass is the network looking glass that tries to make the internet better.
https://hyperglass.dev
BSD 3-Clause Clear License

redis cache not working properly #35

Closed astlaurent closed 3 years ago

astlaurent commented 4 years ago

Hi

I am using hyperglass version: 1.0.0-beta.21

There is something not quite right with the Redis caching, or caching in general. I changed the timeout to 30 seconds:

cache:
  database: 0
  host: localhost
  port: 6379
  show_text: true
  timeout: 30

I submitted a query and checked several minutes later and it is still responding with the cached value. I checked 30 minutes later and it is still responding with the cached value. It is almost acting like the app has some type of built in cache of some sort outside of redis.

see below for logs

[DEBUG] 20200415 09:50:49 | hyperglass.api.routes:53 | query → Query 4f30d161d3c010883651c5f3a9fc52a32ae26e942e17781628352dc3ea457538 took 7.4446 seconds to run.
[DEBUG] 20200415 09:50:49 | hyperglass.api.routes:62 | query → Added cache entry for query: 4f30d161d3c010883651c5f3a9fc52a32ae26e942e17781628352dc3ea457538
[DEBUG] 20200415 09:50:49 | hyperglass.api.routes:67 | query → Cache match for 4f30d161d3c010883651c5f3a9fc52a32ae26e942e17781628352dc3ea457538:
[SUCCESS] 20200415 09:50:49 | hyperglass.api.routes:68 | query → Completed query execution for Query(query_location=lab_sea_brdr_01, query_type=bgp_route, query_vrf=default, query_target=69.7.131.0/24)

show current date
[root@hyperglass-lab hyperglass]# date
Wed Apr 15 10:22:07 CDT 2020

still showing cached entry in browser

cache is gone from redis: redis-cli --scan --pattern '*' shows nothing

submit a new lookup and verified it does get put into cache

[DEBUG] 20200415 10:31:10 | hyperglass.api.routes:53 | query → Query c1cbdb4eb1716c36a48f7092d83190a3e34a79b80f4fd863163137a26033839f took 7.5606 seconds to run.
[DEBUG] 20200415 10:31:10 | hyperglass.api.routes:62 | query → Added cache entry for query: c1cbdb4eb1716c36a48f7092d83190a3e34a79b80f4fd863163137a26033839f
[DEBUG] 20200415 10:31:10 | hyperglass.api.routes:67 | query → Cache match for c1cbdb4eb1716c36a48f7092d83190a3e34a79b80f4fd863163137a26033839f:

Verify it is there
redis-cli --scan --pattern '*'
c1cbdb4eb1716c36a48f7092d83190a3e34a79b80f4fd863163137a26033839f

confirmed 30 seconds later it disappears
redis-cli --scan --pattern '*'

However, if I redo that new lookup in the web GUI it is still cached: it responds right away, and the timestamp at the top of the output is the time of the original query. If I reload the query, it refreshes the output properly.
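For reference, the behavior expected from a 30-second timeout can be sketched as below. This is a minimal in-memory stand-in for a Redis key with an expiry, not hyperglass code. The `cache_key` scheme is an assumption: the 64-hex-character identifiers in the logs are consistent with a SHA-256 digest, but what hyperglass actually hashes is not shown in this thread. `TTLCache` and its `clock` parameter are hypothetical names.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


def cache_key(location: str, query_type: str, vrf: str, target: str) -> str:
    """Derive a stable 64-hex-character key from the query fields (assumed scheme)."""
    raw = "|".join((location, query_type, vrf, target))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory stand-in for a Redis key with an expiry (hypothetical helper)."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock  # injectable so expiry can be checked without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}

    def set(self, key: str, value: str) -> None:
        # Store the value together with the moment it stops being valid.
        self._store[key] = (self.clock() + self.timeout, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Expired: drop the entry so the caller has to run the query again.
            del self._store[key]
            return None
        return value
```

With `timeout=30`, a `get` made more than 30 seconds after the `set` returns `None`, so a response that still shows the original timestamp minutes later has to come from somewhere other than this layer.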

astlaurent commented 4 years ago

Actually, I played around with this a bit more and launched a new session in incognito mode in Chrome, and I can verify it runs the query again properly, so this is some type of browser caching issue. I am not sure if there are any alterations you can make to help keep the browser from constantly caching the results.

I also confirmed if i hit the back arrow on the tool and then refresh the page it also works properly
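If the browser's HTTP cache were the cause, one common server-side alteration is to mark query responses as non-cacheable. The sketch below uses only the Python standard library; `QueryHandler` and `no_store_headers` are hypothetical names, the response body is a placeholder, and none of this is taken from hyperglass itself.

```python
from http.server import BaseHTTPRequestHandler
from typing import Dict


def no_store_headers() -> Dict[str, str]:
    """Headers asking browsers and intermediaries not to store the response."""
    return {
        "Cache-Control": "no-store",
        "Pragma": "no-cache",  # legacy HTTP/1.0 equivalent
        "Expires": "0",
    }


class QueryHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that returns a query result with caching disabled."""

    def do_GET(self) -> None:
        body = b'{"output": "example"}'  # placeholder payload
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        for name, value in no_store_headers().items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)
```

This only affects caching done at the HTTP layer; results held in the page's own JavaScript state would not be influenced by these headers.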

thatmattlove commented 4 years ago

Thanks for the info! I just uploaded version 1.0.0-beta22 which disables the caching of the response in the HTTP library (it was a default I was unaware of). Can you see if that fixed it? FWIW, I was unable to reproduce this issue myself locally, but I'm curious to see if this resolves it for you.

astlaurent commented 4 years ago

It is very strange. I ran sudo pip3 install -U hyperglass, then hyperglass --version, and it is still stating I am on beta21.

Should I be doing something else to update?

thatmattlove commented 4 years ago

You’ll need to restart the service as well if you didn’t. When upgrading my LG, I usually do this:

sudo systemctl stop hyperglass
sudo pip3 install -U hyperglass
sudo systemctl restart hyperglass

Let me know if that doesn’t work.

astlaurent commented 4 years ago

Hi

Thank you so much for the help. For some reason the update is not working properly on my system, and I don't know if it is because it is running on CentOS or not. See the log below. I stopped the service and ran the update; you can see it claims I had .22 installed and has now installed .24. Then when I start the service you can see it runs .21, and when I do a --version it claims it is .21 as well.

I can see pip drops the files here ./usr/local/lib/python3.6/site-packages/hyperglass-1.0.0b24.dist-info

I am thinking it never actually builds the install after it updates those files.

Installing collected packages: hyperglass
  Found existing installation: hyperglass 1.0.0b22
    Uninstalling hyperglass-1.0.0b22:
      Successfully uninstalled hyperglass-1.0.0b22
Successfully installed hyperglass-1.0.0b24
[root@hyperglass-lab /]# sudo systemctl restart hyperglass
[root@hyperglass-lab /]# service hyperglass status
Redirecting to /bin/systemctl status hyperglass.service
● hyperglass.service - hyperglass
   Loaded: loaded (/etc/hyperglass/hyperglass.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2020-04-16 07:39:36 CDT; 15s ago
 Main PID: 7784 (hyperglass)
   CGroup: /system.slice/hyperglass.service
           ├─7784 /opt/rh/rh-python36/root/usr/bin/python3 /opt/rh/rh-python36/root/usr/bin/hyperglass start
           ├─7882 /opt/rh/rh-python36/root/usr/bin/python3 /opt/rh/rh-python36/root/usr/bin/hyperglass start
           ├─7883 /opt/rh/rh-python36/root/usr/bin/python3 /opt/rh/rh-python36/root/usr/bin/hyperglass start
           ├─7884 /opt/rh/rh-python36/root/usr/bin/python3 /opt/rh/rh-python36/root/usr/bin/hyperglass start
           └─7885 /opt/rh/rh-python36/root/usr/bin/python3 /opt/rh/rh-python36/root/usr/bin/hyperglass start

Apr 16 07:39:36 hyperglass-lab hyperglass[7784]: [ERROR] 20200416 07:39:36 | hyperglass.configuration.models._utils:164 | validate_image → /etc/hy…s-dark.png
Apr 16 07:39:36 hyperglass-lab hyperglass[7784]: [ERROR] 20200416 07:39:36 | hyperglass.configuration.models._utils:164 | validate_image → /etc/hy…ngraph.png
Apr 16 07:39:36 hyperglass-lab hyperglass[7784]: [INFO] 20200416 07:39:36 | hyperglass.configuration:37 | → Configuration directory: /etc/hyperglass
Apr 16 07:39:37 hyperglass-lab hyperglass[7784]: [ERROR] 20200416 07:39:37 | hyperglass.configuration.models._utils:164 | validate_image → /etc/hy…o-dark.png
Apr 16 07:39:37 hyperglass-lab hyperglass[7784]: [ERROR] 20200416 07:39:37 | hyperglass.configuration.models._utils:164 | validate_image → /etc/hy…t-logo.png
Apr 16 07:39:37 hyperglass-lab hyperglass[7784]: [ERROR] 20200416 07:39:37 | hyperglass.configuration.models._utils:164 | validate_image → /etc/hy…ngraph.png
Apr 16 07:39:37 hyperglass-lab hyperglass[7784]: [INFO] 20200416 07:39:37 | hyperglass.main:89 | on_starting → Python 3.6.9 detected (3.6 required)
Apr 16 07:39:37 hyperglass-lab hyperglass[7784]: [INFO] 20200416 07:39:37 | hyperglass.util:499 | build_frontend → Starting UI build...
Apr 16 07:39:46 hyperglass-lab hyperglass[7784]: [SUCCESS] 20200416 07:39:46 | hyperglass.util:526 | build_frontend → Completed UI build
Apr 16 07:39:46 hyperglass-lab hyperglass[7784]: [SUCCESS] 20200416 07:39:46 | hyperglass.main:99 | on_starting → Started hyperglass 1.0.0-beta.21… 4 workers
Hint: Some lines were ellipsized, use -l to show in full.
[root@hyperglass-lab /]# hyperglass --version
hyperglass version: 1.0.0-beta.21
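The output above shows pip writing to /usr/local/lib/python3.6/site-packages while the service runs from /opt/rh/rh-python36/root/usr/bin/python3, which suggests two different Python environments may be involved. A quick way to check which interpreter sees which version is sketched below. It assumes Python 3.8 or newer for `importlib.metadata`; on 3.6, as used in this thread, the `importlib_metadata` backport would be needed instead. `installed_version` and `describe` are hypothetical helper names.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the version of `package` visible to this interpreter, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe(package: str = "hyperglass") -> str:
    """Pair the running interpreter's path with the package version it sees."""
    version = installed_version(package)
    return f"{sys.executable}: {package} {version or 'not installed'}"


if __name__ == "__main__":
    print(describe())
```

Running the script once with each interpreter on the machine shows whether they disagree. Installing with `python3 -m pip install -U hyperglass`, using the same interpreter the service runs, ties the upgrade to that environment.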

astlaurent commented 4 years ago

I think I worked through the issue. I believe it had something to do with the original installation attempt going under python and not python3; I am not sure if I accidentally typed pip and not pip3 originally. Anyway, I uninstalled the versions under pip and pip3, removed all the leftover files, and reinstalled under pip3. It now starts up. The issue I am seeing now under .24 is that if I try any command or query I get "10.10.10.10 has no containing prefix" (the IP being the local IP of the computer I am browsing from).

[DEBUG] 20200416 10:32:40 | hyperglass.api.models.validators:24 | _member_of → Checking membership of 69.7.132.0/24 for 0.0.0.0/0
[DEBUG] 20200416 10:32:40 | hyperglass.api.models.validators:31 | _member_of → 69.7.132.0/24 is a member of 0.0.0.0/0
[DEBUG] 20200416 10:32:40 | hyperglass.api.models.validators:109 | validate_ip → 69.7.132.0/24 is allowed by access-list AccessList4(network=IPv4Network('0.0.0.0/0'), action='permit', ge=1, le=32)
[DEBUG] 20200416 10:32:40 | hyperglass.api.models.validators:149 | validate_ip → Validation passed for 69.7.132.0/24
[DEBUG] 20200416 10:32:40 | hyperglass.util:715 | get_network_info → Attempting to find containing prefix for 10.10.10.10
[ERROR] 20200416 10:32:41 | hyperglass.exceptions:26 | init → [WARNING] 10.10.10.10 has no containing prefix
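The two checks visible in these log lines can be approximated with the standard `ipaddress` module. This is a sketch under assumptions, not the hyperglass validators: `member_of` mirrors the access-list fields shown in the log (`network`, `ge`, `le`), and `needs_prefix_lookup` is a hypothetical guard illustrating why a private address such as 10.10.10.10 has no containing prefix in the global routing table. `subnet_of` requires Python 3.7 or newer.

```python
import ipaddress


def member_of(target: str, network: str, ge: int, le: int) -> bool:
    """True if `target` lies inside `network` and its prefix length is within ge..le."""
    target_net = ipaddress.ip_network(target)
    allowed = ipaddress.ip_network(network)
    if target_net.version != allowed.version:
        # An IPv4 target can never be a member of an IPv6 network, and vice versa.
        return False
    return target_net.subnet_of(allowed) and ge <= target_net.prefixlen <= le


def needs_prefix_lookup(address: str) -> bool:
    """Only globally routable addresses can have a containing prefix on the internet."""
    return ipaddress.ip_address(address).is_global
```

For example, `member_of("69.7.132.0/24", "0.0.0.0/0", 1, 32)` is true, matching the "is a member of" line, while `needs_prefix_lookup("10.10.10.10")` is false because 10.0.0.0/8 is private address space.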

astlaurent commented 4 years ago

as a heads up i downgraded to V.22 which got rid of the prefix error mentioned above. I am still hitting the cache issue though. I confirmed it happens on chrome and firefox. I attempted edge as well but it just freezes trying to load the site.

thatmattlove commented 4 years ago

Ok - that is now fixed in 1.0.0-beta25. Sorry about that. Gotta stop pushing 2am commits. Can you let me know if the query issue and the caching issue are fixed now?

IE/Edge being broken I'm aware of - I need to do some research on which specific Babel options are needed to fix this, it's at the top of my to-do list.

About the upgrade issues - yes, I noticed this myself on Ubuntu. It's also on my to-do list to find some way of detecting this and warning the user, or at least documenting it. But I'm glad you got it figured out for the time being.

astlaurent commented 4 years ago

Thanks. The upgrade went fine, the query issue is fixed, the caching issue still seems to happen.

thatmattlove commented 4 years ago

Try as I might, I can’t reproduce this. Just to rule out anything on the back end side vs front end, can you try it out on my Org’s instance and let me know if you see the same behavior?

If you do, can you let me know the browser version and OS you’re using? If you do not, can you let me know the Redis and Python versions on your system?

astlaurent commented 4 years ago

Hi

I am pretty sure your instance does it as well, but it is harder to tell since you don't have a timestamp at the top of your BGP query. The easiest way I found to demonstrate it: I picked Dayton and Honolulu and did a traceroute to something more than a few hops away to make the query drag out a bit. I used 66.203.66.203, which is one of our DNS servers in MA; it took about 1 minute to complete on both. I hit back in the app (not the browser back button), waited about 10 minutes, entered the exact same parameters, and did another traceroute. The result came back in 1 second, which tells me it is more than likely cached. It looks like your cache is set to 120 seconds, so I don't think it is Redis doing this. If I refresh the page using the browser function and do the query again, it runs the traceroute from the beginning as it should.

I am on Windows 10 using Chrome Version 80.0.3987.163 (Official Build) (64-bit), but I also used Firefox 72.0.2 (64-bit) and it did the same thing. I also tried from a Windows Server 2016 machine with Chrome, with the same results.

thatmattlove commented 4 years ago

Latest 1.0.0-beta29 includes a lot of caching improvements, but I don't think this issue is solved yet. I did a lot more digging, and it appears something at the browser level is caching the output and not clearing it appropriately. It seems to clear properly if the page is refreshed, which leads me to believe the http response, which is stored as state in the <Result/> component, is not being cleared even though the component re-renders. Quite odd, and I'll keep working on it.
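The suspected mechanism can be modeled in a few lines. This is a language-agnostic illustration written in Python, not the actual `<Result/>` React component: `ResultView`, `submit`, and `back` are hypothetical names, and `fetch` stands in for the HTTP request to the backend.

```python
from typing import Callable, Optional


class ResultView:
    """Toy model of a view that keeps the last HTTP response as state."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch
        self._response: Optional[str] = None  # survives until explicitly cleared

    def submit(self, query: str) -> str:
        # If state from an earlier submission is still present, it is shown
        # again and no new request reaches the backend.
        if self._response is None:
            self._response = self._fetch(query)
        return self._response

    def back(self, clear_state: bool) -> None:
        # Returning to the form only helps if the stored response is dropped.
        if clear_state:
            self._response = None
```

In this model, a page refresh corresponds to constructing a new `ResultView`, which is consistent with the observation that refreshing clears the stale output, while `back(clear_state=False)` reproduces the stale result regardless of any backend cache timeout.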

astlaurent commented 4 years ago

I just tried .29 and it mostly seems the same in regards to the problem. I did find i needed to pull the cache text field out of the config file to get the service to start but it looks like you handle that dynamically now.

I am not sure if it would help if you forced a browser refresh in the code when you click the app back button.

thatmattlove commented 4 years ago

I know it's been a while, and there have literally been 30+ additional beta releases since this was opened. I've made even more changes to caching, primarily on the backend. There are some I want to make on the frontend as well, but I'm curious whether you still see this behavior after upgrading to the latest release (1.0.0-beta.56). Can you test, and let me know?

Thanks, Matt

thatmattlove commented 3 years ago

If this wasn't resolved before, it should now be in v1.0.0-beta.65, which will be available on PyPI shortly.