jokob-sk / NetAlertX

🖧🔍 WIFI / LAN intruder detector. Scans for devices connected to your network and alerts you if new and unknown devices are found.
GNU General Public License v3.0

[BUG][UNFIMP] Incorrect MAC imported and causing app restarts #848

Closed nathang21 closed 3 weeks ago

nathang21 commented 1 month ago

Is there an existing issue for this?

Current Behavior

The app is frequently unresponsive, but the container remains healthy. The log shows that the backend restarts frequently; in my experience it happens at least a couple of times per hour.

11:20:59 The backend restarted (started). If this is unexpected check https://bit.ly/NetAlertX_debug for troubleshooting tips.
11:35:57 The backend restarted (started). If this is unexpected check https://bit.ly/NetAlertX_debug for troubleshooting tips.
11:36:05 The backend restarted (started). If this is unexpected check https://bit.ly/NetAlertX_debug for troubleshooting tips.
11:46:46 The backend restarted (started). If this is unexpected check https://bit.ly/NetAlertX_debug for troubleshooting tips.

One thing to add: this happened earlier on when I was setting up netalertx, and I scrapped the entire config/db because I couldn't figure out the issue. It was then stable for some period of time (I didn't monitor closely). Now that it has reoccurred, I'm opening an issue to report it since I've failed to debug it on my own.

Expected Behavior

The backend to remain stable or at least the logs to clearly indicate why it is unstable.

Steps To Reproduce

This happens continuously, even after restarting/recreating the docker container. Perhaps there is an invalid config or a corrupted DB, but I'm not sure.

app.conf

app.conf was too big to upload here, [hosted here](https://drive.google.com/file/d/1rq3MMC7DyTNBPrCnOjy5MfXBt0yCaQjN/view?usp=sharing) instead.

docker-compose.yml

services:
  netalertx:
    # image: jokobsk/netalertx-dev:latest 
    image: jokobsk/netalertx:latest
    container_name: netalertx
    healthcheck:
      test: curl -f http://localhost:20211/ || exit 1
      interval: 1m
      timeout: 30s
      retries: 5
      start_period: 300s
    mem_limit: 2g
    shm_size: 2g
    cpu_shares: 768
    network_mode: host
    security_opt:
      - no-new-privileges:true
    restart: always # unless-stopped # no
    stop_grace_period: 2m
    logging:
      driver: json-file
      options:
        max-file: 3
        max-size: 100k
    volumes:
      - /volume2/docker/netalertx/config:/app/config   
      - /volume2/docker/netalertx/db:/app/db   
      - /volume2/docker/netalertx/logs:/app/front/log  
      - /volume2/docker/netalertx/config/resolv.conf:/etc/resolv.conf   # Enable Reverse DNS
    environment:
      - LOG_LEVEL='trace' # info #
      # Fix Permissions by following:
      # https://github.com/jokob-sk/NetAlertX/blob/main/docs/FILE_PERMISSIONS.md
      # https://amoklauf.ch/posts/synology/changeid/
      - PUID=1030 
      - PGID=100
      - HOST_USER_ID=1030
      - HOST_USER_GID=100
      - TZ=America/New_York

What branch are you running?

Production

app.log

app.log was too big to upload here, hosted here instead.

Debug enabled

jokob-sk commented 1 month ago

Hi @nathang21 ,

I see the app restarting when new devices are created.

22:29:34 [Update Devices] - (if not empty) cur_SSID -> (if empty) dev_SSID
22:29:34 [Update Devices] - (if not empty) cur_Type -> (if empty) dev_DeviceType
22:29:34 [Update Devices] - (if not empty) cur_Name -> (if empty) dev_NAME
22:29:37 [MAIN] Setting up ...
22:29:37 [conf.tz] Setting up ...
22:29:37 

22:29:37 The backend restarted (started). If this is unexpected check https://bit.ly/NetAlertX_debug for troubleshooting tips.
22:29:37 

22:29:37 Permissions check (All should be True)
22:29:37 ------------------------------------------------
22:29:37   /config/app.conf |  READ  | True
22:29:37   /config/app.conf |  WRITE | True
22:29:37   /db/app.db       |  READ  | True
22:29:37   /db/app.db       |  WRITE | True
22:29:37 ------------------------------------------------

Can you try to surface the exception which can't be captured in logs by following this guide: https://github.com/jokob-sk/NetAlertX/blob/main/docs/DEBUG_TIPS.md#2-surfacing-errors-when-container-restarts-

Start the container via the terminal with a command similar to this one:

docker run --rm --network=host \
  -v local/path/netalertx/config:/app/config \
  -v local/path/netalertx/db:/app/db \
  -e TZ=Europe/Berlin \
  -e PORT=20211 \
  jokobsk/netalertx:latest

⚠ Please note: don't use the -d parameter, so that you see the error when the container crashes. Use this error in your issue description.

Or check the docker or Portainer logs? You should be able to see an exception before the container restarts.

Thanks in advance, j

nathang21 commented 1 month ago

Hey @jokob-sk, thanks for the quick response. I've tried all of those debugging steps previously, and as I mentioned above, the weird thing is that the container remains healthy as far as I can tell. Only the backend restarts within the container, while the container itself continues to run for days without stopping.

Just in case I will start the container without -d again to see if it ever crashes, but in my experience that isn't what happens.

jokob-sk commented 1 month ago

Hi @nathang21 ,

Just FYI the backend restarting and the container restarting are 2 different things. The container doesn't restart (become unhealthy) if the app backend restarts as the reboot is also used when initializing new settings.

The Portainer and Docker logs will still contain the exception. If the restart isn't occurring right now, you can probably replicate it by deleting all devices and waiting for the app to try to re-add the devices. Please back up everything first, and download the devices.csv file (and verify it) as a backup.

So please have a look in the logs (e.g. in Portainer) and search for:

The backend restarted (started). If this is unexpected check https://bit.ly/NetAlertX_debug for troubleshooting tips.

...and scroll up a few lines where most likely you'll be able to find a logged exception.
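
The search described above can also be scripted. The following is only a minimal sketch, not part of NetAlertX: the function name `context_before_restarts` and the 15-line window are assumptions, and it expects the Docker or Portainer log to be available as a list of lines (for example, read from a saved text file).

```python
from collections import deque

# Text taken from the restart message quoted in this thread.
RESTART_MARKER = "The backend restarted (started)"


def context_before_restarts(lines, marker=RESTART_MARKER, before=15):
    """Yield (marker_line, preceding_lines) for every line containing marker."""
    recent = deque(maxlen=before)  # rolling window of the last `before` lines
    for raw in lines:
        line = raw.rstrip("\n")
        if marker in line:
            yield line, list(recent)
        recent.append(line)


if __name__ == "__main__":
    # Shortened sample lines modelled on the log excerpts in this thread.
    sample = [
        "22:29:34 [Update Devices] - (if not empty) cur_Name -> (if empty) dev_NAME",
        "22:29:37 [MAIN] Setting up ...",
        "22:29:37 The backend restarted (started). If this is unexpected check ...",
    ]
    for hit, context in context_before_restarts(sample, before=2):
        print(hit)
        for line in context:
            print("    " + line)
```

Any traceback printed shortly before a restart should then show up in the context lines, provided the window is large enough.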

I think the restart might be caused by a device name or other field that contains some un-escaped character.

Thanks in advance, j

EDIT: Backup guide: https://github.com/jokob-sk/NetAlertX/blob/main/docs/BACKUPS.md

nathang21 commented 1 month ago

Thanks, I understand the difference which is why I specified it explicitly. To be very clear, the container is NOT crashing, and there is no exception. The container has been running for many days (until I restarted it earlier today) but the backend restarts a few times per hour.

The container still hasn't crashed after 5 hours (the backend has restarted numerous times), but I will let it run overnight just in case. If it doesn't crash, I will move forward with the backup and try to force an occurrence as you suggested.

In the meantime, here is a snippet of the latest logs from my terminal for reference if it's helpful.

netalertx  | 22:08:22 [Plugins] dType: array
netalertx  | 22:08:22 [Plugin utils] Flattening the below array
netalertx  | 22:08:22 ['192.168.1.0/24 --interface=bond0 -vlan=1', '192.168.2.0/24 --interface=bond0 -vlan=2', '192.168.3.0/24 --interface=bond0 -vlan=3']
netalertx  | 22:08:22 [Plugin utils] isinstance(arr, list) : False | isinstance(arr, str) : True
netalertx  | 22:08:22 [Plugins] Resolved value: 192.168.1.0/24 --interface=bond0 -vlan=1,192.168.2.0/24 --interface=bond0 -vlan=2,192.168.3.0/24 --interface=bond0 -vlan=3
netalertx  | 22:08:22 [Plugins] Convert to Base64: True
netalertx  | 22:08:22 [Plugins] base64 value: b'MTkyLjE2OC4xLjAvMjQgLS1pbnRlcmZhY2U9Ym9uZDAgLXZsYW49MSwxOTIuMTY4LjIuMC8yNCAtLWludGVyZmFjZT1ib25kMCAtdmxhbj0yLDE5Mi4xNjguMy4wLzI0IC0taW50ZXJmYWNlPWJvbmQwIC12bGFuPTM='
netalertx  | 22:08:22 [Plugins] Timeout: 300
netalertx  | 22:08:22 [Plugin utils] Pre-Resolved CMD: python3/app/front/plugins/arp_scan/script.pyuserSubnets={subnets}
netalertx  | 22:08:22 [Plugins] Executing: python3 /app/front/plugins/arp_scan/script.py userSubnets={subnets}
netalertx  | 22:08:22 [Plugins] Resolved : ['python3', '/app/front/plugins/arp_scan/script.py', "userSubnets=b'MTkyLjE2OC4xLjAvMjQgLS1pbnRlcmZhY2U9Ym9uZDAgLXZsYW49MSwxOTIuMTY4LjIuMC8yNCAtLWludGVyZmFjZT1ib25kMCAtdmxhbj0yLDE5Mi4xNjguMy4wLzI0IC0taW50ZXJmYWNlPWJvbmQwIC12bGFuPTM='"]
netalertx  | 22:09:02 [Plugins] Processing file "/app/front/plugins/arp_scan/last_result.log"
netalertx  | 22:09:02 [Plugins] SUCCESS, received 50 entries

netalertx  | 22:09:02 [Plugins] sqlParam entries: [(0, 'ARPSCAN', '72:a7:41:fe:30:bf', '192.168.1.1', 'null', '2024-10-16 22:09:01', '192.168.1.1', '(Unknown: locally administered)', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '72:a7:41:fe:30:bf', '192.168.1.2', 'null', '2024-10-16 22:09:01', '192.168.1.2', '(Unknown: locally administered)', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '70:a7:41:fd:38:83', '192.168.1.24', 'null', '2024-10-16 22:09:01', '192.168.1.24', 'Ubiquiti Networks Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '5c:ad:76:7e:71:1f', '192.168.1.26', 'null', '2024-10-16 22:09:01', '192.168.1.26', 'Shenzhen TCL New Technology Co., Ltd', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 'b0:c5:ca:38:3c:5f', '192.168.1.28', 'null', '2024-10-16 22:09:01', '192.168.1.28', 'abode systems, inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 'e4:38:83:8c:b3:29', '192.168.1.54', 'null', '2024-10-16 22:09:01', '192.168.1.54', 'Ubiquiti Networks Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '8c:ae:4c:e1:70:cf', '192.168.1.72', 'null', '2024-10-16 22:09:01', '192.168.1.72', 'Plugable Technologies', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '50:d2:13:02:63:2b', '192.168.1.48', 'null', '2024-10-16 22:09:01', '192.168.1.48', 'CviLux Corporation', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 'c0:95:6d:62:94:3e', 
'192.168.1.90', 'null', '2024-10-16 22:09:01', '192.168.1.90', 'Apple, Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '8e:65:6f:92:49:9c', '192.168.1.12', 'null', '2024-10-16 22:09:01', '192.168.1.12', '(Unknown: locally administered)', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '7a:64:bd:ac:c1:66', '192.168.1.43', 'null', '2024-10-16 22:09:01', '192.168.1.43', '(Unknown: locally administered)', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '64:33:db:d6:24:b7', '192.168.1.44', 'null', '2024-10-16 22:09:01', '192.168.1.44', 'Texas Instruments', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '38:8b:59:7e:24:e4', '192.168.1.88', 'null', '2024-10-16 22:09:01', '192.168.1.88', 'Google, Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '3c:8d:20:55:8f:aa', '192.168.1.55', 'null', '2024-10-16 22:09:01', '192.168.1.55', 'Google, Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 'c8:4b:d6:da:a8:33', '192.168.1.106', 'null', '2024-10-16 22:09:01', '192.168.1.106', 'Dell Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 'f4:4e:38:19:27:d0', '192.168.1.40', 'null', '2024-10-16 22:09:01', '192.168.1.40', 'Olibra LLC', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '60:01:94:68:2e:ee', '192.168.1.85', 'null', '2024-10-16 22:09:01', '192.168.1.85', 'Espressif Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 
'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '00:1e:06:42:a1:b6', '192.168.1.112', 'null', '2024-10-16 22:09:01', '192.168.1.112', 'WIBRAIN', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '60:22:32:a6:7d:e4', '192.168.1.115', 'null', '2024-10-16 22:09:01', '192.168.1.115', 'Ubiquiti Networks Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 'bc:d0:74:58:24:7d', '192.168.1.110', 'null', '2024-10-16 22:09:01', '192.168.1.110', 'Apple, Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '00:f6:20:68:de:7c', '192.168.1.36', 'null', '2024-10-16 22:09:01', '192.168.1.36', 'Google, Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 'dc:d3:a2:e2:21:5b', '192.168.1.42', 'null', '2024-10-16 22:09:01', '192.168.1.42', 'Apple, Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '3c:8d:20:3a:2b:cb', '192.168.1.132', 'null', '2024-10-16 22:09:01', '192.168.1.132', 'Google, Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '00:17:88:a7:74:9e', '192.168.1.147', 'null', '2024-10-16 22:09:01', '192.168.1.147', 'Philips Lighting BV', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '3c:31:74:5a:86:c0', '192.168.1.95', 'null', '2024-10-16 22:09:01', '192.168.1.95', 'Google, Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 'ac:f1:08:30:78:9f', '192.168.1.89', 'null', '2024-10-16 22:09:01', '192.168.1.89', 'LG Innotek', '192.168.1.0/24 --interface=bond0 
-vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '44:09:b8:6f:11:9b', '192.168.1.169', 'null', '2024-10-16 22:09:01', '192.168.1.169', 'Salcomp (Shenzhen) CO., LTD.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 'e4:38:83:e6:66:48', '192.168.1.213', 'null', '2024-10-16 22:09:01', '192.168.1.213', 'Ubiquiti Networks Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '3c:8d:20:4b:99:2d', '192.168.1.216', 'null', '2024-10-16 22:09:01', '192.168.1.216', 'Google, Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '84:f3:eb:0b:7d:17', '192.168.1.207', 'null', '2024-10-16 22:09:01', '192.168.1.207', 'Espressif Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 'f2:c5:f6:82:8f:67', '192.168.1.130', 'null', '2024-10-16 22:09:01', '192.168.1.130', '(Unknown: locally administered)', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 'be:09:dc:6b:8c:17', '192.168.1.171', 'null', '2024-10-16 22:09:01', '192.168.1.171', '(Unknown: locally administered)', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '38:8b:59:7e:24:e4', '192.168.1.255', 'null', '2024-10-16 22:09:01', '192.168.1.255', 'Google, Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '38:b4:d3:96:ae:38', '192.168.1.148', 'null', '2024-10-16 22:09:01', '192.168.1.148', 'BSH Hausgeraete GmbH', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 
'f0:ef:86:07:de:1f', '192.168.1.244', 'null', '2024-10-16 22:09:01', '192.168.1.244', 'Google, Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 'd8:eb:46:b1:20:b1', '192.168.1.97', 'null', '2024-10-16 22:09:01', '192.168.1.97', 'Google, Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 'c8:2a:dd:82:4b:53', '192.168.1.29', 'null', '2024-10-16 22:09:01', '192.168.1.29', 'Google, Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '84:f3:eb:0b:1f:a6', '192.168.1.31', 'null', '2024-10-16 22:09:01', '192.168.1.31', 'Espressif Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '84:f3:eb:0b:7b:ca', '192.168.1.162', 'null', '2024-10-16 22:09:01', '192.168.1.162', 'Espressif Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 'f8:ff:c2:6a:64:5b', '192.168.1.222', 'null', '2024-10-16 22:09:01', '192.168.1.222', 'Apple, Inc.', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 'f8:b9:5a:6f:09:0c', '192.168.1.21', 'null', '2024-10-16 22:09:01', '192.168.1.21', 'LG Innotek', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 'f4:30:b9:1f:ae:85', '192.168.1.34', 'null', '2024-10-16 22:09:01', '192.168.1.34', 'Hewlett Packard', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 'f8:b9:5a:d4:ef:70', '192.168.1.41', 'null', '2024-10-16 22:09:01', '192.168.1.41', 'LG Innotek', '192.168.1.0/24 --interface=bond0 -vlan=1', '', 'not-processed', 'arp-scan', 'null', '', 
'', '', '', '', ''), (0, 'ARPSCAN', 'd4:ad:fc:08:f8:78', '192.168.2.62', 'null', '2024-10-16 22:09:01', '192.168.2.62', 'Shenzhen Intellirocks Tech co.,ltd (802.1Q VLAN=2)', '192.168.2.0/24 --interface=bond0 -vlan=2', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', 'd4:ad:fc:fe:a5:ec', '192.168.2.44', 'null', '2024-10-16 22:09:01', '192.168.2.44', 'Shenzhen Intellirocks Tech co.,ltd (802.1Q VLAN=2)', '192.168.2.0/24 --interface=bond0 -vlan=2', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '04:cf:8c:f9:4e:22', '192.168.2.170', 'null', '2024-10-16 22:09:01', '192.168.2.170', 'XIAOMI Electronics,CO.,LTD (802.1Q VLAN=2)', '192.168.2.0/24 --interface=bond0 -vlan=2', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '7c:a6:b0:10:02:ed', '192.168.2.223', 'null', '2024-10-16 22:09:01', '192.168.2.223', '(Unknown) (802.1Q VLAN=2)', '192.168.2.0/24 --interface=bond0 -vlan=2', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '7c:a6:b0:18:ea:ac', '192.168.2.216', 'null', '2024-10-16 22:09:01', '192.168.2.216', '(Unknown) (802.1Q VLAN=2)', '192.168.2.0/24 --interface=bond0 -vlan=2', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '7c:a6:b0:0f:f0:d3', '192.168.2.27', 'null', '2024-10-16 22:09:01', '192.168.2.27', '(Unknown) (802.1Q VLAN=2)', '192.168.2.0/24 --interface=bond0 -vlan=2', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', ''), (0, 'ARPSCAN', '84:3e:1d:13:ab:e4', '192.168.3.180', 'null', '2024-10-16 22:09:01', '192.168.3.180', '(Unknown) (802.1Q VLAN=3)', '192.168.3.0/24 --interface=bond0 -vlan=3', '', 'not-processed', 'arp-scan', 'null', '', '', '', '', '', '')]
netalertx  | 22:09:02 [Plugins] Processing        : ARPSCAN
netalertx  | 22:09:03 [Plugins] Existing objects from Plugins_Objects: 121
netalertx  | 22:09:03 [Plugins] Logged events from the plugin run    : 50
netalertx  | 22:09:03 [Plugins] pluginEvents      count: 50
netalertx  | 22:09:03 [Plugins] pluginObjects     count: 121
netalertx  | 22:09:03 [Plugins] events_to_insert  count: 0
netalertx  | 22:09:03 [Plugins] history_to_insert count: 121
netalertx  | 22:09:03 [Plugins] objects_to_insert count: 0
netalertx  | 22:09:03 [Plugins] objects_to_update count: 121
netalertx  | 22:09:03 [Plugin utils] In pluginEvents there are 50 events with the status "watched-not-changed"
netalertx  | 22:09:03 [Plugin utils] In pluginObjects there are 50 events with the status "watched-not-changed"
netalertx  | 22:09:03 [Plugin utils] In pluginObjects there are 71 events with the status "missing-in-last-scan"
netalertx  | 22:12:20 [Plugins] Mapping objects to database table: CurrentScan
netalertx  | 22:12:20 [Plugins] SQL query for mapping: INSERT into CurrentScan ( "cur_MAC", "cur_IP", "cur_Vendor", "cur_ScanMethod") VALUES ( ?, ?, ?, ?)
netalertx  | 22:12:20 [Plugins] SQL sqlParams for mapping: [('72:a7:41:fe:30:bf', '192.168.1.1', '(Unknown: locally administered)', 'arp-scan'), ('72:a7:41:fe:30:bf', '192.168.1.2', '(Unknown: locally administered)', 'arp-scan'), ('70:a7:41:fd:38:83', '192.168.1.24', 'Ubiquiti Networks Inc.', 'arp-scan'), ('5c:ad:76:7e:71:1f', '192.168.1.26', 'Shenzhen TCL New Technology Co., Ltd', 'arp-scan'), ('b0:c5:ca:38:3c:5f', '192.168.1.28', 'abode systems, inc.', 'arp-scan'), ('e4:38:83:8c:b3:29', '192.168.1.54', 'Ubiquiti Networks Inc.', 'arp-scan'), ('8c:ae:4c:e1:70:cf', '192.168.1.72', 'Plugable Technologies', 'arp-scan'), ('50:d2:13:02:63:2b', '192.168.1.48', 'CviLux Corporation', 'arp-scan'), ('c0:95:6d:62:94:3e', '192.168.1.90', 'Apple, Inc.', 'arp-scan'), ('8e:65:6f:92:49:9c', '192.168.1.12', '(Unknown: locally administered)', 'arp-scan'), ('7a:64:bd:ac:c1:66', '192.168.1.43', '(Unknown: locally administered)', 'arp-scan'), ('64:33:db:d6:24:b7', '192.168.1.44', 'Texas Instruments', 'arp-scan'), ('38:8b:59:7e:24:e4', '192.168.1.88', 'Google, Inc.', 'arp-scan'), ('3c:8d:20:55:8f:aa', '192.168.1.55', 'Google, Inc.', 'arp-scan'), ('c8:4b:d6:da:a8:33', '192.168.1.106', 'Dell Inc.', 'arp-scan'), ('f4:4e:38:19:27:d0', '192.168.1.40', 'Olibra LLC', 'arp-scan'), ('60:01:94:68:2e:ee', '192.168.1.85', 'Espressif Inc.', 'arp-scan'), ('00:1e:06:42:a1:b6', '192.168.1.112', 'WIBRAIN', 'arp-scan'), ('60:22:32:a6:7d:e4', '192.168.1.115', 'Ubiquiti Networks Inc.', 'arp-scan'), ('bc:d0:74:58:24:7d', '192.168.1.110', 'Apple, Inc.', 'arp-scan'), ('00:f6:20:68:de:7c', '192.168.1.36', 'Google, Inc.', 'arp-scan'), ('dc:d3:a2:e2:21:5b', '192.168.1.42', 'Apple, Inc.', 'arp-scan'), ('3c:8d:20:3a:2b:cb', '192.168.1.132', 'Google, Inc.', 'arp-scan'), ('00:17:88:a7:74:9e', '192.168.1.147', 'Philips Lighting BV', 'arp-scan'), ('3c:31:74:5a:86:c0', '192.168.1.95', 'Google, Inc.', 'arp-scan'), ('ac:f1:08:30:78:9f', '192.168.1.89', 'LG Innotek', 'arp-scan'), ('44:09:b8:6f:11:9b', '192.168.1.169', 
'Salcomp (Shenzhen) CO., LTD.', 'arp-scan'), ('e4:38:83:e6:66:48', '192.168.1.213', 'Ubiquiti Networks Inc.', 'arp-scan'), ('3c:8d:20:4b:99:2d', '192.168.1.216', 'Google, Inc.', 'arp-scan'), ('84:f3:eb:0b:7d:17', '192.168.1.207', 'Espressif Inc.', 'arp-scan'), ('f2:c5:f6:82:8f:67', '192.168.1.130', '(Unknown: locally administered)', 'arp-scan'), ('be:09:dc:6b:8c:17', '192.168.1.171', '(Unknown: locally administered)', 'arp-scan'), ('38:8b:59:7e:24:e4', '192.168.1.255', 'Google, Inc.', 'arp-scan'), ('38:b4:d3:96:ae:38', '192.168.1.148', 'BSH Hausgeraete GmbH', 'arp-scan'), ('f0:ef:86:07:de:1f', '192.168.1.244', 'Google, Inc.', 'arp-scan'), ('d8:eb:46:b1:20:b1', '192.168.1.97', 'Google, Inc.', 'arp-scan'), ('c8:2a:dd:82:4b:53', '192.168.1.29', 'Google, Inc.', 'arp-scan'), ('84:f3:eb:0b:1f:a6', '192.168.1.31', 'Espressif Inc.', 'arp-scan'), ('84:f3:eb:0b:7b:ca', '192.168.1.162', 'Espressif Inc.', 'arp-scan'), ('f8:ff:c2:6a:64:5b', '192.168.1.222', 'Apple, Inc.', 'arp-scan'), ('f8:b9:5a:6f:09:0c', '192.168.1.21', 'LG Innotek', 'arp-scan'), ('f4:30:b9:1f:ae:85', '192.168.1.34', 'Hewlett Packard', 'arp-scan'), ('f8:b9:5a:d4:ef:70', '192.168.1.41', 'LG Innotek', 'arp-scan'), ('d4:ad:fc:08:f8:78', '192.168.2.62', 'Shenzhen Intellirocks Tech co.,ltd (802.1Q VLAN=2)', 'arp-scan'), ('d4:ad:fc:fe:a5:ec', '192.168.2.44', 'Shenzhen Intellirocks Tech co.,ltd (802.1Q VLAN=2)', 'arp-scan'), ('04:cf:8c:f9:4e:22', '192.168.2.170', 'XIAOMI Electronics,CO.,LTD (802.1Q VLAN=2)', 'arp-scan'), ('7c:a6:b0:10:02:ed', '192.168.2.223', '(Unknown) (802.1Q VLAN=2)', 'arp-scan'), ('7c:a6:b0:18:ea:ac', '192.168.2.216', '(Unknown) (802.1Q VLAN=2)', 'arp-scan'), ('7c:a6:b0:0f:f0:d3', '192.168.2.27', '(Unknown) (802.1Q VLAN=2)', 'arp-scan'), ('84:3e:1d:13:ab:e4', '192.168.3.180', '(Unknown) (802.1Q VLAN=3)', 'arp-scan')]
netalertx  | 22:13:08 [API] Update API starting
netalertx  | 22:13:09 [API] Updating table_appevents.json file in /front/api
netalertx  | 22:13:09 [API] Updating table_plugins_history.json file in /front/api
netalertx  | 22:13:11 [API] Updating table_plugins_objects.json file in /front/api
netalertx  | 22:13:11 [Scheduler] - Scheduler run for AVAHISCAN: YES
netalertx  | 22:13:11 [Plugin utils] ---------------------------------------------
netalertx  | 22:13:11 [Plugin utils] display_name: AVAHISCAN (Name discovery)
netalertx  | 22:13:11 [Plugins] CMD: python3 /app/front/plugins/avahi_scan/avahi_scan.py
netalertx  | 22:13:11 [Plugins] Resolving param: {'name': 'ips', 'type': 'sql', 'value': 'SELECT dev_LastIP from DEVICES order by dev_MAC', 'timeoutMultiplier': True}
netalertx  | 22:13:11 [Plugin utils] Flattening the below array
netalertx  | 22:13:11 ['0.0.0.0']['0.0.0.0']['192.168.1.147']['192.168.1.112']['1.1.1.1']['192.168.1.196']['192.168.1.36']['172.17.0.2']['172.17.0.3']['172.17.0.4']['192.168.1.143']['192.168.1.125']['192.168.2.170']['192.168.1.98']['192.168.3.165']['192.168.3.120']['192.168.3.240']['192.168.1.229']['192.168.2.179']['192.168.3.238']['192.168.2.211']['192.168.3.156']['0.0.0.0']['192.168.3.25']['192.168.1.183']['192.168.2.231']['0.0.0.0']['192.168.3.215']['192.168.1.88']['192.168.1.148']['0.0.0.0']['192.168.1.230']['192.168.1.95']['192.168.1.132']['192.168.1.216']['192.168.1.55']['192.168.1.169']['192.168.1.17']['192.168.3.130']['192.168.1.203']['192.168.1.102']['192.168.1.48']['192.168.2.21']['192.168.2.208']['192.168.1.26']['192.168.2.232']['192.168.1.85']['192.168.1.115']['192.168.2.12']['192.168.2.186']['192.168.2.165']['192.168.2.39']['192.168.2.217']['192.168.2.160']['192.168.1.44']['192.168.1.18']['192.168.2.106']['192.168.1.45']['192.168.3.226']['192.168.1.163']['192.168.1.24']['192.168.1.1']['192.168.3.15']['192.168.1.1']['192.168.3.228']['192.168.1.43']['192.168.1.57']['192.168.2.27']['192.168.2.223']['192.168.2.216']['192.168.3.209']['192.168.3.180']['192.168.1.31']['192.168.1.162']['192.168.1.207']['192.168.3.59']['192.168.1.72']['192.168.3.228']['192.168.1.49']['192.168.1.12']['192.168.1.13']['0.0.0.0']['192.168.3.203']['192.168.3.136']['192.168.3.34']['192.168.1.165']['192.168.1.230']['192.168.3.253']['0.0.0.0']['192.168.3.143']['192.168.1.243']['192.168.3.224']['192.168.1.89']['0.0.0.0']['0.0.0.0']['192.168.1.240']['192.168.1.77']['192.168.1.28']['192.168.1.129']['192.168.1.161']['192.168.1.154']['192.168.3.175']['192.168.3.245']['192.168.1.199']['192.168.1.110']['192.168.1.171']['192.168.1.186']['192.168.1.82']['192.168.1.90']['192.168.1.189']['192.168.3.182']['192.168.2.57']['192.168.1.29']['192.168.1.106']['192.168.1.203']['192.168.0.254']['192.168.2.62']['192.168.2.69']['192.168.2.83']['192.168.2.52']['192.168.2.181']['192.168.2.183']['192.168.2.48']
['192.168.2.97']['192.168.2.44']['192.168.1.97']['192.168.3.42']['192.168.1.42']['192.168.3.146']['192.168.1.54']['192.168.1.213']['192.168.2.32']['192.168.3.117']['192.168.1.243']['192.168.1.164']['192.168.1.244']['0.0.0.0']['192.168.1.130']['192.168.1.34']['192.168.1.40']['192.168.3.144']['192.168.1.21']['192.168.1.41']['192.168.1.222']['192.168.1.180']['192.168.3.136']['192.168.2.35']['192.168.3.38']['99.47.164.171']
netalertx  | 22:13:11 [Plugin utils] isinstance(arr, list) : True | isinstance(arr, str) : False

jokob-sk commented 1 month ago

Hi @nathang21 ,

Thanks for the info. Well it's good it's working at least.

Looking at the logs I can't see anything wrong. Let's see if the issue reappears or if you can reproduce it later on.

Keep me posted, j

nathang21 commented 1 month ago

Good morning, here is an updated log output from my terminal. There is a Traceback right before the backend restarted in its most recent occurrence. Is this helpful by chance?

Other details/symptoms that may be relevant:

Do let me know if you need more details, full logs, or additional troubleshooting steps. Thanks again.

output.log

jokob-sk commented 1 month ago

Thanks @nathang21 , this helps a lot.

In order to determine the root cause of the issue, can you please send me the logs for the DEVICES and CurrentScan table just before this issue occurs? It seems like a plugin is passing an invalid MAC address to the core app.

  1. Please set LOG_LEVEL to trace (disable it once you have the info, as this produces big log files). You can do that directly in the app.conf file if you have issues with the front end.
  2. Wait for the issue to occur.
  3. Search for ================ DEVICES table content ================ in your logs
  4. Search for ================ CurrentScan table content ================ in your logs
  5. Share the logs below those lines in the log file. Feel free to send them to netalertx@gmail.com if they contain sensitive info.
  6. Please set LOG_LEVEL to debug or lower.
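
Steps 3 to 5 can be sketched as a small script. This is an illustration under stated assumptions, not a NetAlertX tool: the helper name `table_sections` and the `limit` parameter are hypothetical, and the header pattern is inferred only from the two marker strings quoted above.

```python
import re

# Matches headers such as "================ DEVICES table content ================".
HEADER = re.compile(r"=+ (?P<name>\w+) table content =+")


def table_sections(lines, limit=200):
    """Map each table name to the log lines that follow its header.

    Collection for a table stops at the next header or after `limit` lines.
    If a table is dumped more than once, only the last dump is kept.
    """
    sections = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = HEADER.search(line)
        if match:
            current = match.group("name")
            sections[current] = []
        elif current is not None and len(sections[current]) < limit:
            sections[current].append(line)
    return sections


if __name__ == "__main__":
    # Placeholder rows; real trace logs contain the actual table content.
    log = [
        "================ DEVICES table content ================",
        "<device row>",
        "================ CurrentScan table content ================",
        "<scan row>",
    ]
    for name, rows in table_sections(log).items():
        print(name, len(rows))
```

Because the dump has no explicit end marker in the quoted text, the sketch simply caps each section at `limit` lines; unrelated log lines after a dump may therefore be included.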

Thanks in advance, j

jokob-sk commented 1 month ago

I also added a bit of additional logging, so please switch to the netalertx-dev image if you can. The log entry should print the input parameters for the guess_icon method. You can search for the following before the error is thrown.

[guess_icon] Guessing icon for (vendor|mac|ip|name)

Still, I will need the above printed output of the database tables to properly fix the problem.

Thanks in advance, J

nathang21 commented 1 month ago

Thanks for the instructions and patience. I just sent you an email with the logs. The backend restarted within a few minutes after a fresh creation + boot of the container, so I sent over the full app.log, but if that's too much I can trim it.

Best, Nathan

jokob-sk commented 1 month ago

Hi @nathang21 ,

Thanks a lot for the help! This is exactly what I needed. UNFIMP seems to import invalid MAC addresses for some devices. I implemented a check and only valid MAC addresses are stored and passed to the app now.
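
The kind of check described here can be sketched as follows. This is not the code that was added to NetAlertX: the names `looks_like_mac` and `keep_valid_macs` are hypothetical, and the accepted format (six hex octets separated by colons or hyphens) is an assumption.

```python
import re

# Assumed format: six two-digit hex octets separated by ":" or "-".
MAC_PATTERN = re.compile(r"(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}")


def looks_like_mac(value):
    """Return True only for strings shaped like a MAC address."""
    return isinstance(value, str) and MAC_PATTERN.fullmatch(value.strip()) is not None


def keep_valid_macs(rows):
    """Drop rows whose first field is not a valid-looking MAC address."""
    return [row for row in rows if looks_like_mac(row[0])]


if __name__ == "__main__":
    rows = [
        ("72:a7:41:fe:30:bf", "192.168.1.1"),  # MAC taken from the scan log above
        (0, "192.168.1.2"),                    # made-up bad value: an integer
    ]
    print(keep_valid_macs(rows))
```

Filtering at the plugin boundary means later steps only ever receive string MAC addresses.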

This should be fixed in the next release. It would be great if you could test this. Can you please switch to the netalertx-dev docker image (back up everything first) in about 15 minutes from now (or after the last action finishes)?

Thanks in advance, j

nathang21 commented 1 month ago

Glad to hear it. Unfortunately, I think there may be multiple sources of crashes, as I only recently got the UNFIMP plugin set up, but I've been having this issue for longer.

However, the latest version does appear more stable; it lasted for about 30 minutes before crashing this time. I've emailed over an updated app.log and app.conf right after the crash, but I'm not seeing an obvious stack trace this time.

Best regards, Nathan

jokob-sk commented 1 month ago

Hi @nathang21 ,

Thanks for the logs again! I checked them and the issue seems to be the same. Are you sure the new image is used? I checked the log, and a new debug output that should be logged

[is_mac] not a MAC:

https://github.com/jokob-sk/NetAlertX/blob/05e4de0dc80f86c8ccc99a687a137875429b5d7e/front/plugins/plugin_helper.py#L88

... isn't found in the log file. Could you please double check the latest netalertx-dev image is used? You can use the --pull always parameter:

docker run --pull always <image-name>:<tag>

Thanks in advance!

nathang21 commented 1 month ago

Shoot sorry about that, I added an explicit pull_policy to my compose to prevent this from happening again.

Got another occurrence and shared logs with you directly.

Here is the stack trace:

netalertx  | Traceback (most recent call last):
netalertx  |   File "<frozen runpy>", line 198, in _run_module_as_main
netalertx  |   File "<frozen runpy>", line 88, in _run_code
netalertx  |   File "/app/server/__main__.py", line 209, in <module>
netalertx  |     sys.exit(main())
netalertx  |              ^^^^^^
netalertx  |   File "/app/server/__main__.py", line 139, in main
netalertx  |     process_scan(db)
netalertx  |   File "/app/server/networkscan.py", line 45, in process_scan
netalertx  |     update_devices_data_from_scan (db)
netalertx  |   File "/app/server/device.py", line 462, in update_devices_data_from_scan
netalertx  |     dev_Icon = guess_icon(device['dev_Vendor'], device['dev_MAC'], device['dev_LastIP'], device['dev_Name'], default_icon)
netalertx  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
netalertx  |   File "/app/server/device.py", line 684, in guess_icon
netalertx  |     mac    = mac.upper()
netalertx  |              ^^^^^^^^^
netalertx  | AttributeError: 'int' object has no attribute 'upper'

jokob-sk commented 1 month ago

Hi @nathang21 ,

Are you sure you pulled the newest netalertx-dev image? I still can't see the following line anywhere in the debug log:

[is_mac] not a MAC

I also deployed the #856 code to the dev image. Can you verify that you see it when you select to display the Last IP in the device list? If you don't, please try to pull the dev image again until you see the Last IP as a link. This will then verify you are on the latest dev image.

jokob-sk commented 1 month ago

Also make sure you are pulling the netalertx-dev image, not netalertx.

nathang21 commented 1 month ago

Hmm, maybe I attached the wrong logs 🤦. It shows I already have the latest image?

$ docker pull jokobsk/netalertx-dev:latest
latest: Pulling from jokobsk/netalertx-dev
Digest: sha256:95955e81e9b396d41fcfbc96f781885046edc957f87e1df3e5c948e1e99b8276
Status: Image is up to date for jokobsk/netalertx-dev:latest
docker.io/jokobsk/netalertx-dev:latest

Before sending, I made sure to look for the [is_mac] not a MAC messages, and I just confirmed again. Are you looking at the logs from my latest email? I just resent them just in case. I'm also waiting for the issue to recur and will send more logs once it happens.


I tried to confirm the LastIP feature that you mentioned, but I'm unable to get any of the devices to list out in the table. This is a separate issue that I've noticed for a couple of weeks, but I figured it could be a side effect of the app crashing or maybe a corrupted config/DB. Hopefully this screenshot, which shows the build date and version number, helps confirm I'm running the latest version.

[Screenshot showing the build date and version number]

I suspect this is unrelated, but I did see an MQTT error which may be a side effect of the invalid MACs?

netalertx  | 09:44:54 [MQTT] Sending MQTT message: {"last_ip": "192.168.3.130", "is_new": "1", "vendor": "Unknown locally administered 8021Q VLAN3", "mac_address": "46:5b:a5:e7:f9:48", "model": "iPhone", "last_connection": "2024-10-20T12:20:45-04:56", "first_connection": "2024-10-14T19:38:43-04:00"}
netalertx  | Traceback (most recent call last):
netalertx  |   File "/app/front/plugins/_publisher_mqtt/mqtt.py", line 541, in <module>
netalertx  |     sys.exit(main())
netalertx  |              ^^^^^^
netalertx  |   File "/app/front/plugins/_publisher_mqtt/mqtt.py", line 67, in main
netalertx  |     mqtt_start(db)
netalertx  |   File "/app/front/plugins/_publisher_mqtt/mqtt.py", line 451, in mqtt_start
netalertx  |     deviceId        = 'mac_' + device["dev_MAC"].replace(" ", "").replace(":", "_").lower()
netalertx  |                                ^^^^^^^^^^^^^^^^^^^^^^^^^
netalertx  | AttributeError: 'int' object has no attribute 'replace'
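Both tracebacks in this thread fail the same way: a string method is called on a dev_MAC value that arrived as an int. One way this can happen with SQLite is type affinity: a column declared as STRING gets NUMERIC affinity, so all-digit text is stored as an INTEGER. The sketch below reproduces that under an assumed schema (the table definition and the normalize_mac helper are hypothetical, not taken from NetAlertX), so it shows a possible mechanism rather than a confirmed root cause.

```python
import sqlite3

# Assumed schema for illustration only; not taken from NetAlertX.
# In SQLite, a column declared as STRING gets NUMERIC affinity,
# so text that looks like an integer is stored as an INTEGER.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Devices (dev_MAC STRING(50), dev_Name TEXT)")
conn.executemany(
    "INSERT INTO Devices VALUES (?, ?)",
    [("46:5b:a5:e7:f9:48", "iPhone"), ("3549867576", "bad import")],
)

rows = conn.execute("SELECT dev_MAC FROM Devices ORDER BY rowid").fetchall()
macs = [row[0] for row in rows]  # the second value comes back as a Python int


def normalize_mac(value):
    """Coerce any stored value to an upper-case string before string methods run."""
    return str(value).strip().upper()


normalized = [normalize_mac(m) for m in macs]
```

Coercing with str() only avoids the AttributeError; it does not turn an invalid value into a valid MAC, so filtering at import time is still needed.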

Edit: Crashed after about 40 minutes; I sent you the logs (app.3.log).

jokob-sk commented 1 month ago

Hi @nathang21 ,

Thanks a lot. The fix should prevent the UNFIMP plugin from passing in invalid MAC addresses, which, by the looks of it, it is already doing:

16:43:46 [is_mac] not a MAC: 3549867576
16:43:46 [UNFIMP] Skipping, not a valid MAC address: 3549867576

What I think is happening now is that the system had already ingested an invalid MAC address before the fix was deployed, so we need to remove the invalid devices before the fix can take effect.

Can you delete the devices or set up a new instance to test this?

You can try any of the following depending on if you want to preserve existing data:

  1. Deleting a SELECTION of the devices from under Maintenance -> DB Tools
  2. Deleting the devices one-by-one by taking the value that has thrown the error and using it in the URL http://<hostIP>:20211/deviceDetails.php?mac=<invalid mac>, e.g. http://<hostIP>:20211/deviceDetails.php?mac=84179990 (not sure if the page will load though).
  3. Deleting ALL the devices from under Maintenance -> DB Tools
  4. Setting up a new fresh instance

What I expect to see after a clean up of the DB in the logs:

[UNFIMP] Skipping, not a valid MAC address: 84179990

The value 84179990 is the first incorrect MAC logged by the plugin so the new code should filter it out. 🤞
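Before choosing one of the clean-up options above, the affected rows could be listed with a query along these lines. This is a sketch against an in-memory database with an assumed, simplified Devices table; the function name find_suspect_macs is hypothetical, and anything like this should only be tried on a backup copy of the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, simplified schema for illustration; not the real table definition.
conn.execute("CREATE TABLE Devices (dev_MAC, dev_Name TEXT)")
conn.executemany(
    "INSERT INTO Devices VALUES (?, ?)",
    [
        ("46:5b:a5:e7:f9:48", "phone"),
        (84179990, "bad import"),
        (3549867576, "bad import 2"),
    ],
)


def find_suspect_macs(connection):
    """Return (value, storage type) for rows whose dev_MAC lacks the xx:xx:xx:xx:xx:xx shape."""
    query = """
        SELECT dev_MAC, typeof(dev_MAC)
        FROM Devices
        WHERE dev_MAC NOT LIKE '__:__:__:__:__:__'
        ORDER BY rowid
    """
    return connection.execute(query).fetchall()


suspects = find_suspect_macs(conn)
```

The LIKE pattern only checks the overall shape (six two-character groups separated by colons), which is enough to flag all-digit values such as the ones logged in this thread.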

Thanks for the help and patience, j

nathang21 commented 4 weeks ago

Thanks for the explanation, that makes sense. I ended up going with #3 and good news, I think it's been stable for at least 24 hours now. 🤞


Following up on 2 potentially unrelated issues (happy to open separate ones for those if preferred).

1) As mentioned above, on the /devices page it shows the # of devices, but the table still does not load at all, so that must be from some other cause/corruption? Any ideas on how to debug that?

2) After deleting all the devices, as expected the new device detections have started to trickle in. One thing I've noticed is that almost always when the notifications fire, they show "name not found", which makes them not very useful. However, I have multiple name resolution plugins enabled, and typically after the scans run most devices have a name. I have all of the relevant scans scheduled to run on the same interval as recommended. Is there a way to wait for the scans/resolutions to finish before the publishers fire? From reading the docs I thought that is how it's supposed to work already, but in practice that is not my experience.

Version: Built on: 2024-10-24 | Version: 08:01:49 - Dev

I've emailed over my latest app.conf and app.log for reference.

jokob-sk commented 4 weeks ago

Hi @nathang21 ,

Glad to hear that! 🎉

It really is easier to open separate issues so I can then track them for the release announcement and I have relevant logs.

1

For the first issue, can you please clear both caches: the in-browser one (Shift + refresh on the tab) and the app one (the blue 🔄 button in the app header)? If the issue persists, please open a new issue with the browser console log (F12) 🙏. I also might need the output of relevant plugins (search for ================ DEVICES table content ================ in your logs) just to verify the data in the Devices table is correct. Check my comment above for how to get this output.

2

I think that's currently by design, but I would have to look into that. Let's fix the above issue first and see if the behavior persists (it might be related). If it does, you can open a new issue and I can look into this as well. This might be a more substantial piece of work, as there are back-end dependencies I have to look at. Again, this is easier to track in a separate issue.

Also, just FYI the latest logs you sent to the email seem to be empty (file size 0).

Thanks for the patience, j

nathang21 commented 3 weeks ago

Thanks, and sorry about the empty logs, forgot to fix file permissions before downloading from my NAS.

All in all, it's been stable for a while now, so I think we can close this issue.

I'll open separate issues for the other 2 when I get a chance. Thanks again.

jokob-sk commented 3 weeks ago

Thanks for the update @nathang21