mperham / inspeqtor

Monitor your application infrastructure!
GNU General Public License v3.0

Getting a swap alert when swap is at 0 #150

Open allaire opened 6 years ago

allaire commented 6 years ago

I just upgraded our staging environment to the latest Inspeqtor version (2.0) and swap is always 100% under inspeqtor, when in fact it's actually 0:

[Screenshot, 2018-06-28 15:18:07]

I suspect this commit could have broken something? https://github.com/mperham/inspeqtor/commit/41680d7528fd6a63bc885d5159fdb56e478d6cc7#diff-c187630e5b94e88c6486c75cccb5a092L25

allaire commented 6 years ago

On the previous version I compiled myself (with the fix of #148), when I run sudo inspeqtorctl status I also see 100%, but no ! or alerts 🤔

mperham commented 6 years ago

I suspect you might be right; it's possible this is a coercion issue. I tried to fix a bunch of lint warnings and might have broken something in doing so. Can you supply a failing test?

allaire commented 6 years ago

@mperham the weird thing is that the previous version I have in production (https://github.com/mperham/inspeqtor/commit/42e9f59246cde2bbbce8eea8d9b82c29f04e6b2d) is also showing swap 100%, but it's not in "alert mode" (!). Two unrelated issues?

FWIW, sudo sysctl -n vm.swapusage in my case returns sysctl: cannot stat /proc/sys/vm/swapusage: No such file or directory.

agendrix@app-01:/proc/sys/vm$ ls
admin_reserve_kbytes         laptop_mode                oom_dump_tasks
block_dump                   legacy_va_layout           oom_kill_allocating_task
compact_memory               lowmem_reserve_ratio       overcommit_kbytes
compact_unevictable_allowed  max_map_count              overcommit_memory
dirty_background_bytes       memory_failure_early_kill  overcommit_ratio
dirty_background_ratio       memory_failure_recovery    page-cluster
dirty_bytes                  min_free_kbytes            panic_on_oom
dirty_expire_centisecs       min_slab_ratio             percpu_pagelist_fraction
dirty_ratio                  min_unmapped_ratio         stat_interval
dirtytime_expire_seconds     mmap_min_addr              swappiness
dirty_writeback_centisecs    nr_hugepages               user_reserve_kbytes
drop_caches                  nr_hugepages_mempolicy     vfs_cache_pressure
extfrag_threshold            nr_overcommit_hugepages    zone_reclaim_mode
hugepages_treat_as_movable   nr_pdflush_threads
hugetlb_shm_group            numa_zonelist_order

allaire commented 6 years ago

Ah, I don't have any swap configured (see first screenshot). Maybe inspeqtor isn't handling the case where it can't read the swapusage file and reports 100% instead?

mperham commented 6 years ago

Inspeqtor reads the SwapFree and SwapTotal attributes in /proc/meminfo.

        free := memMetrics["SwapFree"]
        total := memMetrics["SwapTotal"]
        if free == 0 {
            hs.Save("swap", "", 100)
        } else if free == total {
            hs.Save("swap", "", 0)
        } else {
            hs.Save("swap", "", float64(100-int8(100*((free)/(total)))))
        }
mperham commented 6 years ago

Wait, that's backwards. "swap" means "swap in use" and so your rule should trigger. If you don't have swap, you should remove the swap rule.

allaire commented 6 years ago

@mperham Oh well, that explains the 100% then. I'm not sure why the previous version shows the same 100% without alerting, though. I think if swap is disabled (showing 0 kB), inspeqtor should handle it as Swap 0% instead of Swap 100%, no? Something like:

        free := memMetrics["SwapFree"]
        total := memMetrics["SwapTotal"]
        if free == 0 && total != 0 {
            hs.Save("swap", "", 100)
        } else if free == total {
            hs.Save("swap", "", 0)
        } else {
            hs.Save("swap", "", float64(100-int8(100*((free)/(total)))))
        }
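
The same idea can be written as a small standalone function, which makes the no-swap case easy to test. This is only a sketch: swapUsedPercent is a hypothetical name, it assumes the values are float64, and unlike the snippet above it does not truncate the result to a whole number.

```go
package main

import "fmt"

// swapUsedPercent returns the share of swap in use, from 0 to 100.
// Assumption: a host with no swap configured (total == 0) counts as 0% used.
func swapUsedPercent(free, total float64) float64 {
	if total <= 0 {
		return 0 // no swap configured, so nothing can be in use
	}
	used := total - free
	if used < 0 {
		used = 0 // guard against inconsistent readings
	}
	return 100 * used / total
}

func main() {
	fmt.Println(swapUsedPercent(0, 0))       // no swap configured
	fmt.Println(swapUsedPercent(0, 1024))    // swap exhausted
	fmt.Println(swapUsedPercent(1024, 1024)) // swap untouched
	fmt.Println(swapUsedPercent(512, 1024))  // half in use
}
```

Checking total first also avoids dividing by zero, whichever branch order is used.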

Bit off topic, do you recommend enabling Swap on app (puma) and worker (sidekiq) servers?

Thanks!

mperham commented 6 years ago

I see what you are saying. Hmm.

I'd recommend swap on every machine along with an alert if you ever use it. The alternative is the Linux OOM handler killing random processes.

sj26 commented 5 years ago

We use stock Ubuntu images on AWS, and by default these machines have no swap. We use this when installing inspeqtor to disable the swap rule, if it's useful:

# We have no swap
sed -i "s/if swap/#if swap/" /etc/inspeqtor/host.inq