zancarius / archlinux-pkgbuilds

Various odds and ends but mostly just custom PKGBUILDs.

Adds [install] section to sentry.service #9

Closed mitchhentges closed 8 years ago

mitchhentges commented 8 years ago

The instructions say that I can just sudo systemctl enable sentry, but there's no [Install] section. However, there are [Install] sections for each of the "subservices", like sentry-web, sentry-cron, and sentry-celery?

This PR allows systemctl enable sentry to work again. I wasn't sure whether it should be added to the .service or the .target, but considering that the .target doesn't have its Wants= set (and I think you need those?), I opted for the .service.
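
For reference, the two approaches differ only in where the boot-time linkage lives. A rough sketch of both (the WantedBy target and unit names here are the conventional ones, not taken from the actual PKGBUILD):

```ini
# Option A: make sentry.service itself enableable by giving it
# an [Install] section (what this PR does).
# sentry.service
[Install]
WantedBy=multi-user.target

# Option B: give sentry.target the [Install] section instead, and have
# it pull in the subservices via Wants=, so "systemctl enable sentry.target"
# starts all three.
# sentry.target
[Unit]
Wants=sentry-web.service sentry-cron.service sentry-celery.service

[Install]
WantedBy=multi-user.target
```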

zancarius commented 8 years ago

Odd. That should've been in the changeset. I'm wondering if I masked it in my /etc/systemd/system by accident.

I'll merge this in and test. Lemme check first to see what I did.

zancarius commented 8 years ago

Okay, I'm not sure what happened. I migrated one of the machines I run Sentry on to use ZFS and may have removed some prior evidence of my stupid choices during the process. So, it was either a complete omission or (more likely) I'd masked sentry.service and forgot to migrate that change into the version I uploaded to the AUR (like an idiot).

As always, thank you very much for the PR!

mitchhentges commented 8 years ago

No problem, thanks :+1:

mitchhentges commented 8 years ago

Hmm, is it working for you? I'm having a weird timing bug: if I manually start Sentry after a boot, it works nicely. However, if it starts automatically at boot (systemctl enable sentry), it doesn't seem to work. I can curl localhost:9000 (I get a 400 Bad Request), but my nginx reverse proxy complains that it's offline.

zancarius commented 8 years ago

Give me a minute to set up something to test. I can't reboot any of my machines at this point in time. I don't see any reason why it shouldn't work unless the Requires= is incorrect. Maybe there's a race condition of sorts...

zancarius commented 8 years ago

Also, what's the status of systemctl list-units | grep sentry or similar? Anything listed? Running?

mitchhentges commented 8 years ago

Yeah, definitely running, because it's responding to my local curl with a 400 rather than just a connection refused.

mitchhentges commented 8 years ago

Ahhh, looks like nginx has to start after sentry. If I restart nginx after that initial boot (where Sentry autostarts), it works. So:

  1. Start machine
  2. Nginx, Sentry are booted automatically
  3. Boot completes
  4. Requests to Sentry through my reverse proxy fail
  5. Restart nginx
  6. Requests to Sentry through my reverse proxy succeed

Is there some way to ensure that nginx starts later? I'm not super familiar with systemd

zancarius commented 8 years ago

There is, using After=, but I'm somewhat reluctant to suggest that in your use case in the event you stop using Sentry.

It sounds as if nginx is caching the failure response (no Sentry) and then remembering that for some length of time. What proxy configs are you using in nginx?

mitchhentges commented 8 years ago

My nginx config for the Sentry site is:

server {
    listen 80;
    server_name sentry.fuzzlesoft.ca;

    location / {
        proxy_pass http://localhost:9000;
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
    listen 443 ssl;
    ssl_certificate /etc/letsencrypt/live/sentry.fuzzlesoft.ca/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/sentry.fuzzlesoft.ca/privkey.pem;
    include /etc/letsencrypt/options-ssl-nginx.conf;
    ssl_trusted_certificate /etc/letsencrypt/live/sentry.fuzzlesoft.ca/chain.pem;
    ssl_stapling on;
    ssl_stapling_verify on;

    if ($scheme != "https") {
        return 301 https://$host$request_uri;
    }
}

zancarius commented 8 years ago

Odd. I don't see any obvious indications that it should cache the response. I've had other apps running behind nginx that will come up as soon as nginx does.

For the time being as a workaround, you could do the following:

  1. Copy /usr/lib/systemd/system/nginx.service to /etc/systemd/system/nginx.service
  2. Add After=sentry.service under the [Unit] section
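
Rather than copying the whole unit file, a drop-in override achieves the same ordering with less to maintain (systemctl edit creates the file for you); this is a sketch of that approach, not something shipped by the package:

```ini
# /etc/systemd/system/nginx.service.d/override.conf
# (created via: systemctl edit nginx.service)
[Unit]
After=sentry.service
```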

Also, for privacy reasons, did you want to remove your domains from the sample above, or are you fine leaving them in place?

mitchhentges commented 8 years ago

Nah, it's OK to leave my URLs in; they're in my GitHub profile (and projects) already. Thanks for your help! Glad it's not affecting other computers.

zancarius commented 8 years ago

Okay, cool. Some people can be funny about that. :)

Optionally, you could also expand the existing After= line to read After=network.target sentry.service

If that doesn't work, you might need sentry.target instead.

zancarius commented 8 years ago

Also, another workaround that comes to mind: You may wish to consider changing the fail_timeout and/or max_fails options for nginx. It appears that the default fail_timeout is 10 seconds. So, if Sentry isn't starting up faster than that, nginx marks it as offline.

You could probably also copy /usr/lib/systemd/system/sentry.service to /etc/systemd/system/sentry.service and add Before=nginx.service to the [Unit] section if you'd rather not bugger with your nginx unit.
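
In unit-file terms, that second workaround would look something like this (a sketch; the rest of the copied unit stays exactly as shipped):

```ini
# /etc/systemd/system/sentry.service
# (copy of /usr/lib/systemd/system/sentry.service with one addition)
[Unit]
Before=nginx.service
```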

I'm sure there's a better solution than what I've listed above. Those are the best ones I can think of at this time.

zancarius commented 8 years ago

Oh, and slow_start is also useful if you don't want to tweak fail_timeout (you can use both in your upstream declaration), e.g.:

upstream sentry {
    server sentrybackend.example.com:9000 fail_timeout=15s slow_start=30s;
}

fail_timeout could be set to 0 to disable it. Apparently some people have strong feelings over whether or not that's a good idea.

mitchhentges commented 8 years ago

Hmm, maybe that's not it. If I:

  1. Start nginx (to emulate a bootup where nginx starts first)
  2. Attempt to load sentry.fuzzlesoft.ca a bunch of times, so that nginx caches the fact that the source server is "down"
  3. Start Sentry
  4. Immediately load sentry.fuzzlesoft.ca

It works? I don't think this is an nginx caching problem. Do you have IRC, and a couple of minutes to debug this? It might be easier to do in a faster-paced environment than to write the solution on this GitHub issue.

Note: Making nginx wait for sentry.service still didn't quite solve the bootup problem, I was still getting a 502 from nginx. I changed After in nginx.service to look like:

After=network.target sentry.service

zancarius commented 8 years ago

I haven't used freenode in years, although I do have a semi-private IRC server on one of my domains: irc.destrealm.org:7000

Some notes:

I can replicate your problem, but only briefly, on a virtual machine I occasionally use for testing. If I try to connect immediately after boot, it returns a 502 Bad Gateway as expected. If I wait for Sentry to settle, it responds with a success state (302 Found if unauthenticated and accessing the root).

I'd suggest trying something like the following, changing your nginx config for Sentry to include:

upstream sentry {
    server 127.0.0.1:9000 max_fails=0;
}

server {
    # Same as your existing config with the following change under location:
    location / {
        proxy_pass http://sentry; # This should be the only change here.
    }
}

If max_fails is set to 0 for the upstream, it should technically "never" recall the fail state and thus always retry the proxy endpoint. It might be possible to add this directive to your existing proxy_pass line. I don't know; I've always used separate upstream configurations in the event I need load balancing or fail over, etc., except in rare cases.

It almost has to be an nginx-related configuration at this point if all the Sentry processes are alive. I can't seem to replicate it on my test setup. Is this a busy server, as in do you have a substantial amount of applications that might be trying to contact Sentry for debug logging?

mitchhentges commented 8 years ago

Haha, no, the server is incredibly tame, used by one hardly-popular application. Hmm, I remember a couple of versions of Sentry ago when it worked, and I think I know what you're running into. Sentry takes a little longer to start than the systemd service lets on (even when it says it's "running", it still takes a couple of seconds to start uWSGI). Even before I had my current problem, there would be a brief period where nginx would be running, but accessing sentry.fuzzlesoft.ca would "502" until Sentry completely finished starting.

I'll play with the nginx config within the next day and get back to you; I've got an unrelated failing Phing deployment now :wink:

zancarius commented 8 years ago

I have no doubt the change to simplify Sentry startup with inter-dependent systemd units is contributing to this; I'm just not sure how. The problem is that Sentry has split into 3 separate applications: cron, celery (for background tasks), and web (this may be the naughty bit). Rather than proliferating the number of units required to start it, it seemed like a good idea at the time to borrow some ideas from NFS and a few related services and cram them all into a single service/target. I'm wondering whether that was such a bright idea at this point... maybe supervisord would've been a better bet.

(Aside: I use supervisord for some of my own projects at the expense of adding another dependency; worse, since my projects are Python 3 and supervisord is Python 2-only, it creates a slight headache and maintenance issues I'm still not entirely sure of the best way around. I should apologize for using Sentry as an experiment in this regard, to see if a systemd-only solution works best, because I really want to avoid extra dependencies if possible.)

I suppose you could try enabling the services individually, e.g. systemctl enable sentry-celery sentry-cron sentry-web (and don't forget to disable sentry first) and report back if this survives a reboot in a better state than trying to launch them all through a single gateway service.

Adding an error_log directive to your server configuration in nginx might be useful, too. We'll probably both need to see that to get an idea what nginx thinks is happening. :)

You may be right about systemd. If the service it runs forks immediately, systemd will return an online state (probably depending on the return code) right away even if the process takes a while to start. I don't know how it handles services that run directly under systemd (in a foreground state); it might report those immediately as started. I'll have to read the manpage lest I start spouting lies, because I honestly don't know what its behavior is, and I'm pretty sure uWSGI for Sentry doesn't fork.
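
For what it's worth, that behavior is governed by the unit's Type= setting; a sketch of the relevant options follows (the Type= value and ExecStart path here are illustrative assumptions, not taken from the actual sentry-web unit):

```ini
[Service]
# Type=simple (the default): systemd reports the unit "started" as soon as
# the process is exec'd, even if uWSGI takes several more seconds to bind.
# Type=forking: reported "started" once the initial process exits after
# forking the daemon.
# Type=notify: reported "started" only when the daemon itself signals
# readiness via sd_notify(), which uWSGI's systemd integration can do.
Type=notify
ExecStart=/usr/bin/sentry run web
```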

uWSGI is complicated enough that I believe it has some systemd-related code in it, possibly for reporting its state (the docs relate to its "emperor" mode, which I don't use). That would likely require extra configuration of uWSGI through Sentry somehow, and I'm not sure if that would benefit us (maybe?).

Phing... I both love it and hate it. I love that it's basically Ant for PHP and that you can pretty trivially generate .phars for easier deployment (among other things). And I hate it for reasons similar to Ant's. Although I shouldn't complain: while I'm not much of a Java guy, I've dabbled in Maven once or twice. That's a nightmare to the uninitiated (like me) unless you have a fetish for snorting lines of XML off your desk.

mitchhentges commented 8 years ago

Looks like a weird race condition for my server. After enabling some other services (mariadb, php-fpm) on the server, now it's starting fine.

This isn't due to a dependency by Sentry on these services, because if I perform the following steps, Sentry works:

  1. Boot computer
  2. Attempt to load sentry.fuzzlesoft.ca, which will fail
  3. Restart nginx service
  4. Load sentry.fuzzlesoft.ca (succeeds)

Since that works, without starting mariadb or php-fpm, there must be no dependency. Hooray for race conditions ¯\_(ツ)_/¯