Closed bryanp closed 5 years ago
There is no guarantee that your process would be forked or threaded.
Falcon will pre-load your config.ru
before forking/threading.
https://github.com/socketry/falcon/blob/master/lib/falcon/command/serve.rb#L104
So, everything is loaded into the falcon process by default.
There is no reason why we couldn't add some hooks, but to me this is very app-server specific. I'd prefer if there was a general solution.
Can you show me some example hooks so I can see exactly what kind of things are needed?
One option would be to move load_app
into the container instance. That isn't a stupid idea actually.
Then you could use warmup
in your config.ru which would be executed per-thread or per-process.
warmup do |app|
ActiveRecord::Base.establish_connection(...)
end
It might still make sense to pre-load some dependencies, but as this doesn't affect memory usage, maybe it can be done by adding a file, e.g. config/preload.rb
which is loaded into the pre-fork falcon environment. It wouldn't have any logic, just requires which minimise the memory overheads when forking.
Keep in mind that rolling restarts are going to form an important part of the falcon process model. I'm not sure the best way we achieve this but I suspect we will have some top level falcon process which spawns per-site instances of falcon which can be recycled. These themselves can either be forked or threaded. Anything we do to solve the issue here should be compatible with that design.
Sure. In context of the pakyow web framework, the application server is configured to call either Pakyow.fork
, Pakyow.forking
, or Pakyow.forked
(link):
def fork
forking
yield
forked
end
def forking
call_hooks :before, :fork
@apps.select { |app|
app.respond_to?(:forking)
}.each(&:forking)
end
def forked
call_hooks :after, :fork
@apps.select { |app|
app.respond_to?(:forked)
}.each(&:forked)
booted
end
When using puma, hooks are registered that call the appropriate method (link):
setting :before_worker_fork do
[
lambda { |_| Pakyow.forking }
]
end
setting :after_worker_fork do
[]
end
setting :before_worker_boot do
[
lambda { |_| Pakyow.forked }
]
end
Finally, hooks are registered within the framework that perform certain tasks before/after fork (link):
require "pakyow/support/extension"
module Pakyow
module Environment
module Data
module Forking
extend Support::Extension
apply_extension do
before :fork do
connections_to_reconnect.each(&:disconnect)
end
after :fork do
connections_to_reconnect.each(&:connect)
end
end
class_methods do
def connections_to_reconnect
@data_connections.values.flat_map(&:values).reject { |connection|
connection.name == :memory
}
end
end
end
end
end
end
Does that help? Happy to talk through this more.
Do you think this design makes sense in the context of threads/forks? i.e. forking doesn't exist on TruffleRuby so how should it behave?
If there isn't a process fork I don't think there's anything to do. It's only needed in the case above because the fork breaks the connection in the parent process (more discussion).
If the parent process doesn't have any application state, does it still make sense?
Nope. We implement a process model similar to what you describe above for development. A master process is started that loads dependencies, then forks to spin up the application server. Requests are passed through the master process to the child. When code changes, we replace the child process. In this case there's no need to call forked
because application state doesn't exist in the parent.
I think the model you just described makes the most sense.
Pre-fork code loading shouldn't affect API because it's a leaky abstraction (doesn't make sense for threads, forces all libraries to support some kind of model for fork if they have any kind of connection/state).
In my mind, I think for falcon, it would make the most sense to use bundler to pre-load dependencies by default, i.e. an optional pre-fork could just do bundler/setup
and Bundler.require :prefork
.
This should work for 99% of situations where the point of pre-fork is to minimise memory and startup time (due to loading/parsing dependencies).. The downside with this model is prefork is only really useful in production because you can't just restart by killing the child process (state is captured by pre-fork process that might need to be reloaded).
Yep, makes sense. To handle that last point, when the bundle changes we stop the child process and use exec
to replace the parent process. It works well enough for a development server.
Interesting, so you reload the master process in place?
Yep! It isn't quite as smooth of a user experience, but it works great. When replacing a child, the parent process holds any requests that come in and waits for the child to become available. You don't get that behavior and so requests will fail, but it prevents the user from having to manually restart.
Falcon already does this to some extent - because the parent process binds a socket and shares it across fork/threads.
All we need is an intermediate process to do the pre-fork, and that can manage child processes. I think we can implement this, but the most logical place to do it is in async-container
. It's not falcon
specific - we need to extend async-container
with a new container type, probably called Prefork
which does the behaviour described. The prefork container would make one child process via fork, do bundler magic, and then make another container of children.
Then, in falcon, we'd do something like falcon serve --preforked
rather than --forked
.
That sounds perfect. How can I help?
Sorry, I will make a new issue, rather than move this one.
I wrote a brief outline of what I think would work on https://github.com/socketry/async-container/issues/3
But you seem to be experienced in this, so I leave it up to you to submit a PR.
async-container
is actually fine for now, but at some point I want the ability to add/remove children processes, and handle restarting crashed processes correctly. Right now, it's fixed. That's okay, but it's not very flexible. The preforked container shouldn't make too many assumptions about how that will work going forward, I suspect a lot of it would be refactored.
But if you can show the general concept works, I think it's great and we can merge into current "inflexible" design with an idea to refactor later.
I've been working on how to support this and it should work out of the box using falcon virtual to host applications.
For individual apps, using warmup
makes the most sense. For bundler stuff, it's already loaded by default.
👍 I had forgotten about this, but makes sense. Ended up not needing it for my use case anyway.
@ioquatix @bryanp
warmup
Unfortunately It's invoked after application is build and it is executed in master process that forks containers....
Some gems still have connection issues on fork (see https://github.com/thiagopradi/octopus/issues/489) so I need to add feature for forked container hooks.
What should be the best way to do it?
Currently I did it via custom serve command but I would have less falcon extensions.
#!/usr/bin/env ruby
require 'falcon'
require_relative '../lib/falcon_serve/configuration'
require_relative '../lib/falcon_serve/dsl'
require_relative '../lib/falcon_serve/forked_container'
require_relative '../lib/falcon_serve/command'
FalconServe::Configuration.load 'config/falcon.rb'
begin
FalconServe::Command.call
rescue Interrupt
end
module FalconServe
module Configuration
class << self
def load(path)
@options = {}
DSL.new(@options).instance_eval(File.read(path), path, 1)
end
def [](name)
@options[name]
end
def []=(name, block)
@options[name] = block
end
end
end
end
module FalconServe
class DSL
def initialize(options)
@options = options
end
%i[
before_fork
after_instance_fork
].each do |name|
define_method name do |&block|
@options[name] = block
end
end
end
end
module FalconServe
class ForkedContainer < Async::Container::Forked
def run(*)
FalconServe::Configuration[:before_fork].call
super
end
def async(**options, &block)
spawn(**options) do |instance|
FalconServe::Configuration[:after_instance_fork].call
begin
Async::Reactor.run(instance, &block)
rescue Interrupt
# Graceful exit.
end
end
end
end
end
I wanted to avoid copy of implementation of async
method but it's not possible on current codebase...
module FalconServe
class Command < Falcon::Command::Serve
def container_class
if @options[:container] == :forked
FalconServe::ForkedContainer
else
super
end
end
def call
container = run
container.wait
end
end
end
Async::Container
isn't guaranteed to run in forked process, it may run using threads.
This is why this situation is tricky.
Are you concerned about compatibility with JRuby & TruffleRuby? It cannot use fork, only threads. But each thread should be isolated.
The best solution might be something like --preload
which invokes the current behaviour, and changing the current behaviour to load the app only in the child container. This way, it should work in threads and processes more easily.
What do you think?
Async::Container isn't guaranteed to run in forked process, it may run using threads.
I am ok with it because nobody expects that before_fork
and after_instance_fork
will be executed on thread environment. It's forked container specific options. If we need thread specific stuff we can add smth like after_thread_spawn
. You never want to have the SAME configuration block that should be executed either in fork or thread container (I believe).
Or maybe we can better naming:
Or nested configuration structure
forking do
before {}
on_instance {}
end
threading do
before {}
on_instance {}
end
The best solution might be something like
--preload
How it will work? Does it allow to setup "disconnect connections stuff" on master process and "connect connection stuff" on forked child?
I understand your proposal and it seems reasonable, but first I want to figure out, is this the path we want to take, because after introducing such hooks it will cause a lot of pain to remove it if it turns out to be a bad idea.
What I want to understand is if there is some inescapable semantic model which requires these hooks, or is it just working around some issues in upstream code. Because introducing such a specific interface is a big deal, I need to have a clear vision of how this fits into the big picture.
Regarding how this works, the key point is: why do we load the app before forking?
Well, if your app cannot support going over fork boundaries, because we decided above it's due to a deficiency of the design of the app, then one option is to load the app in the child process.
Can you give me your feedback?
One more point to consider, if the simplest model is just to load the app in the child process/thread, can we still save a lot of memory by doing Bundle.load
or something similar in the parent process?
What I want to understand is if there is some inescapable semantic model which requires these hooks, or is it just working around some issues in upstream code.
Almost all existing ruby servers that are designed with performance (e.g puma, unicorn, rainbows) have fork hooks. I don't know if it was really introduced as workaround for fork issues with application frameworks. If framework/gem currently has that issue it means I can't use falcon until they fix it or I have to patch either framework/gem or falcon.... But what if really application that can be forked requires that some stuff should be configured before and after forking? Why not?
Regarding how this works, the key point is: why do we load the app before forking?
The answer is obvious. We load (preload) app in master process in order to speed up booting process and spare memory. All modern servers do it in production mode at least. Correct?
then one option is to load the app in the child process
It's not efficient or?
can we still save a lot of memory by doing Bundle.load or something similar in the parent process?
It's related to copy-on-write ruby feature. I am not expert in this are. But probably we can measure amount of memory is used in both cases. Production apps are usually fat so it might play big role.
Until we have empirical evidence I am unlikely to just implement it because other server does it. I want to know why it’s required and what the impact is of the various options. So we should measure it if possible.
Because other server has much simple execution model, we need to be very careful about such feature and the burden it creates both on the server implementation and the expectations of how to build client applications. Bearing in mind that rack model is only one supported mode of operation.
Do you have an application we can use to measure?
Do you think such hooks should be added to rack spec?
want to know why it’s required and what the impact is of the various options.
I already provided arguments. Not all apps/frameworks are ready out of box for forking. It maybe they fault but maybe not. Thus other servers have hooks. It means there were a lot cases (and will be in the future i believe) when you should have to tune setup in fork environment. Otherwise server is useless for production case!
If falcon does not plan to support this general feature its bad of course. And I have to extend it in way I suggested before. Which is not good for other people as well because they need to do the same...
And loading app in child process is step back in performance term...
Do you have an application we can use to measure?
Yes I have really big production application. Actually I want to switch from EventMachine based server to Async one. Because EM is outdated and Async has better api which should allow me give more opportunities with parallelism for inbound/outbound connections.
Do you think such hooks should be added to rack spec?
We could add it to config.ru file instead If they don't broke other servers of course. Does rackup specification officially supports custom options?
I know it must be frustrating because you think you have a solution to the problem and feel that existing servers show it to be the right solution, but the reality is this problem is not so simple as adding a hook. So, try to understand my position to move things forward. If I just accept status quo, Async
would not even exist.
You state that loading the app should be faster. But that shouldn't actually matter much, and pre-fork model introduces some complexity too.
For example, rolling restart cannot use such a pre-fork model.
Memory usage may be good once child process is forked, but over time, after GC, etc, it becomes same as if not loading app in parent process. The real saving is shared code.
You say it should be faster, but you don't state how. Well, here are the two scenarios:
To me, the only difference is CPU usage, but the latency is identical.
So, the biggest win might be to preload bundler, and load app in individual thread or process.
It avoids the need for any additional hooks, it's simpler model for user code and overall there is no difference in latency - maybe slight difference in initial load.
Then, there is rolling restarts to consider. It's not possible to do such a thing when app is loaded in parent process unless somehow you restart parent process or use much more complex process for fork e.g. parent -> child (load app) -> baby (serve traffic).
So, please accept that this is non-trivial issue and try to help me with the information I need to make a good decision rather than just stating "I've provided arguments." What I need is data, e.g. latency, load average, graphs, comparisons, etc.
Otherwise, I cannot make a good decision.
I cannot make a good decision.
Then I can make. :) Basically right now I have only one option - use extended forked container. Otherwise I cannot use falcon at all.
Please try preload
branch. Please report startup time, latency, load average, memory usage (both parent process and children processes).
Ensure you add all appropriate gems to :preload
group.
Please compare to current released master.
I can give only numbers for comparison in development environment. Nobody allows me to play with production 😀
I'd like to register events handlers to be called before / after a process fork, so that things like database connections can be disconnected / reconnected properly. In Puma, I would use:
:before_worker_fork
:after_worker_fork
Not seeing a way to do the same thing with Falcon. Do you have any input on how to support?