unabridged / motion

Reactive frontend UI components for Rails in pure Ruby
https://github.com/unabridged/motion
MIT License
697 stars 19 forks source link

Intermittent exceptions when disconnecting a component and leaking memory #49

Open dannyvanhoof opened 4 years ago

dannyvanhoof commented 4 years ago

We started using ViewComponents in combination with Motion to render orders in our application. We now somehow have the feeling (by diving into it with get_process_mem and rack_mini_profiler) that the memory allocated by these components is not released when the page is closed, so that our Rails application keeps on growing for days, to eventually be thrown out by Heroku.

Question: can this indeed be the case? And if yes: what can we do to somehow trigger cleanup?

alecdotninja commented 4 years ago

That is very strange. Motion does keep a cached copy of the component around in the channel, but that should be released by ActionCable when the subscription is removed by the client (This is done so that the client doesn't have to send the state back up to the server past the initial mounting).

Also, I wouldn't expect this object to be particularly large. A marshalled representation of it is in the client's markup, after all. It is usually on the order of hundreds of bytes.

@dannyvanhoof Why do you think that Motion is leaking memory?

alecdotninja commented 4 years ago

The only other longer-lived piece of state created by Motion (besides the configuration and a few service objects) that I can think of is a renderer for the websocket connection. This should only live as long as the client is connected though -- it is actually stored on the connection object itself.

dannyvanhoof commented 4 years ago

Alec - we're just in the early stage of our investigation. Since we started using ViewComponents + Motion, we saw our memory usage on Heroku rising, and were thrown out couple of times. Early this week I added the gem_process_mem gem to our application, and started looking at the memory usage (production) at the end of loading a page (after starting garbage collection manually), and again after closing the tab on the browser. That's when I noticed that loading a page containing VC + Motion increased the memory used, and it didn't go down after closing the tab. But clearly, this will need more investigation.

dannyvanhoof commented 4 years ago

Update: we checked the size of the marshalled objects on the client side, and for some of our pages we found like 240 VC's, each with 30Kb data. Instead of storing the object as instance variable in the VC, we now store the object's id. This reduced the data on the client side to 240 x 30 bytes. The page itself also loads much faster now, and we see a much smaller increase in memory. I guess I'll stop the investigation as for now. What I learned from this: don't store the object itself as instance variable - unless it's really very small...

alecdotninja commented 4 years ago

I think an intermittent failure that occurred on CI this morning might be related (unfortunately, I retried and that seems to have erased the logs). It was an exception from within ActionCable saying that it could not find a subscription to remove.

When I get some more time, I'll dive into that.

@dannyvanhoof State size is a known pain-point in Motion. We have plans to support server-side state with Redis or Memcached which would significantly cut down on the amount of data over the wire, but that has yet to be implemented.

alecdotninja commented 4 years ago

Ah, it happened again. I won't retry the job this time: https://travis-ci.com/github/unabridged/motion/jobs/383344606

dannyvanhoof commented 4 years ago

@dannyvanhoof State size is a known pain-point in Motion. We have plans to support server-side state with Redis or Memcached which would significantly cut down on the amount of data over the wire, but that has yet to be implemented.

Alec - wouldn't it be an idea to allow for instance variables that could be excluded from being marshalled? this way, we could use the original object to render the first time, but not marshall it, and marshalling the object's id instead. So a first rendering could use the original object (preventing an n+1), while an update would first have to retrieve that object.

alecdotninja commented 4 years ago

@dannyvanhoof If you have some time, I'm curious if 0.4.3 fixes the memory leak that you were noticing.


wouldn't it be an idea to allow for instance variables that could be excluded from being marshaled?

I'm open to this, but I think it is a band-aid for a couple of deeper problems that I want to address. I think the right combination of optimized marshaling implementations for common libraries (like ActiveRecord), server-side state (so there are fewer bytes in the DOM and over the wire), and something like Phoenix LiveView's DOM patching (which would enable the ability to have arbitrarily long lists without storing every item in state) can solve this in a way that the user does not need to worry about which ivars are being serialized.

If something like this is added to Motion, I think it should look like:

def dump_state
  [@ivar, @another_ivar]
end

def load_state(state)
  @ivar, @another_ivar = state
end
dannyvanhoof commented 3 years ago

Alec, yesterday we had again a problem during our rspec tests on wercker, I guess related to what you mentioned above. Version 0.4.3. I'm copying the entire error below. Does this ring a bell? Context: we render a component including motion, fill in an input field in that component, then push a button in that component triggering a motion event. Somehow the entire "system" is not fast enough, unless we put in some sleeps in the rspec test...

E, [2020-10-28T16:31:22.176498 #1392] ERROR -- : Could not execute command from ({"command"=>"message", "identifier"=>"{\"channel\":\"Motion::Channel\",\"version\":\"0.4.0\",\"state\":\"292U0RV/3art6rR/ ... 4ig=="}]: /pipeline/cache/bundle-install/ruby/2.7.0/gems/actioncable-6.0.3.4/lib/action_cable/connection/subscriptions.rb:74:infind' | /pipeline/cache/bundle-install/ruby/2.7.0/gems/actioncable-6.0.3.4/lib/action_cable/connection/subscriptions.rb:55:in perform_action' | /pipeline/cache/bundle-install/ruby/2.7.0/gems/actioncable-6.0.3.4/lib/action_cable/connection/subscriptions.rb:19:inexecute_command' | /pipeline/cache/bundle-install/ruby/2.7.0/gems/actioncable-6.0.3.4/lib/action_cable/connection/base.rb:87:in dispatch_websocket_message' | /pipeline/cache/bundle-install/ruby/2.7.0/gems/actioncable-6.0.3.4/lib/action_cable/server/worker.rb:59:inblock in invoke' `

caifara commented 3 years ago

Tests interacting with motion tend to be unstable because of a race condition. This is an example where we redirected the browser log to Rails logger have a better understanding of events.

D, [2020-10-30T08:21:08.954247 #1424] DEBUG -- : [Browserlog] [Motion]
D, [2020-10-30T08:21:08.954374 #1424] DEBUG -- : [Browserlog] Connecting component
I, [2020-10-30T08:21:09.000870 #1424]  INFO -- : Started GET "/cable" for 127.0.0.1 at 2020-10-30 08:21:09 +0000
I, [2020-10-30T08:21:09.090864 #1424]  INFO -- : Started GET "/cable/" [WebSocket] for 127.0.0.1 at 2020-10-30 08:21:09 +0000
I, [2020-10-30T08:21:09.091058 #1424]  INFO -- : Successfully upgraded to WebSocket (REQUEST_METHOD: GET, HTTP_CONNECTION: Upgrade, HTTP_UPGRADE: websocket)
D, [2020-10-30T08:21:09.124281 #1424] DEBUG -- : [Browserlog] [Motion]
D, [2020-10-30T08:21:09.124420 #1424] DEBUG -- : [Browserlog] Processing motion
D, [2020-10-30T08:21:09.124508 #1424] DEBUG -- : [Browserlog] scan
D, [2020-10-30T08:21:09.124583 #1424] DEBUG -- : [Browserlog] on
I, [2020-10-30T08:21:09.204965 #1424]  INFO -- : [ScanReturnItemsComponent:421200] Connected (in 1.8ms)
D, [2020-10-30T08:21:09.291677 #1424] DEBUG -- : [Browserlog] [Motion]
D, [2020-10-30T08:21:09.291817 #1424] DEBUG -- : [Browserlog] Component connected

It seems a motion is processed even before the component is connected? Which seems weird given the _subscription gets set once the component is connected.

https://github.com/unabridged/motion/blob/90308c4952aa3a6c6833e9d01c4041450884cfde/javascript/Component.js#L23-L31

and

https://github.com/unabridged/motion/blob/90308c4952aa3a6c6833e9d01c4041450884cfde/javascript/Component.js#L39-L45

In this case, the processing of the motion seemed to work, but the rerender never happened failing the test. We also see the Could not execute command error following Unable to find subscription with identifier. The order of the log messages are similar in that case: a motion seems to trigger before the "connected" event.

We see that you use system test helpers to make sure to wait long enough. I guess that originates from the same problem?