realm / realm-object-server

Tracking of issues related to the Realm Object Server and other general issues not related to the specific SDKs
https://realm.io

What is causing ROS to restart by itself? #153

Closed gemiren closed 7 years ago

gemiren commented 7 years ago

I am seeing the following records in the output of the status command. What is causing ROS to stop and restart by itself?

Mar 18 04:59:18  systemd[1]: realm-object-server.service: Service hold-off time over, scheduling restart.
Mar 18 04:59:18  systemd[1]: Stopped Realm Object Server.
Mar 18 04:59:18  systemd[1]: Started Realm Object Server.

Thanks.

karagraysen commented 7 years ago

Hi @gemiren! Would you be able to provide us with the following information:

With that information, we'll be able to better assist you in resolving this. Thanks!

gemiren commented 7 years ago

Thanks for looking into this issue. Here is the ROS info:

  1. ROS Developer version: 1.2.1
  2. Server OS & Version: Ubuntu 16.04.2 LTS
  3. Client SDK Version: Realm Java 3.0.0
  4. Client OS & Version: Android API >= 21

ROS is deployed on a DigitalOcean droplet, as suggested in the deployment documentation.

Please let me know if you need any additional information from me. Thanks.

karagraysen commented 7 years ago

Okay, awesome. Thanks for sharing this information. I'm going to get this in front of an RMP engineer, and they will follow up with you here.

ianpward commented 7 years ago

@gemiren Can we see an output of the logs and status?

/var/log/realm-object-server.log
systemctl status realm-object-server.service
journalctl -xe | grep realm

gemiren commented 7 years ago

I reproduced the ROS restart issue about half an hour ago. Here is the output from the commands. I'll send the log file by email. Thanks.

systemctl status realm-object-server.service
● realm-object-server.service - Realm Object Server
   Loaded: loaded (/etc/systemd/system/realm-object-server.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2017-03-22 05:10:54 UTC; 25min ago
 Main PID: 1777 (realm-object-se)
    Tasks: 14
   Memory: 229.0M
      CPU: 2min 29.795s
   CGroup: /system.slice/realm-object-server.service
           └─1777 /usr/bin/nodejs /usr/bin/realm-object-server -c /etc/realm/configuration.yml

Mar 22 05:10:54 systemd[1]: Started Realm Object Server.
Mar 22 05:10:59 realm-object-server[1777]: 2017-03-22T05:10:59.155Z - info: Logging to file /var/log/realm-object-server.log at level 'trace'.

journalctl -xe | grep realm
-- Subject: Unit realm-object-server.service has finished start-up
-- Unit realm-object-server.service has finished starting up.
Mar 22 03:03:54  realm-object-server[1397]: 2017-03-22T03:03:54.728Z - info: Logging to file /var/log/realm-object-server.log at level 'trace'.
Mar 22 03:10:52  realm-object-server[1397]: uncaught exception in notifier thread: N5realm10LogicErrorE: Bad version number
Mar 22 03:10:52  realm-object-server[1397]: terminate called after throwing an instance of 'realm::LogicError'
Mar 22 03:10:52  realm-object-server[1397]:   what():  Bad version number
Mar 22 03:10:52  systemd[1]: realm-object-server.service: Main process exited, code=killed, status=6/ABRT
Mar 22 03:10:52  systemd[1]: realm-object-server.service: Unit entered failed state.
Mar 22 03:10:52  systemd[1]: realm-object-server.service: Failed with result 'signal'.
Mar 22 03:10:53  systemd[1]: realm-object-server.service: Service hold-off time over, scheduling restart.
-- Subject: Unit realm-object-server.service has finished shutting down
-- Unit realm-object-server.service has finished shutting down.
-- Subject: Unit realm-object-server.service has finished start-up
-- Unit realm-object-server.service has finished starting up.
Mar 22 03:10:59  realm-object-server[1678]: 2017-03-22T03:10:59.374Z - info: Logging to file /var/log/realm-object-server.log at level 'trace'.
Mar 22 05:10:53  realm-object-server[1678]: terminate called after throwing an instance of 'realm::LogicError'
Mar 22 05:10:53  realm-object-server[1678]:   what():  Binary too big
Mar 22 05:10:54  systemd[1]: realm-object-server.service: Main process exited, code=killed, status=6/ABRT
Mar 22 05:10:54  systemd[1]: realm-object-server.service: Unit entered failed state.
Mar 22 05:10:54  systemd[1]: realm-object-server.service: Failed with result 'signal'.
Mar 22 05:10:54  systemd[1]: realm-object-server.service: Service hold-off time over, scheduling restart.
-- Subject: Unit realm-object-server.service has finished shutting down
-- Unit realm-object-server.service has finished shutting down.
-- Subject: Unit realm-object-server.service has finished start-up
-- Unit realm-object-server.service has finished starting up.
Mar 22 05:10:59  realm-object-server[1777]: 2017-03-22T05:10:59.155Z - info: Logging to file /var/log/realm-object-server.log at level 'trace'.
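
The crash reasons in the journal output above can be pulled out mechanically. The following is a minimal sketch, assuming the line layout shown in this thread; the function name `crash_reasons` and the regular expression are illustrative and not part of ROS or systemd. It collects each `what():` line together with its timestamp and process ID.

```python
import re

# Matches journal lines such as:
#   Mar 22 03:10:52  realm-object-server[1397]:   what():  Bad version number
# An optional hostname between the timestamp and the unit name is tolerated.
WHAT_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) +(?:\S+ +)?"
    r"realm-object-server\[(?P<pid>\d+)\]: +what\(\): +(?P<reason>.+)$"
)


def crash_reasons(lines):
    """Return (timestamp, pid, reason) for every 'what():' line found."""
    found = []
    for line in lines:
        match = WHAT_RE.match(line.strip())
        if match:
            found.append(
                (match.group("ts"), int(match.group("pid")), match.group("reason"))
            )
    return found


# Sample lines copied from the output in this thread.
sample = [
    "Mar 22 03:10:52  realm-object-server[1397]: terminate called after throwing an instance of 'realm::LogicError'",
    "Mar 22 03:10:52  realm-object-server[1397]:   what():  Bad version number",
    "Mar 22 03:10:52  systemd[1]: realm-object-server.service: Main process exited, code=killed, status=6/ABRT",
    "Mar 22 05:10:53  realm-object-server[1678]:   what():  Binary too big",
]

for ts, pid, reason in crash_reasons(sample):
    print(ts, pid, reason)
```

Feeding it the saved output of `journalctl -xe | grep realm` would list one entry per crash, which makes it easier to see that the two restarts had different causes.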

morten-krogh commented 7 years ago

@gemiren

I am sorry about the crashes.

The server crashes twice in this log.

The first crash is due to a problem in the node SDK. We see that bug ourselves in tests. We are currently investigating it.

The second crash is due to a big binary. We have a limit on the size of binaries at around 16MB. If a transaction or single binary exceeds that limit, an exception is thrown. We are working on getting rid of that limit.
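
To illustrate the size limit described above, here is a minimal sketch of a size check. The thread only says the limit is "around 16MB", so the constant below is an assumption, and the helper names are hypothetical and not part of any Realm API. Splitting a payload is shown only as a generic technique; it is not a workaround suggested by the maintainers, and whether it helps depends on how the data is modelled.

```python
# Assumption: the exact threshold is not stated in the thread ("around 16MB").
ASSUMED_LIMIT_BYTES = 16 * 1024 * 1024


def exceeds_assumed_limit(payload, limit=ASSUMED_LIMIT_BYTES):
    """Return True if the payload is at or above the assumed size limit."""
    return len(payload) >= limit


def split_into_chunks(payload, chunk_size):
    """Split a bytes object into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]


small = b"x" * 1024
print(exceeds_assumed_limit(small))          # a 1 KiB payload is well under the limit
print(len(split_into_chunks(small, 400)))    # number of pieces for a 400-byte chunk size
```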

gemiren commented 7 years ago

Sounds great. Thanks for the information. Hopefully the crash inside the Node SDK can be resolved soon. Regarding the 16MB limit, is there a way to work around it for now from the client side? I am using Realm Java. Thanks.

morten-krogh commented 7 years ago

The 16MB limit crash happened on the server. There is a similar issue on the client, but I don't think your client code has anything to do with this. We will lift the 16MB limit. We know exactly what it is. We will work on the crash inside the Node SDK.

gemiren commented 7 years ago

Looking forward to the fix. Any ETA on the next update? Thanks very much.

morten-krogh commented 7 years ago

I don't have an ETA, sorry.

radu-tutueanu commented 7 years ago

The 16MB limit discussed in the second server crash has been lifted server-side in ROS 1.5.0.