sugarlabs / sugar

Sugar GTK shell
GNU General Public License v3.0
264 stars 243 forks source link

Neighbourhood view not updated on shared activity stop #840

Open quozl opened 5 years ago

quozl commented 5 years ago

When an invited user stops a shared activity, the neighbourhood view may be incorrectly updated.

Reproducer;

Observed behaviour;

Expected behaviour;

Instance one shell.log;

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/jarabe/desktop/meshbox.py", line 409, in _buddy_removed_cb
    self._remove_buddy(buddy_model)
  File "/usr/local/lib/python2.7/dist-packages/jarabe/desktop/meshbox.py", line 435, in _remove_buddy
    icon = self._buddies[buddy_model.props.key]
KeyError: dbus.ByteArray('AAAAB...Y85WlA==', variant_level=1)

Instance two shell.log;

1560310543.375053 ERROR root: set_active() failed: org.freedesktop.DBus.Error.NoReply: Message recipient disconnected from message bus without replying

org.freedesktop.DBus.Error.NoReply is caused by premature termination of the telepathy-salut process before it replies to a message.

On instance two at the time of the event, the process ID of the telepathy-salut process changes, and on Ubuntu 19.04 and an apport crash file is added to /var/crash on behalf of the process.

The problem is always reproducible and repeatable.

Theory; a message sent to Telepathy Salut causes it to crash. It should not crash. This is a bug in Telepathy Salut, possibly combined with a bug in Sugar usage of the API.

aperezbios commented 5 years ago

Was this against a released version of Sugar, top-of-trunk, or a branch?

quozl commented 5 years ago

Yes.

aperezbios commented 5 years ago

All of the above?

quozl commented 5 years ago

Sure, that's another way of looking at it. At least released versions 0.112, 0.113, 0.114, master, and python3 branches are affected, as are the branches in development for TelepathyGLib porting.

Aniket21mathur commented 5 years ago

My question might be off-topic. I went through the dbus documentation and classes. We are using dbus.ByteArray class. But as per the documentation it takes only one str argument. variant_level is taken as an argument by dbus.Byte class. I don't know whether my question is relevant, just had it in my mind so posted. :sweat_smile: . Thanks!

quozl commented 5 years ago

_variantlevel is an argument for dbus.Byte, so it is logical by inheritance that it is available for use in a dbus.ByteArray.

% python
Python 2.7.15+ (default, Nov 27 2018, 23:36:35) 
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import dbus
>>> x = dbus.ByteArray('test')
>>> x
dbus.ByteArray('test')
>>> x = dbus.ByteArray('test', variant_level=0)
>>> x
dbus.ByteArray('test')
>>> x = dbus.ByteArray('test', variant_level=2)
>>> x
dbus.ByteArray('test', variant_level=2)
>>> x = dbus.ByteArray('test', freddy=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'freddy' is an invalid keyword argument for this function
>>> █

dbus-python source package, file dbus_bindings/bytes.c is also useful.

Aniket21mathur commented 5 years ago

Thanks for explaining. :-).

quozl commented 4 years ago

Summary: Debug a Telepathy crash

Sugar uses the Telepathy API for collaboration. Every time a collaboration stops, one of the Telepathy programs crashes with a core dump. It has been happening for a few years.

See if you can use gdb and the Telepathy source code to find out what the problem is! A task for experts. You may have to reach out to the Telepathy project as well.

shaansubbaiah commented 4 years ago

@quozl I'm able to reproduce the error in the instance two example differently.

Reproduced by: Closing any activity. (Tested on the fructose set, this happens on the Log activity as well, which is not a shared activity)

Error:

ERROR root: set_active() failed: org.freedesktop.DBus.Error.NoReply: Message recipient disconnected from message bus without replying

Tested on: Sugar 117-3, Debian 10.4 Buster

quozl commented 4 years ago

Thanks, but it makes no difference. Telepathy Salut is still crashing, and it should not. It is the Telepathy Salut process on instance two that crashes, hence the _setactive message. It is, however, useful to know the same problem persists in 0.117 Debian package 3. Perhaps the same version of Telepathy is used.

srevinsaju commented 4 years ago

I am not sure; I have logs related to telepathy-salut, and I guess @quozl might have seen some other logs of mine related to telepathy_Salut. In case it helps

1591852813.812322 WARNING root: Connection /org/freedesktop/Telepathy/Connection/salut/local_xmpp/e2782320 does not support OLPC buddy properties
/usr/lib/python3.8/site-packages/gi/overrides/Gtk.py:1644: Warning: g_value_type_compatible: assertion 'src_type' failed
  return _Gtk_main(*args, **kwargs)
/usr/lib/python3.8/site-packages/gi/overrides/Gtk.py:1644: Warning: unable to set property 'buddy' of type 'PyObject' from value of type '(null)'
  return _Gtk_main(*args, **kwargs)
1591852839.673118 ERROR root: set_active() failed: org.freedesktop.DBus.Error.NoReply: Message recipient disconnected from message bus without replying

(metacity:4459): metacity-WARNING **: 08:20:43.682: CurrentTime used to choose focus window; focus window may not be correct.
quozl commented 4 years ago

No, it doesn't help, it confuses. "does not support OLPC buddy properties" is a different problem; compare your version of Telepathy Salut, we're using 0.8.1 with Ubuntu 20.04. "NoReply" is the same underlying problem.

aperezbios commented 3 years ago

for future reference, since I just ran across this issue @srevinsaju ran across back in June, inside the SoaS Fedora nightly image:

In the latest Fedora Rawhide (will soon become 34) SoaS images, telepathy-salut is version 0.8.1, and python3-gwebsockets is version 0.7, and metacity is version 3.38.0

srevinsaju commented 3 years ago

For the completion of my answer that time, I was using v0.8.1 using that time.

Tested again, have the same error. v0.8.1.r7, gwessocketson v0.7 and metacity on v3.38.0. Test system was Arch Linux, native install, and not a Debian Live Build.

quozl commented 3 years ago

Indeed. It needs diagnosis by debugging Telepathy, or reviewing Telepathy commits. My guess is the support for Sugar may have been removed from Telepathy because it was not maintained.

aperezbios commented 3 years ago

This still happens under Sugar 0.118 in the Fedora Rawhide nightly composes. Upon examining the Telepathy source repo for Sugar support, it does does not appear to have been removed from current releases of Telepathy, only telepathy-next.

Fedora Rawhide/34 includes telepathy-gabble 0.18.4, telepathy-salut 0.8.1, telepathy-glib 0.24.1, and python3-gwebsockets 0.7

quozl commented 3 years ago

Thanks. That eliminates my guess. It still needs diagnosis by debugging Telepathy, by testing an older version of Telepathy, or by reviewing Telepathy commits between the last working version and now. To be sure you have the same problem as described by this issue, the Telepathy Salut process id will change between when the shared activity is running, and when it is stopped. If process crash dumps are configured, then a core file will be made.

aperezbios commented 3 years ago

This also happens when a shared activity has NOT been stopped, so the bug description should be updated to reflect that, since the scope is significantly larger.

Here's the log output I get from Sugar 0.118 (SoaS Fedora 34 prerelease):

Traceback (most recent call last):
  File "/usr/lib64/python3.9/site-packages/dbus/connection.py", line 606, in msg_reply_handler
    reply_handler(*message.get_args_list(**get_args_opts))
  File "/usr/lib/python3.9/site-packages/jarabe/model/neighborhood.py", line 301, in __conn_get_interfaces_reply_cb
    self._connection[interface] = dbus.Interface(
TypeError: 'NoneType' object does not support item assignment
1613234385.855060 WARNING root: Connection /org/freedesktop/Telepathy/Connection/gabble/jabber/_37b17a0aa5efd8b3801af1778cd1134c45bf8571e_40jabber_2esugarlabs_2eorg_2fsugar does not support OLPC buddy properties
/usr/lib/python3.9/site-packages/gi/overrides/Gtk.py:1649: Warning: g_value_type_compatible: assertion 'src_type' failed
  return _Gtk_main(*args, **kwargs)
/usr/lib/python3.9/site-packages/gi/overrides/Gtk.py:1649: Warning: unable to set property 'buddy' of type 'PyObject' from value of type '(null)'
  return _Gtk_main(*args, **kwargs)
chimosky commented 3 years ago

@aperezbios when a shared activity hasn't been stopped on one of the shared instances?

aperezbios commented 3 years ago

@chimosky when simply looking at the neighborhood view (with jabber.sugarlabs.org configured as the collaboration server)

aperezbios commented 3 years ago

Bizarrely, the simple act of setting G_MESSAGES_DEBUG="all" (and rebooting, or killing jarabe.py to induce sugar desktop re-start) in /etc/environment seems to "fix" the problem, although I've no idea why, as it makes no sense.

chimosky commented 3 years ago

@chimosky when simply looking at the neighborhood view (with jabber.sugarlabs.org configured as the collaboration server)

The first part of the error could be when no interface was returned which could happen when a connection hasn't been made but that'll throw a Disconnected error instead. @quozl might have a better insight as to the possible causes of this.

I've seen the last line of the error and I can't remember the cause of that and I think there used to be an open issue for it.

Bizarrely, the simple act of setting G_MESSAGES_DEBUG="all" (and rebooting, or killing jarabe.py to induce sugar desktop re-start) in /etc/environment seems to "fix" the problem, although I've no idea why, as it makes no sense.

Are you saying when you do this the jabber server connection gets created successfully and those errors aren't shown in the logs anymore?

quozl commented 3 years ago

Oh, okay. No, I'm not using jabber.sugarlabs.org at all, I'm concentrating only on link-local peer to peer, aka Telepathy Salut. If you're using jabber.sugarlabs.org, then you're using Telepathy Gabble. Two very different underlying software components. I'm interested to know what causes Salut to unexpectedly terminate, but I've not even got as far as analysing the core dump.

aperezbios commented 3 years ago

@quozl should we separate these issues, one for telepahthy-gabble, and one for telepathy-salut? We need to figure out why setting G_MESSAGES_DEBUG="all" in /etc/environment (and then restarting jarabe), results in Sugar (using) telepathy-gabble seeing neighbors/peers.

quozl commented 3 years ago

Thanks for asking. I don't mind either way. It's not like were not solving the problem because the issue is too hard to read. Speculating; we're not solving it because (a) it is a Telepathy problem rather than a Sugar problem (most likely), (b) Telepathy is written in C and we have attracted primarily Python developers, and (c) those of us with C skills haven't had the time to investigate.

If you do split the issues, create new ones that condense and summarise the observations in each scenario, with a link back to this issue, then we can close this issue.