Tracking Stability - Githubissues

Noskcaj19 commented 5 years ago

Stability is currently very tricky between managing unsafe code and synchronizing on the main thread.

Being able to statically analyze the code to detect invalid use or missing on_main would help this. It may be possible to implement a compiler plugin to trace sync'd code.

Any stability issues (crashes, hangs, SIGSEGV) should be tracked here.

DarkRTA commented 5 years ago

~~Changing weechat.look.prefix_align while the plugin is loaded causes weechat to hang.~~

This might be related to one of my aliases. Hold on. Edit: For some reason it only happens when I update the setting with an alias

Noskcaj19 commented 5 years ago

Is it causing all of weechat or just the discord related channels to hang? There does seem to be a few issues with how I am managing the threading right now.

DarkRTA commented 5 years ago

I was on weechat 1.9 at the time. I'll try and reproduce on 2.3. But yes it does cause weechat itself to hang.

antoyo commented 5 years ago

I'm currently trying to debug this segfault:

Invalid read of size 8
   at 0x908B534: core::sync::atomic::atomic_load (atomic.rs:2124)
   by 0x88049EF: core::sync::atomic::AtomicUsize::load (atomic.rs:1296)
   by 0x88E2C6E: try_lock_shared_fast (raw_rwlock.rs:527)
   by 0x88E2C6E: <parking_lot::raw_rwlock::RawRwLock as lock_api::rwlock::RawRwLock>::lock_shared (raw_rwlock.rs:94)
   by 0x89E2812: lock_api::rwlock::RwLock<R,T>::read (rwlock.rs:296)
   by 0x849741E: weecord::hook::handle_query::{{closure}} (hook.rs:108)
   by 0x8478B17: std::sys_common::backtrace::__rust_begin_short_backtrace (backtrace.rs:136)
   by 0x842EED5: std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}} (mod.rs:469)
   by 0x8466D08: <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once (panic.rs:309)
   by 0x84C32B1: std::panicking::try::do_call (panicking.rs:293)
   by 0x9A0C7D9: __rust_maybe_catch_panic (lib.rs:85)
   by 0x84C2022: std::panicking::try (panicking.rs:272)
   by 0x84670F8: std::panic::catch_unwind (panic.rs:388)
 Address 0x10 is not stack'd, malloc'd or (recently) free'd

Noskcaj19 commented 5 years ago

Dang, I thought I had gotten all the segfaults figured out. Were you doing anything when the segfault happened? What is your os?

It looks like it was caused by an attempt to retrieve data from serenity, which could be related to the fact that i'm storing objects that aren't supposed to be.

antoyo commented 5 years ago

The first manual load of weechat-discord did work. But when I restarted weechat, I get this segfault on autoload. Everytime I start weechat, I get this segfault.

I use ArchLinux.

Noskcaj19 commented 5 years ago

Thats very interesting, do you have any plugins that could be running the /query command on startup?

antoyo commented 5 years ago

Yes. If I disable the queryman.py plugin, it does not segfault anymore.

Noskcaj19 commented 5 years ago

Alright, so then I'm guessing it's trying to read the data before it's set by discord connecting. I wouldn't expect it to segfault, but I'll take a look at making it wait until it fully loads.

Noskcaj19 commented 5 years ago

Looking at the code, this is almost certainly caused by the query command being run while the global context is still uninitialized memory.

Noskcaj19 commented 5 years ago

Alright, 8bebdbd should fix the segfault, and I think it will also build on stable now.

The only potential issue is that the query command will just return early, so the plugin wont actually open the query channels.

I would also like to ask, would you prefer the current method of adding every channel like the official client, or a more irc like interface where you select individual channels to be added on startup?

antoyo commented 5 years ago

Thanks, the fix works and it now compiles on stable.

I'm not sure which plugin you're talking about, but queryman.py seems to work.

For the channels, I like what the wee-slack plugin is doing (but this behavior seems specific to Slack, not Discord): it only automatically opens channels you've joined and it has a command to list all channels.

Noskcaj19 commented 5 years ago

Does queryman.py work for discord private messages? I would expect it to fail silently with discord PMs.

For channel layout, I am planning to add an Irc like mode in addition to hopefully improving the current layout.

antoyo commented 5 years ago

Also, for the channels, wee-slack notifies when a new channel is created. /query does not seem to work at all with this plugin. So, I cannot test the queryman.py plugin. I get the error saying that "query" must be executed on an irc buffer.

Noskcaj19 commented 5 years ago

So, the query command is implemented, and sometimes works. I think it has issues with users that have spaces in the name, and right now it always gives that error, even when it works. There just isn't a way to report the command succeeded with the FFI right now

antoyo commented 5 years ago

To me, it never seems to work. Is the syntax /query username? Should it be executed in a discord channel? Thanks.

Noskcaj19 commented 5 years ago

Ok, I just checked the code, and it's a bit odd. Right now for some reason it looks through all of the guilds you are in for a user with a matching name, so it works in any weechat buffer. I need to make it search through existing direct messages.

Noskcaj19 commented 5 years ago

Ok, I made the command search existing direct messages. I suppose I should probably have it also search friends and accept qualified user names.

antoyo commented 5 years ago

Can't you just search in the user list of the current buffer?

Noskcaj19 commented 5 years ago

So, that would work fairly well, and I think I'll will prioritize users in the current buffer, but there still needs be to a way to create direct messages to arbitrary users and friends

mqp commented 5 years ago

I have experienced a few hangs when trying to use it -- in each case strace made it clear that it was blocked forever on some lock or another. I can't reproduce them yet.

Probably unrelatedly, looking through the code, I noticed that Discord event handlers besides message which call some buffer-related FFI don't use on_main!. Is that intentional? I assumed there would be trouble if you invoked that stuff from another thread.

Noskcaj19 commented 5 years ago

I too have experienced the occasional hang, can't seem to reproduce them reliably, and unfortunately have yet to find a good way of finding whats causing them. As for that event handler, yes, it definitely needs to be called on main, thanks for finding that. In the ffi rewrite I am hoping to include checks in each function to ensure the api is always used on the main thread, which should point to nearly all threading bugs.

mqp commented 5 years ago

I read on_main! and got a little surprised. I expected it to do something like put the input closure in a queue that was being processed by a kind of weechat master thread, but instead it seems like it just takes a lock and runs the input on whatever thread it's currently on. Did I miss something, or maybe the name is just a holdover from an older implementation?

I'm only a journeyman Rust programmer, so I might not know the best advice, but my first thought for how I would handle this "subset of FFI calls are only usable on the main thread" task in a way that leverages the type checker would be:

Create a struct FFI that represents the FFI interface
Implement FFI stuff as functions on that struct
Instantiate one of those structs on the main thread on plugin initialization

Turn on_main! into something like:

fn enqueue_ffi<F, T>(func: F) -> Future<T> 
where F: Fn(FFI) -> T
{
// put F in a producer/consumer queue for processing
// the main thread is waiting on that queue, dequeueing things, running them
// using its FFI struct, and resolving the future with the result
}

Is that the sort of direction you're headed, or were you thinking of something else?

Noskcaj19 commented 5 years ago

Oh boy. So I have spent a long time working on on_main!. First off, a lot of this code is awful, because I have been haphazardly writing application and ffi code at the same time, and I had no prior experience with weechat. The current idea behind on_main! is just to prevent weechat from doing anything when a thread is doing ffi calls. The macro just blocks the thread until it receives a signal from a weechat timer on the main thread. Once the timer fires, the timer blocks the main thread until the on_main! block exits.

It is far from perfect, but it seems to work ok, as long as the macro isn't used in a function on the main thread (or inside another on_main macro). I like your idea a lot and I think it could work well. It's similar to some of the type based ideas I've had.

I seem to remember at one point I had issues with using closures to synchronize with main, but I can't seem to remember what they were (it might have been something to do with control flow). I don't know if it would still be an issue.

The new weechat api crate I am migrating to has a Weechat struct with all the ffi calls as methods, so it could potentially be used to restrict access to the ffi. This will probably mean some of the code will need to be changed a bit and I am hoping to improve the ffi synchronization along the way somehow.

(sorry this got a little long)

mqp commented 5 years ago

Seems like one way that I can reliably cause it to hang is to start up weechat fresh, connect to a server with a few dozen channels, and quickly hold alt-down ("next channel") to page through channels in the list. It will quickly hang forever. If I press alt-down, wait for the channel history to load, press it again, etc., it gets through the list without hanging.

Noskcaj19 commented 5 years ago

That does seem to cause hangs quite reliably. It seems the problem is in the nicklist loading code. I'll take a look and try and fix it.

Noskcaj19 commented 5 years ago

The problem seems to be related to RwLock locking of Guild objects. Every time it hangs there is a weecord thread waiting for read access, and a serenity thread trying to update the cache waiting for write access. I am not sure what those are blocking on

Edit: Thats wrong.

A weecord thread and a serenity thread attempting a cache update get stuck each with a lock the other needs (the cache and a guild). I have no idea how to fix this issue, however using concurrent data structures as serenity-rs/serenity#584 proposes would solve the issue.

I'm not immediately sure of how to solve this without large changes to serenity it's self.

mqp commented 5 years ago

If Serenity defines a consistent order for taking locks on cache structures (which it should, if there's not some other mechanism like a higher-level lock) then you could avoid such problems by respecting that order.

Noskcaj19 commented 5 years ago

As far as I know I don't believe serenity has either of these, I could add a global cache lock or a way to pause cache update events in my fork, although replacing the RwLock with a lock free map would be ideal.

I believe that careful handling and acquisition of locks should prevent hangs. For right now I think I have fixed it in b477ecc. It probably loads a little slower, but I can't get it to deadlock anymore, see if this helps you.

mqp commented 5 years ago

Just so you know, this upstream bug causes weechat-discord to crash -- you might want to update your Serenity fork to get the fix.

Noskcaj19 commented 5 years ago

Thanks for letting me know, the serenity fork and weechat-discord have been updated.

mqp commented 5 years ago

I left my weechat-discord alone for a weekend while using Discord through other devices, and when I SSHed back in and tabbed over to a channel today (from other non-Discord channels) it hung forever while loading the channel history/userlist. Not sure what specific things might have caused it.

Noskcaj19 commented 5 years ago

Was that code compiled with b477ecc? It could potentially have caused by the discord gateway reconnecting, or some weird threading. I'll see if I can replicate it.

mqp commented 5 years ago

Yep, I was definitely using b477ecc.

Noskcaj19 commented 5 years ago

Were you using irc-mode?

I encountered a deadlock that occurs when a message is received in a watched channel/guild and that has been fixed in 1e6d8c6 but that should only effect irc-mode.

mqp commented 5 years ago

Yep, I was using irc-mode. It didn't happen to me again yet, but I will upgrade to the latest commit for future testing.

mqp commented 5 years ago

With irc-mode and the latest build of weechat (799d79776793b228e1be70d8978dac4870f0d4cd), I find that usually, when terminal-discord loads, weechat hangs indefinitely before it loads any of the channels in my Discord servers. Sometimes, it will load some of the channels, and then hangs. When it hangs, it keeps using CPU, but strace says it's not doing anything except waiting for a lock.

Noskcaj19 commented 5 years ago

That's interesting... are you attempting to load or autojoin a large number of channels? Do you have any more details so I can try and reproduce this?

The only deadlock I have encountered is when loading the history for many channels a deadlock can occur between weechat and serenity.

mqp commented 5 years ago

Not, like, an unreasonable amount. A few dozen at most. I will try to experiment and see if I can figure out anything more about which part of my setup might be causing it.

On Fri, Aug 23, 2019, 19:45 Noskcaj19 notifications@github.com wrote:

That's interesting... are you attempting to load or autojoin a large number of channels? Do you have any more details so I can try and reproduce this?

The only deadlock I have encountered is when loading the history for many channels a deadlock can occur between weechat and serenity.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/terminal-discord/weechat-discord/issues/15?email_source=notifications&email_token=AADIUXOH3SJRKRASXZCZOYLQGCOC3A5CNFSM4GNEY3R2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5BWHEY#issuecomment-524510099, or mute the thread https://github.com/notifications/unsubscribe-auth/AADIUXPPXB33AKLYVRZW243QGCOC3ANCNFSM4GNEY3RQ .

Noskcaj19 commented 5 years ago

Just to clarify, is this occurring before history and the nicklist are loaded (when the buffer is switched to)? Also, a list of enabled plugins might help in replicating the hang.

If it occurs during history loading one possible cause could be the loading of nick, channel, and role mentions which I recently added.

terminal-discord / weechat-discord

Tracking Stability #15