jchristgit / nosedrum

a command framework for nostrum
https://hexdocs.pm/nosedrum/Nosedrum.html
ISC License
34 stars 12 forks source link

Potential Race Condition using `:deferred_channel_message_with_source` #27

Closed Bentheburrito closed 1 year ago

Bentheburrito commented 1 year ago

Hi Johannes, I think I'm running into a race condition where my Nostrum.Api.edit_interaction_response/2 call reaches Discord before the initial :deferred_channel_message_with_source response.

The issue

This is my command/1 callback (removed irrelevant logic):

  def command(%Interaction{} = interaction) do
    # ...truncated

    Task.start(fn ->
      do_command(...)
    end)

    [type: :deferred_channel_message_with_source]

do_command/4 makes an API call that could take more than 3 seconds, so I start that in another process and defer a response. However, this API call is snappy 90% of the time, so do_command finishes its work well within the 3 second window for a normal :channel_message_with_source. I think this can lead to a race condition where the Task tries to edit the initial response that isn't there yet, so I end up with a "The application did not respond" message.

Potential Solutions

I could artificially delay do_command with something like Process.sleep(500), but I don't think that's ideal

We could change how Nosedrum handles :deferred_channel_message_with_source and :deferred_update_message response types to ensure the "initial defer -> edit message" process is serial. My first thought is to introduce a new response type: a tuple with a callback:

[type: {:deferred_channel_message_with_source, fn -> do_command(...) end}]

Storage.Dispatcher would run the callback only after Storage.respond/2, which should prevent any race condition. What do you think? Any other approaches you think would be better?

Also, here is my bot's ApplicationCommand if you want to take a closer look

jchristgit commented 1 year ago

Hi Ben,

yep. that does look indeed like a race condition!

We could change how Nosedrum handles :deferred_channel_message_with_source and :deferred_update_message response types to ensure the "initial defer -> edit message" process is serial. My first thought is to introduce a new response type: a tuple with a callback:

[type: {:deferred_channel_message_with_source, fn -> do_command(...) end}]

Storage.Dispatcher would run the callback only after Storage.respond/2, which should prevent any race condition. What do you think? Any other approaches you think would be better?

I think this is the way to go. Should the dispatcher automatically pass the original interaction? Or is that too much magic? In any case, I think it should probably spawn a task to handle that function (because you won't want to block it for something that takes a while to run). Then you won't need to spawn it manually.

Maybe good practice here could be to work the same way as apply/2-3: we either take a function and args or a MFA. And the interaction is explicitly passed in?

What do you think?

Bentheburrito commented 1 year ago

Should the dispatcher automatically pass the original interaction? Or is that too much magic?

Hmmm imo that sounds like an appropriate amount of magic, as long as it's documented.

I think it should probably spawn a task to handle that function (because you won't want to block it for something that takes a while to run). Then you won't need to spawn it manually.

Since this callback would likely be the last operation of Storage.handle_interaction/1,2, do you think we need another process/Task specifically for this callback? In my head, since handle_interaction is already called from the process spawned by nostrum for this interaction, it's okay to block until we're completely done with the interaction. Stating this behavior in the callback doc would hopefully make the user well aware of its blocking nature. What do you think?

Maybe good practice here could be to work the same way as apply/2-3: we either take a function and args or a MFA. And the interaction is explicitly passed in?

Ooo yeah I like that! By "explicitly passed in", do you mean the user must pass the original interaction if they want access to it? E.g. {MyCommand, :command_part2, [interaction, part2_arg]}? I actually think I like this better, since if they don't need the whole interaction struct, it's not automagically appended to the args by Dispatcher. Lmk if I understood you correctly though

jchristgit commented 1 year ago

Since this callback would likely be the last operation of Storage.handle_interaction/1,2, do you think we need another process/Task specifically for this callback? In my head, since handle_interaction is already called from the process spawned by nostrum for this interaction, it's okay to block until we're completely done with the interaction. Stating this behavior in the callback doc would hopefully make the user well aware of its blocking nature. What do you think?

Ahh, you're right, I was still thinking "before" your pull request where you moved it into the client process! Yes, in that case I don't see a need to spawn at all.

Ooo yeah I like that! By "explicitly passed in", do you mean the user must pass the original interaction if they want access to it? E.g. {MyCommand, :command_part2, [interaction, part2_arg]}? I actually think I like this better, since if they don't need the whole interaction struct, it's not automagically appended to the args by Dispatcher. Lmk if I understood you correctly though

Yes, this is what I meant. So if you pass in a plain function of arity 1 - it gets passed the interaction. If you pass {M, F, A} or anything else where you add arguments yourself, you need to pass it explicitly.