mocktools / go-smtp-mock

SMTP mock server written on Golang. Mimic any 📤 SMTP server behavior for your test environment with fake SMTP server.
MIT License
123 stars 17 forks source link

[BUG] Data race between `newMessage()` and `Messages()` #150

Closed frwl404 closed 6 months ago

frwl404 commented 1 year ago

New bug checklist

Bug description

Messages() call protected with:

    server.Lock()
    defer server.Unlock()

but newMessage() doesn't. It looks like because of this we can have following race condition:

WARNING: DATA RACE
Read at 0x00c00024a950 by goroutine 9:
  github.com/mocktools/go-smtp-mock/v2.(*Server).Messages()
      /home/anton/go/pkg/mod/github.com/mocktools/go-smtp-mock/v2@v2.0.4/server.go:128 +0x1ab
...
Previous write at 0x00c00024a950 by goroutine 25:
  github.com/mocktools/go-smtp-mock/v2.(*messages).append()
      /home/anton/go/pkg/mod/github.com/mocktools/go-smtp-mock/v2@v2.0.4/message.go:154 +0x11b
  github.com/mocktools/go-smtp-mock/v2.(*Server).newMessage()
      /home/anton/go/pkg/mod/github.com/mocktools/go-smtp-mock/v2@v2.0.4/server.go:181 +0xfd
...

Please kindly let me know if direction of my thoughts is right, or there is some other reason for this race? Thanks in advance!

Complete output when running smtpmock, including the stack trace and command used
[INSERT OUTPUT HERE]
bestwebua commented 1 year ago

@achivkun Could you provide some code examples to get this race condition? I think everything is fine with Server#newServer. What do you think, @benjamin-rood?

frwl404 commented 1 year ago

@bestwebua, @benjamin-rood, here is example of test, which will fail with -race option (note: sending of e-mail performed from separate go routine, it is important for us, because we use it in logrus hook to send notifications in case of critical failures and to not block on every error trace, emails generated from separate routine):

package app_test

import (
    "fmt"
    "net/smtp"
    "testing"
    "time"

    smtpmock "github.com/mocktools/go-smtp-mock/v2"
    "github.com/stretchr/testify/assert"
)

func TestServer(t *testing.T) {
    hostName := "localhost"
    server := smtpmock.New(smtpmock.ConfigurationAttr{
        LogToStdout:       true,
        LogServerActivity: true,
        HostAddress:       hostName,
    })

    if err := server.Start(); err != nil {
        fmt.Println(err)
    }

    go smtp.SendMail(
        fmt.Sprintf("%s:%d", hostName, server.PortNumber()),
        smtp.PlainAuth("id", "user", "password", hostName),
        "test@test.com",
        []string{"dest@dest.com"},
        []byte("some msg"),
    )

    time.Sleep(1 * time.Second)
    assert.Equal(t, true, len(server.Messages()) == 1, server.Messages())

    if err := server.Stop(); err != nil {
        fmt.Println(err)
    }
}
benjamin-rood commented 1 year ago

Hi guys, I'm on holiday with my young family on the other side of the world until Feb 1st, so I have no way of debugging this or writing the code to change anything, sorry, but I can provide my best guess at the solution.

I have run into something like this myself with concurrent access to a single server instance. I think there is a mistake (by me) in the server.Messages implementation.

https://github.com/mocktools/go-smtp-mock/blob/87efa37169545d7887bf6f2d8a12cbaae5e0bf41/server.go#L126

func (server *Server) Messages() []Message {
  server.Lock()
  defer server.Unlock()
  copiedMessages, messages := []Message{}, server.messages.items
  for index := range messages {
  copiedMessages = append(copiedMessages, *messages[index])
}

  return copiedMessages
}

I think we should try adding a call to the Mutex on server.messages - i.e. instead of locking the server, we should be locking server.messages.

Message ID: @.***>

bestwebua commented 1 year ago

@benjamin-rood Hello, Benjamin! Thanks for the super quick response. Have a perfect vacation for you and your family! I think that this issue can wait a bit. If I won't figure it out I'll ask your help after your vacation :)

P.S.: I have tried change locks from Server to messages - the same result with race condition...

func (server *Server) Messages() []Message {
    server.messages.Lock()         // locking messages instead Server
    defer server.messages.Unlock() // unlocking messages instead Server
    copiedMessages, messages := []Message{}, server.messages.items
    for index := range messages {
        copiedMessages = append(copiedMessages, *messages[index])
    }

    return copiedMessages
}
benjamin-rood commented 1 year ago

Okay. Then we need to go deeper: we either need to implement a read/write blocking QUEUE for all calls which interact with the messages (with an unbuffered channel), or change from using a slice of messages to a concurrent-safe data structure, like sync/map and treat it like a sequential list (where the map indexes are 0, 1, 2, ..., n).

On Sat, 7 Jan 2023, 09:00 Vladislav Trotsenko, @.***> wrote:

@benjamin-rood https://github.com/benjamin-rood Hello, Benjamin! Thanks for the super quick response. Have a perfect vacation for you and your family! I think that this issue can wait a bit. If I won't figure it out I'll ask your help after your vacation :)

P.S.: I have tried change locks from Server to messages - the same result with race condition...

func (server Server) Messages() []Message { server.messages.Lock() // locking messages instead Server defer server.messages.Unlock() // unlocking messages instead Server copiedMessages, messages := []Message{}, server.messages.items for index := range messages { copiedMessages = append(copiedMessages, messages[index]) }

return copiedMessages }

— Reply to this email directly, view it on GitHub https://github.com/mocktools/go-smtp-mock/issues/150#issuecomment-1374064713, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACNCK5GEAHPJFJOGSKG3BFDWRB2WBANCNFSM6AAAAAATTAVZGQ . You are receiving this because you were mentioned.Message ID: @.***>

benjamin-rood commented 1 year ago

Let me explain why the above is necessary: the messages array is effectively a simple in-memory database, and we are asking it to have the same (or similar) API to a proper database with the appropriate concurrency guarantees. As we know, in reality a database is not just blobs of data organised in a structured manner, it is also the means by which that data may be accessed or manipulated safely via read or write transactions that are queued to preserve data integrity.

The change I am proposing is not a breaking one, because the API will not change. The Messages() method would still return a slice of all message data, sending a new messages to the server would still work exactly the same way, etc. Only the underlying implementation with respect to message storage changes. The good news is that this is quite a fun thing to implement!

bestwebua commented 1 year ago

@benjamin-rood Agree! I'm researching this topic right now ❤️

benjamin-rood commented 1 year ago

@bestwebua Sorry, been slammed with work and life since I got back from holiday. I want to get stuck into fixing this if you haven't already done so?

bestwebua commented 1 year ago

@benjamin-rood Hi, Benjamin! It's sad, I also had a lot of work to do. All I managed to do it's updates about auto releasing of this package. Haven't actually done this bug fix yet and would appreciate your input!

oxer-0 commented 9 months ago

This issue made us switch to a combination of Testcontainers and Inbucket,

Used to use go-smtp-mock in our project, but we decided we wanna go Event-Driven, and sending emails became a responsibility of our events, Long story short, Using events added a very small delay and caused this data race break our tests.

Adding a 3 seconds sleep before everytime we read smtpmock.Server.Messages() fixed the problem, but it seemed a bit dirty and hacky, and also would slow down our tests

mitar commented 6 months ago

Please fix this.

mitar commented 6 months ago

I made https://github.com/mocktools/go-smtp-mock/pull/173 to fix this.

Then we need to go deeper: we either need to implement a read/write blocking QUEUE for all calls which interact with the messages (with an unbuffered channel),

We do not need to do an explicit blocking queue. Having a mutex creates an implicit queue on the data structure. Simply all data reads and writes to a shared data structure (in this case messages slices) has to be protected by a mutex.

bestwebua commented 6 months ago

@mitar Does this update fix data race condition described above? Does it appear on the same test env with same run conditions without your fix?

mitar commented 6 months ago

Does this update fix data race condition described above?

Yes.

Does it appear on the same test env with same run conditions without your fix?

A very similar test (effectively the same). But I can also test the exact one if you want.

bestwebua commented 6 months ago

@mitar If you have a time check it with exact one, please!

mitar commented 6 months ago

OK. Let me test that out.

mitar commented 6 months ago

OK, I added the test above to tests in the repository. That was a good call because I had to rework the fix. I updated PR accordingly.

bestwebua commented 6 months ago

In the latest release! Thanks @mitar for your involvement!