ynqa / wego

Word Embeddings in Go!
Apache License 2.0
474 stars 41 forks source link

Unsafe Cbow channel #56

Closed wujunfeng1 closed 2 years ago

wujunfeng1 commented 3 years ago

Overview

Current implementation of Cbow use a channel of type []float64 for data exchange, but every trainOne function call needs to send and receive two []float64 vectors. Such communications are non-atomic. When multi-threads race at this channel, a pair of such vectors could be separated by another pair of vectors from other threads.

Abnormal Behaviors of Software Caused by This Issue

The Cbow training Example of this package could freeze due to this issue. It suspends during training and cannot proceed.

Actually, all Cbow trainings using this package could freeze.

Locations of Problematic Code

pkg/model/word2vec/model.go, line 80 ~ 83 (the data structure):

type cbow struct {
    ch     chan []float64
    window int
}

pkg/model/word2vec/model.go, line 103 ~ 107 (the go routine):

    agg, tmp := <-mod.ch, <-mod.ch
    defer func() {
        mod.ch <- agg
        mod.ch <- tmp
    }()

See, these sends and receives are non-atomic.

Possible Solutions to Overcome this Issue

We could define an auxiliary struct to avoid such race conditions, such as this:

type cbowToken struct {
    agg []float64
    tmp []float64
}

Revise the data structure of Cbow as:

type cbow struct {
        ch    chan cbowToken
        window int
}

Then in the go routine:

    token := <-mod.ch
    agg, tmp := token.agg, token.tmp
    defer func() {
        token := cbowToken{agg, tmp}
        mod.ch <- token
    }()

In this way, all sends and receives at this channel is atomic. Race conditions are eliminated.

ynqa commented 2 years ago

Fixed in https://github.com/ynqa/wego/pull/58. Thanks, @wujunfeng1.