networkservicemesh / sdk

Apache License 2.0
35 stars 35 forks source link

Too high probability of name collision in getNameFromConnection #1589

Open olahg opened 4 months ago

olahg commented 4 months ago

We are using 4 interfaces in a pod; and the names generated are like this: lb-fe-a.tr-626b. We expect the probability of a name collision to be very low. Currently, it is 0.00018, which is kind of low, but not very low. After thousands of runs we do get collisions. Soon, our use case will demand that we'll have many more interfaces, in which case this probability becomes too high, e.g., for 30 interfaces the collision probability is about 0.013 (above 1 percent).

Our understanding is that names are generated here: https://github.com/networkservicemesh/sdk/blob/main/pkg/networkservice/common/mechanisms/kernel/utils.go#L30

The collision probability could be lowered by either having a longer random suffix or using a wider alphabet for the random suffix or both. E.g., the name could be something like lb-fe-a.tr3Q@xn

NikitaSkrynnik commented 2 months ago

Hi, @olahg. This Issue has been fixed in this PR: https://github.com/networkservicemesh/sdk/pull/1590. Now to get 1% chance of interface name collision you need to generate about 68 billon names. I'm closing this issue, but you can always reopen it if you have any questions.

olahg commented 3 weeks ago

Unfortunately, https://github.com/networkservicemesh/sdk/pull/1590 created its own problems. Although it reduced the name collision probability, it also broke some functionality. Some problems are: (1) in update situations where both old and new nsm versions are an play, we can have a mixed bag of old and new conventions. (2) Looking at nsm-<random-garbage> makes it impossible to see what the interfaces correspond to and this makes debugging very difficult. (3) Our users have code that relies on the old naming convention (e.g., lb-fe-a.tr-626b) (4) Meridio itself also relies on the old naming convention.

A solution would be to retain the old naming convention (e.g., lb-fe-a.tr-<4-character-random-garbage>) where <4-character-random-garbage> would use the 90-character alphabet introduced in https://github.com/networkservicemesh/sdk/pull/1590 instead of the old 16-letter alphabet. This would give us about 1000 times more possible random names, which would make the name collision risk acceptable for us.

edwarnicke commented 1 week ago

@olahg Which 90-character alphabet? I ask, because I get a bit nervous when we start using punctuation :)

olahg commented 1 week ago

The alphabet defined here: https://github.com/networkservicemesh/sdk/blob/e3eed82dc1b43925260139cd61063504aa5d8aad/pkg/tools/nanoid/generator.go#L29

edwarnicke commented 1 week ago

@olahg I'd be very cautious about the punctuation marks... very high probability to cause problems, and absolute certainty to be a usability nightmare for cli users. Would [a-z][A-Z][0-9] (size 62) be sufficient?

denis-tingaikin commented 6 days ago

Hello, @edwarnicke, @olahg 

I found that the old version of the iface ID generation is about 10 times worse than the current solution.

Test proof:


func limitName(name string) string {
    if len(name) > kernelmech.LinuxIfMaxLength {
        return name[:kernelmech.LinuxIfMaxLength]
    }
    return name
}

func getNameFromConnection(ns string) string {
    nsMaxLength := kernelmech.LinuxIfMaxLength - 5
    if len(ns) > nsMaxLength {
        ns = ns[:nsMaxLength]
    }
    name := fmt.Sprintf("%s-%s", ns, uuid.New().String())
    return limitName(name)
}

func TestNoCollisions_Old(t *testing.T) {
    used := make(map[string]bool)
    for i := 0; i < 50*1000; i++ {
        id := getNameFromConnection("lb-fe-a.tr")
        _, ok := used[id]
        require.False(t, ok, "Collision detected for id: %s, %d", id, i)
        used[id] = true
    }
}

func TestNoCollisions_New(t *testing.T) {
    used := make(map[string]bool)
    for i := 0; i < 50*1000; i++ {
        id, err := nanoid.GenerateString(4)
        require.NoError(t, err)
        id = "lb-fe-a.tr-" + id
        _, ok := used[id]
        require.False(t, ok, "Collision detected for id: %s, %d", id, i)
        used[id] = true
    }
}

Now we're using the nsm${id} format for iface naming, which I believe looks hard to use. My suggestion is to return the old format, i.e., ${ns-name}-${id}.  This will allow you to migrate from v1.11.2 to v1.13.2 and reduce collision chances. The next steps in reducing collision chances can be continued in a separate issue.

Thoughts?

olahg commented 4 days ago

@olahg I'd be very cautious about the punctuation marks... very high probability to cause problems, and absolute certainty to be a usability nightmare for cli users. Would [a-z][A-Z][0-9] (size 62) be sufficient?

Yes, this seems to be sufficient.

denis-tingaikin commented 4 days ago

@edwarnicke, @olahg 

I just tested this alphabet 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz with the current solution, and the difference between the old and new solutions is only 2 times by collision probability (current solution is better).

So I think that we should go with https://github.com/networkservicemesh/sdk/pull/1644 because it will allow to update to the latest NSM sdk.

olahg commented 4 days ago

@denis-tingaikin, I'm not sure what is meant by "10 times better" or "2 times better". Can you clarify?

denis-tingaikin commented 4 days ago

@olahg Sure.

When I used alphabet that you liked

@olahg I'd be very cautious about the punctuation marks... very high probability to cause problems, and absolute certainty to be a usability nightmare for cli users. Would [a-z][A-Z][0-9] (size 62) be sufficient?

Yes, this seems to be sufficient.

It's 2 times better by collision probability. (This means, that a collision is less likely compared to the previous solution.)

With current alphabet

The alphabet defined here: https://github.com/networkservicemesh/sdk/blob/e3eed82dc1b43925260139cd61063504aa5d8aad/pkg/tools/nanoid/generator.go#L29

It's 10 times better by collision probability. (This means, that a collision is less likely compared to the previous solution.)

denis-tingaikin commented 4 days ago

I also added comments in PR https://github.com/networkservicemesh/sdk/pull/1644 to clarify this point.