thibaultcha / lua-resty-jit-uuid

Fast and dependency-free UUID library for LuaJIT/ngx_lua
http://thibaultcha.github.io/lua-resty-jit-uuid/
MIT License

UUID generation conflicts #17

Closed lengrongfu closed 5 years ago

lengrongfu commented 5 years ago

In a Docker container with an Ubuntu base image, UUID collisions occur within a very short time.

thibaultcha commented 5 years ago

Hi,

I do not speak Chinese, so apologies if I misunderstood your question. If you are having UUID conflicts, that is most likely because your workers' PRNGs have identical seeds. Please make sure that you follow the recommendations on seeding the PRNG appropriately. Each NGINX worker needs to have a different PRNG seed.
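
To illustrate why identical seeds lead to collisions, here is a minimal sketch in plain Lua. The helper `fake_uuid_bytes` is hypothetical and not part of this library; it only mimics the way `generate_v4` draws bytes with `math.random(0, 255)`.

```lua
-- Hypothetical helper: draw n pseudo-random bytes the same way
-- generate_v4 does (math.random(0, 255)) and return them as hex.
local function fake_uuid_bytes(n)
    local out = {}
    for i = 1, n do
        out[i] = string.format('%02x', math.random(0, 255))
    end
    return table.concat(out)
end

-- Simulate two workers that ended up with the same seed.
math.randomseed(12345)
local worker_a = fake_uuid_bytes(16)

math.randomseed(12345)
local worker_b = fake_uuid_bytes(16)

-- Same seed, same sequence: both "workers" produce the same value.
assert(worker_a == worker_b)
```

Since each NGINX worker has its own Lua VM and its own PRNG state, two workers seeded with the same number will walk through the same sequence of "random" UUIDs.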

This library comes with a seed() function that is a suggestion for healthy PRNG seeding practices, but in some cases it might not be enough. Particularly in the containerized world, where several containers may spawn workers with identical PIDs. In such cases, I would recommend seeding your PRNG from /dev/urandom instead, as we do in Kong (see here).
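
As a rough sketch of the /dev/urandom approach (this is not Kong's code, and `urandom_seed` is a hypothetical helper name), a numeric seed can be built from a few bytes read with the standard `io` library. Note that `math.randomseed` expects a number, so the raw bytes are folded into one first. The sketch assumes a Unix-like system where /dev/urandom exists.

```lua
-- Hypothetical helper, not part of this library: build a numeric seed
-- from n_bytes read from /dev/urandom.
local function urandom_seed(n_bytes)
    local f, err = io.open('/dev/urandom', 'rb')
    if not f then
        return nil, 'could not open /dev/urandom: ' .. tostring(err)
    end

    local bytes = f:read(n_bytes)
    f:close()

    if not bytes or #bytes ~= n_bytes then
        return nil, 'short read from /dev/urandom'
    end

    -- Fold the bytes into a single non-negative number.
    local seed = 0
    for i = 1, n_bytes do
        seed = seed * 256 + bytes:byte(i)
    end

    return seed
end

local seed, err = urandom_seed(4)
if seed then
    math.randomseed(seed)
else
    -- Fall back to a weaker seed if /dev/urandom is unavailable.
    math.randomseed(os.time())
end
```

In ngx_lua this would run once per worker, for example from `init_worker_by_lua*`, so that every worker reads its own bytes regardless of its PID.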

thibaultcha commented 5 years ago

@lengrongfu Did you resolve your issue?

lengrongfu commented 5 years ago

yes. thanks!

dorongutman commented 5 years ago

@thibaultcha I'm suffering from duplicate seed generation in a Docker environment where nginx is always PID 1, so I tried implementing the logic from the Kong link you put in the response, and I ended up with the code at the bottom of this comment. Specifically, I added 2 functions (_M.get_rand_bytes and random_string), and slightly changed the _M.seed function to use them when the PID is 1. Since I'm not a Lua coder, could you please have a look and tell me if it makes sense? Thank you.


-- vim:set ts=4 sts=4 sw=4 et:

--- jit-uuid
-- Fast and dependency-free UUID library for LuaJIT/ngx_lua.
-- @module jit-uuid
-- @author Thibault Charbonnier
-- @license MIT
-- @release 0.0.7

local bit = require 'bit'

local tohex = bit.tohex
local band = bit.band
local bor = bit.bor

local _M = {
    _VERSION = '0.0.7'
}

----------
-- seeding
----------

-- try to get n_bytes of CSPRNG data, first via /dev/urandom,
-- and then falling back to OpenSSL if necessary
function _M.get_rand_bytes(n_bytes, urandom)
  local buf = ffi_new(bytes_buf_t, n_bytes)
  ffi_fill(buf, n_bytes, 0x0)

  -- only read from urandom if we were explicitly asked
  if urandom then
    local rc = urandom_bytes(buf, n_bytes)

    -- if the read of urandom was successful, we returned true
    -- and buf is filled with our bytes, so return it as a string
    if rc then
      return ffi_str(buf, n_bytes)
    end
  end

  if C.RAND_bytes(buf, n_bytes) == 0 then
    -- get error code
    local err_code = C.ERR_get_error()
    if err_code == 0 then
      return nil, "could not get SSL error code from the queue"
    end

    -- get human-readable error string
    C.ERR_load_crypto_strings()
    local err = C.ERR_reason_error_string(err_code)
    C.ERR_free_strings()

    return nil, "could not get random bytes (" ..
                "reason:" .. ffi_str(err) .. ") "
  end

  return ffi_str(buf, n_bytes)
end

do
  local char = string.char
  local rand = math.random
  local encode_base64 = ngx.encode_base64

  -- generate a random-looking string by retrieving a chunk of bytes and
  -- replacing non-alphanumeric characters with random alphanumeric replacements
  -- (we don't care about deriving these bytes securely)
  -- this serves to attempt to maintain some backward compatibility with the
  -- previous implementation (stripping a UUID of its hyphens), while significantly
  -- expanding the size of the keyspace.
  local function random_string()
    -- get 24 bytes, which will return a 32 char string after encoding
    -- this is done in attempt to maintain backwards compatibility as
    -- much as possible while improving the strength of this function
    return encode_base64(get_rand_bytes(24, true))
           :gsub("/", char(rand(48, 57)))  -- 0 - 9
           :gsub("+", char(rand(65, 90)))  -- A - Z
           :gsub("=", char(rand(97, 122))) -- a - z
  end

  _M.random_string = random_string
end

--- Seed the random number generator.
-- Under the hood, this function calls `math.randomseed`.
-- It makes sure to use the most appropriate seeding technique for
-- the current environment, guaranteeing a unique seed.
--
-- To guarantee unique UUIDs, you must have correctly seeded
-- the Lua pseudo-random generator (with `math.randomseed`).
-- You are free to seed it any way you want, but this function
-- can do it for you if you'd like, with some added guarantees.
--
-- @param[type=number] seed (Optional) A seed to use. If none given, will
-- generate one trying to use the most appropriate technique.
-- @treturn number `seed`: the seed given to `math.randomseed`.
-- @usage
-- local uuid = require 'resty.jit-uuid'
-- uuid.seed()
--
-- -- in ngx_lua, seed in the init_worker context:
-- init_worker_by_lua {
--   local uuid = require 'resty.jit-uuid'
--   uuid.seed()
-- }
function _M.seed(seed)
    if not seed then
        if ngx then
            if ngx.worker.pid() == 1 then
                seed = _M.random_string()
            else
                seed = ngx.time() + ngx.worker.pid()
            end
        elseif package.loaded['socket'] and package.loaded['socket'].gettime then
            seed = package.loaded['socket'].gettime()*10000

        else
            seed = os.time()
        end
    end

    math.randomseed(seed)

    return seed
end

-------------
-- validation
-------------

do
    if ngx and string.find(ngx.config.nginx_configure(),'--with-pcre-jit',nil,true) then
        local type = type
        local re_find = ngx.re.find
        local regex = '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$'

        --- Validate a string as a UUID.
        -- To be considered valid, a UUID must be given in its canonical
        -- form (hexadecimal digits including the hyphen characters).
        -- This function validates UUIDs disregarding their generation algorithm,
        -- and in a case-insensitive manner, but checks the variant field.
        --
        -- Uses JIT PCRE if available in OpenResty, or falls back on Lua patterns.
        --
        -- @param[type=string] str String to verify.
        -- @treturn boolean `valid`: true if valid UUID, false otherwise.
        -- @usage
        -- local uuid = require 'resty.jit-uuid'
        --
        -- uuid.is_valid 'cbb297c0-a956-486d-ad1d-f9bZZZZZZZZZ' --> false
        -- uuid.is_valid 'cbb297c0-a956-486d-dd1d-f9b42df9465a' --> false (invalid variant)
        -- uuid.is_valid 'cbb297c0a956486dad1df9b42df9465a'     --> false (no dashes)
        -- uuid.is_valid 'cbb297c0-a956-486d-ad1d-f9b42df9465a' --> true
        function _M.is_valid(str)
            -- it has proven itself efficient to first check the length with an
            -- evenly distributed set of valid and invalid uuid lengths.
            if type(str) ~= 'string' or #str ~= 36 then
                return false
            end

            return re_find(str, regex, 'ioj') ~= nil
        end

    else
        local match = string.match
        local d = '[0-9a-fA-F]'
        local p = '^' .. table.concat({
            d:rep(8),
            d:rep(4),
            d:rep(4),
            '[89abAB]' .. d:rep(3),
            d:rep(12)
        }, '%-') .. '$'

        function _M.is_valid(str)
            if type(str) ~= 'string' or #str ~= 36 then
                return false
            end

            return match(str, p) ~= nil
        end
    end
end

----------------
-- v4 generation
----------------

do
    local fmt = string.format
    local random = math.random

    --- Generate a v4 UUID.
    -- v4 UUIDs are created from randomly generated numbers.
    --
    -- @treturn string `uuid`: a v4 (randomly generated) UUID.
    -- @usage
    -- local uuid = require 'resty.jit-uuid'
    --
    -- local u1 = uuid()             ---> __call metamethod
    -- local u2 = uuid.generate_v4()
    function _M.generate_v4()
        return (fmt('%s%s%s%s-%s%s-%s%s-%s%s-%s%s%s%s%s%s',
                    tohex(random(0, 255), 2),
                    tohex(random(0, 255), 2),
                    tohex(random(0, 255), 2),
                    tohex(random(0, 255), 2),

                    tohex(random(0, 255), 2),
                    tohex(random(0, 255), 2),

                    tohex(bor(band(random(0, 255), 0x0F), 0x40), 2),
                    tohex(random(0, 255), 2),

                    tohex(bor(band(random(0, 255), 0x3F), 0x80), 2),
                    tohex(random(0, 255), 2),

                    tohex(random(0, 255), 2),
                    tohex(random(0, 255), 2),
                    tohex(random(0, 255), 2),
                    tohex(random(0, 255), 2),
                    tohex(random(0, 255), 2),
                    tohex(random(0, 255), 2)))
    end
end

----------------
-- v3/v5 generation
----------------

do
    if ngx then
        local ffi = require 'ffi'

        local tonumber = tonumber
        local assert   = assert
        local error    = error
        local concat   = table.concat
        local type     = type
        local char     = string.char
        local fmt      = string.format
        local sub      = string.sub
        local gmatch   = ngx.re.gmatch
        local sha1_bin = ngx.sha1_bin
        local md5      = ngx.md5
        local C        = ffi.C
        local ffi_new  = ffi.new
        local ffi_str  = ffi.string
        local ffi_cast = ffi.cast
        local new_tab
        do
            local ok
            ok, new_tab = pcall(require, 'table.new')
            if not ok then
                new_tab = function(narr, nrec) return {} end
            end
        end

        ffi.cdef [[
            typedef unsigned char u_char;
            typedef intptr_t ngx_int_t;

            u_char * ngx_hex_dump(u_char *dst, const u_char *src, size_t len);
            ngx_int_t ngx_hextoi(u_char *line, size_t n);
        ]]

        local str_type    = ffi.typeof('uint8_t[?]')
        local u_char_type = ffi.typeof('u_char *')

        local function bin_tohex(s)
            local slen = #s
            local blen = slen * 2
            local buf = ffi_new(str_type, blen)

            C.ngx_hex_dump(buf, s, slen)

            return ffi_str(buf, blen)
        end

        local function hex_to_i(s)
            local buf = ffi_cast(u_char_type, s)

            local n = tonumber(C.ngx_hextoi(buf, #s))
            if n == -1 then
                error("could not convert hex to number")
            end

            return n
        end

        local buf = new_tab(16, 0)

        local function factory(namespace, hash_fn)
            if not _M.is_valid(namespace) then
                return nil, 'namespace must be a valid UUID'
            end

            local i = 0
            local iter, err = gmatch(namespace, [[([\da-f][\da-f])]])
            if not iter then
                return nil, 'could not create iter: ' .. err
            end

            while true do
                local m, err = iter()
                if err then
                    return nil, err
                end

                if not m then
                    break
                end

                i = i + 1
                buf[i] = char(tonumber(m[0], 16))
            end

            assert(i == 16, "invalid binary namespace buffer length")
            local ns = concat(buf)

            return function(name)
                if type(name) ~= 'string' then
                    return nil, 'name must be a string'
                end

                local hash, ver, var = hash_fn(ns, name)

                return (fmt('%s-%s-%s%s-%s%s-%s', sub(hash, 1, 8),
                                                sub(hash, 9, 12),
                                                ver,
                                                sub(hash, 15, 16),
                                                var,
                                                sub(hash, 19, 20),
                                                sub(hash, 21, 32)))
            end
        end

        local function v3_hash(binary, name)
            local hash = md5(binary .. name)

            return hash,
            tohex(bor(band(hex_to_i(sub(hash, 13, 14)), 0x0F), 0x30), 2),
            tohex(bor(band(hex_to_i(sub(hash, 17, 18)), 0x3F), 0x80), 2)
        end

        local function v5_hash(binary, name)
            local hash = bin_tohex(sha1_bin(binary .. name))

            return hash,
            tohex(bor(band(hex_to_i(sub(hash, 13, 14)), 0x0F), 0x50), 2),
            tohex(bor(band(hex_to_i(sub(hash, 17, 18)), 0x3F), 0x80), 2)
        end

        --- Instantiate a v3 UUID factory.
        -- @function factory_v3
        -- Creates a closure generating namespaced v3 UUIDs.
        -- @param[type=string] namespace (must be a valid UUID according to `is_valid`)
        -- @treturn function `factory`: a v3 UUID generator.
        -- @treturn string `err`: a string describing an error
        -- @usage
        -- local uuid = require 'resty.jit-uuid'
        --
        -- local fact = assert(uuid.factory_v3('e6ebd542-06ae-11e6-8e82-bba81706b27d'))
        --
        -- local u1 = fact('hello')
        -- ---> 3db7a435-8c56-359d-a563-1b69e6802c78
        --
        -- local u2 = fact('foobar')
        -- ---> e8d3eeba-7723-3b72-bbc5-8f598afa6773
        function _M.factory_v3(namespace)
            return factory(namespace, v3_hash)
        end

        --- Instantiate a v5 UUID factory.
        -- @function factory_v5
        -- Creates a closure generating namespaced v5 UUIDs.
        -- @param[type=string] namespace (must be a valid UUID according to `is_valid`)
        -- @treturn function `factory`: a v5 UUID generator.
        -- @treturn string `err`: a string describing an error
        -- @usage
        -- local uuid = require 'resty.jit-uuid'
        --
        -- local fact = assert(uuid.factory_v5('e6ebd542-06ae-11e6-8e82-bba81706b27d'))
        --
        -- local u1 = fact('hello')
        -- ---> 4850816f-1658-5890-8bfd-1ed14251f1f0
        --
        -- local u2 = fact('foobar')
        -- ---> c9be99fc-326b-5066-bdba-dcd31a6d01ab
        function _M.factory_v5(namespace)
            return factory(namespace, v5_hash)
        end

        --- Generate a v3 UUID.
        -- v3 UUIDs are created from a namespace and a name (a UUID and a string).
        -- The same name and namespace result in the same UUID. The same name and
        -- different namespaces result in different UUIDs, and vice-versa.
        -- The resulting UUID is derived using MD5 hashing.
        --
        -- This is a sugar function which instantiates a short-lived v3 UUID factory.
        -- It is an expensive operation, and intensive generation using the same
        -- namespaces should prefer allocating their own long-lived factory with
        -- `factory_v3`.
        --
        -- @param[type=string] namespace (must be a valid UUID according to `is_valid`)
        -- @param[type=string] name
        -- @treturn string `uuid`: a v3 (namespaced) UUID.
        -- @treturn string `err`: a string describing an error
        -- @usage
        -- local uuid = require 'resty.jit-uuid'
        --
        -- local u = uuid.generate_v3('e6ebd542-06ae-11e6-8e82-bba81706b27d', 'hello')
        -- ---> 3db7a435-8c56-359d-a563-1b69e6802c78
        function _M.generate_v3(namespace, name)
            local fact, err = _M.factory_v3(namespace)
            if not fact then
                return nil, err
            end

            return fact(name)
        end

        --- Generate a v5 UUID.
        -- v5 UUIDs are created from a namespace and a name (a UUID and a string).
        -- The same name and namespace result in the same UUID. The same name and
        -- different namespaces result in different UUIDs, and vice-versa.
        -- The resulting UUID is derived using SHA-1 hashing.
        --
        -- This is a sugar function which instantiates a short-lived v5 UUID factory.
        -- It is an expensive operation, and intensive generation using the same
        -- namespaces should prefer allocating their own long-lived factory with
        -- `factory_v5`.
        --
        -- @param[type=string] namespace (must be a valid UUID according to `is_valid`)
        -- @param[type=string] name
        -- @treturn string `uuid`: a v5 (namespaced) UUID.
        -- @treturn string `err`: a string describing an error
        -- @usage
        -- local uuid = require 'resty.jit-uuid'
        --
        -- local u = uuid.generate_v5('e6ebd542-06ae-11e6-8e82-bba81706b27d', 'hello')
        -- ---> 4850816f-1658-5890-8bfd-1ed14251f1f0
        function _M.generate_v5(namespace, name)
            local fact, err = _M.factory_v5(namespace)
            if not fact then
                return nil, err
            end

            return fact(name)
        end

    else
        function _M.factory_v3() error('v3 UUID generation only supported in ngx_lua', 2) end
        function _M.generate_v3() error('v3 UUID generation only supported in ngx_lua', 2) end
        function _M.factory_v5() error('v5 UUID generation only supported in ngx_lua', 2) end
        function _M.generate_v5() error('v5 UUID generation only supported in ngx_lua', 2) end
    end
end

return setmetatable(_M, {
    __call = _M.generate_v4
})

thibaultcha commented 5 years ago

@dorongutman Hi there,

You must seed each worker process individually (i.e. call seed() in init_worker_by_lua*). Since each worker process is forked from the master process (whose PID is 1), the workers' PIDs won't be 1; most likely they will be 2, 3, etc. You must therefore ensure that _M.random_string() is always called. It's also fine to skip seed() and call math.randomseed(random_string()) in init_worker instead (this library's seed() method is just a helper around math.randomseed()). In Kong, we went as far as completely overriding math.randomseed. The reason is that other modules loaded in your application could call math.randomseed inappropriately and override a good PRNG seed with a duplicated one (e.g. workers entering a library's code that calls math.randomseed(os.time()) after you have already initialized their seed). It's just an extra safety precaution.
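
The override idea can be sketched roughly as follows. This is only an illustration in plain Lua, not Kong's actual implementation; the `seeded` flag and the first-call-wins policy are assumptions made for the example.

```lua
-- Keep a reference to the original function before replacing it.
local orig_randomseed = math.randomseed
local seeded = false

-- Replacement: the first call seeds the PRNG, later calls are ignored,
-- so a third-party module cannot overwrite a good seed with os.time().
math.randomseed = function(seed)
    if seeded then
        return
    end
    seeded = true
    orig_randomseed(seed)
end

-- First call wins. A fixed value is used here only to make the demo
-- reproducible; in practice pass a seed derived from /dev/urandom.
math.randomseed(2024)
local first = math.random(1, 1000000)

-- A later, "inappropriate" reseed is a no-op, so the sequence
-- continues instead of restarting.
math.randomseed(2024)
local second = math.random(1, 1000000)

-- For comparison: really restarting from the same seed reproduces `first`.
orig_randomseed(2024)
assert(math.random(1, 1000000) == first)
```

In ngx_lua each worker has its own Lua VM, so a guard like this would need to be installed in every worker (for example in `init_worker_by_lua*`) before any other module gets a chance to reseed.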

I proposed OpenResty patches some time ago to remedy the math.randomseed() pitfalls, but those were not well received at the time. I might give it another go someday.

Hope that helps, let me know if not. Also, please feel free to open another issue next time.

dorongutman commented 5 years ago

@thibaultcha I created a new issue (#19), as you asked, to continue the discussion, specifically about where I call seed and whether my changes to the code are correct.