jakewilliami / CodingTheory.jl

Pure Julia implementation of tools used in Coding Theory
MIT License
5 stars 1 forks source link

Refine code-word finding capabilities #56

Open jakewilliami opened 3 years ago

jakewilliami commented 3 years ago

1 Edited and refined in 02d911c, 2f6e6c5, 3fff9d6, and 6546206. 2 The current plan is to have a kind of mutating code. The problem with the random "algorithm" (as it were) is to balance it with the first point in this issue; we do not want to load the universe into memory. See below. 3 After the bad performance with the memory efficient iterator method, we should work as much on performance (regarding time) as we can. This does not mean we need to change the algorithm, it just means we need to be careful. Edited and refined in cebcd76, a5b8d4a, 77ee159, 37f6886, 70dbc29, and 267f501. See below.

jakewilliami commented 3 years ago

Playing around with random search implementation

Attempt One

No better than the old one, as it ignores the first point in this issue.

Base.rand(𝒰::UniverseParameters) = rand(𝒰.Σ)

Base.rand(𝒰::UniverseParameters, C::AbstractArray) = rand.((C, 𝒰.Σ, 1:𝒰.n))

function replace_if_allowed!(C::AbstractArray, d::Integer, w, w′)
    for c in C
        if hamming_distance(c, w′) < d
            return false
        end
    end

    replace!(C, w => w′)
    return true
end

replace_if_allowed(C::AbstractArray, d::Integer, w, w′) =
    replace_if_allowed!(copy(C), d, w, w′)

function mutate_codeword(w::Union{Tuple{T}, NTuple{N, T}}, n::Integer, i::Integer, a::T) where {N, T}
    w̲ = collect(w)
    w̲[i] = a

    return ntuple(j -> w̲[j], n)
end

C = get_codewords_greedy(𝒰, d)

for _ in 1:length(C)
    # set/reset counter
    j = 0
    # try to mutate the code so that the new word fits (try up to m times)
    for _ in 1:m
        # choose a random word, letter, and position in the word
        w, a, i = rand(𝒰, C)
        # mutate a copy of the word
        w′ = mutate_codeword(w, 𝒰.n, i, a)
        # try to alter the code
        replace_if_allowed!(C, d, w, w′) && break
        # increment counter
        j += 1
        # if no viable option is found for a mutation, remove the word from the code.
        isequal(j, m) && filter!(e -> e ≠ w, C)
    end
end

return C

Attempt Two

Garden of forking paths.

# choose random starting point
init_arr = rand(𝒰.Σ, 𝒰.n)
init_word = ntuple(j -> init_arr[j], length(init_arr))
# initialise array with random word
C = [init_word]
# iterate through words
for w in 𝒰
    C′ = copy(C)
    # find a word that will fit
    for w′ in 𝒰
        push_if_allowed!(C′, w′, d)
    end
    next_w = rand(𝒰.Σ, 𝒰.n)
    C′ = nothing

    push!(C, next_w)
end

return C

Attempt Three

As we were previously, but the only random is the start.

# choose random starting point
init_arr = rand(𝒰.Σ, 𝒰.n)
init_word = ntuple(j -> init_arr[j], length(init_arr))
# initialise array with random word
C = [init_word]
# iterate through words
for w in 𝒰
    push_if_allowed!(C′, w′, d)
end

return C

Attempt Four

I found this solution entirely by accident, while I was following up an outdated lead for what I thought might have been a solution (Random.randperm). Thank God the people over at JuliaCollections put the hard work into this one... A strange problem it was indeed. The only limitation here I can think of is another dependency (namely, IterTools).

C = Tuple[]
for _ in 1:length(𝒰)
    # https://github.com/JuliaCollections/IterTools.jl/blob/master/src/IterTools.jl#L610-L689
    push_if_allowed!(C, nth(𝒰, rand(1:length(𝒰))), d)
end

return C

Note

I suspect I know which one we will use (the cleanest one that is using pre-built functions as part of JuliaCollections, I suspect). However, I would like to request @dmipeck to review the current options, as he was the one who suggested mutating.

Using any of these methods is still considerably slower than the memory inefficient version. For example, consider the benchmarks of (respectively) the original and the fourth implementation of the random algorithms:

julia> @btime get_codewords_random(["a", "b", "c"], 4, 2); # original
  86.998 μs (1988 allocations: 97.67 KiB)

julia> @btime get_codewords_random(["a", "b", "c"], 4, 3); # "memory efficient"
  12.237 ms (89790 allocations: 2.45 MiB)

julia> @btime get_codewords_random(["a", "b", "c"], 6, 3); # original
  4.225 ms (130948 allocations: 8.01 MiB)

julia> @btime get_codewords_random(["a", "b", "c"], 6, 3); # "memory efficient"
  1.188 s (8633499 allocations: 272.82 MiB)

julia> @btime get_codewords_random(["a", "b", "c"], 8, 3); # original
  377.728 ms (10969724 allocations: 836.81 MiB)

julia> @btime get_codewords_random(["a", "b", "c"], 8, 3); # "memory efficient"
  100.025 s (738611869 allocations: 25.82 GiB)
jakewilliami commented 3 years ago
$ ./test/benchmark.jl # original
  3.527 s (54961978 allocations: 3.89 GiB)

$ ./test/benchmark.jl # after changing abstract_types.jl
  3.454 s (53511456 allocations: 3.85 GiB)

$ ./test/benchmark.jl # after changing algebra.jl
  3.342 s (53767559 allocations: 3.85 GiB)

$ ./test/benchmark.jl # after changing distance.jl
  2.580 s (31648208 allocations: 2.64 GiB)

$ # only minor changes to levenshtein.jl

$ ./test/benchmark.jl # after changing messages.jl, and in the process adding a Word struct and refining abstract_types.jl for said type
  1.124 s (10795017 allocations: 485.51 MiB)

$ ./test/benchmark.jl # after changes to rref.jl and utils.jl
  1.129 s (10767247 allocations: 483.70 MiB)