smarty-archives / mafsa

Package mafsa implements Minimal Acyclic Finite State Automata in Go, essentially a high-speed, memory-efficient, Unicode-friendly set of strings.
https://godoc.org/github.com/smartystreets/mafsa

Impossible to use utf #12

Open pahanini opened 7 years ago

pahanini commented 7 years ago

It seems something is wrong with the encoding or decoding of UTF-8 strings. Consider this test:

package mafsa_test

import (
    "testing"

    "github.com/smartystreets/mafsa"
    "github.com/stretchr/testify/require"
)

// TestRussian checks that a Cyrillic word survives a save/load round trip.
func TestRussian(t *testing.T) {
    a := mafsa.New()
    a.Insert("я") // "I" in Russian
    a.Finish()
    a.Save("test")
    require.True(t, a.Contains("я")) // Fine

    b, err := mafsa.Load("test")
    require.NoError(t, err)
    require.True(t, b.Contains("я")) // Failure!!
}

// TestSpanish checks that an accented Latin word survives a save/load round trip.
func TestSpanish(t *testing.T) {
    a := mafsa.New()
    a.Insert("Gracías") // "Thank you" in Spanish
    a.Finish()
    a.Save("test")
    require.True(t, a.Contains("Gracías"))

    b, err := mafsa.Load("test")
    require.NoError(t, err)
    require.True(t, b.Contains("Gracías")) // Failure!!
}

mdwhatcott commented 7 years ago

@mholt - Does mafsa support non-ASCII characters?

pahanini commented 7 years ago

@mholt Not sure, but I did not find any alphabet limitations here. It works with non-ASCII characters, but there seems to be an error in the save/load process. Look at the test: it works fine with buildTree and fails only after I load the tree from a file and start using minTree.

pahanini commented 7 years ago

It seems that merging this PR will solve this issue: https://github.com/smartystreets/mafsa/pull/8