pemistahl / lingua-go

The most accurate natural language detection library for Go, suitable for short text and mixed-language text
Apache License 2.0

panic: runtime error: slice bounds out of range [:10] with length 9 #41

Closed Rom888 closed 1 year ago

Rom888 commented 1 year ago

When I run the code from this example: https://github.com/pemistahl/lingua-go#96-detection-of-multiple-languages-in-mixed-language-texts

go run . test.txt

I get this error if the text contains only one word:

English 0 10 :
panic: runtime error: slice bounds out of range [:10] with length 9

goroutine 1 [running]:
main.main()
    /home/rom/w/kube/apps/tts/split/split-text.go:49 +0x3ce
exit status 2

How to reproduce: cat test.txt

testword

cat ./split-text.go

package main

import (
  "fmt"
  "github.com/pemistahl/lingua-go"
  "os"
)

func getFileContent(filename string) string {
    testData, err := os.ReadFile(filename)
    if err != nil {
        panic(err.Error())
    }
    return string(testData)
}

func main() {
  if len(os.Args) < 2 {
    fmt.Println("Missing parameter, provide file name!")
    return
  }
  filename := os.Args[1]

  languages := []lingua.Language{
    lingua.English,
    lingua.Finnish,
  }

  detector := lingua.NewLanguageDetectorBuilder().
    FromLanguages(languages...).
    Build()

  sentence := getFileContent(filename)
  for _, result := range detector.DetectMultipleLanguagesOf(sentence) {
    fmt.Printf("%s %d %d :\n", result.Language(), result.StartIndex(), result.EndIndex())
    fmt.Printf("%s: '%s'\n", result.Language(), sentence[result.StartIndex():result.EndIndex()])
  }
}

pemistahl commented 1 year ago

Hello, thank you for reaching out. However, I'm not able to reproduce this bug.

My code:

package main

import (
  "fmt"
  "github.com/pemistahl/lingua-go"
)

func main() {
  detector := lingua.NewLanguageDetectorBuilder().
    FromLanguages(lingua.English, lingua.Finnish).
    Build()

  sentence := "testword"

  for _, result := range detector.DetectMultipleLanguagesOf(sentence) {
    fmt.Printf("%s %d %d :\n", result.Language(), result.StartIndex(), result.EndIndex())
    fmt.Printf("%s: '%s'\n", result.Language(), sentence[result.StartIndex():result.EndIndex()])
  }
}

Output:

English 0 8 :
English: 'testword'

Perhaps you are doing something wrong while reading the file? I have not tested that. Please verify.

Rom888 commented 1 year ago

Thanks for the reply. If I run your code exactly as written, I get:

English 0 9 :
panic: runtime error: slice bounds out of range [:9] with length 8

goroutine 1 [running]:
main.main()
    /home/r/w/kube/apps/tts/split/split-text.go:17 +0x2d5
exit status 2

But if I change sentence to:

sentence := "Parlez-vous français? " + "Ich spreche Französisch nur ein bisschen. " + "A little bit is better than nothing."

it works fine:

English 0 102 :
English: 'Parlez-vous français? Ich spreche Französisch nur ein bisschen. A little bit is better than nothing.'

my go.mod:

module rom/split-text

go 1.20

require github.com/pemistahl/lingua-go v1.3.3

require (
    github.com/shopspring/decimal v1.3.1 // indirect
    golang.org/x/exp v0.0.0-20221106115401-f9659909a136 // indirect
    google.golang.org/protobuf v1.28.1 // indirect
)

go version:

go version go1.20.4 linux/amd64

run:

go run .

pemistahl commented 1 year ago

Hah, it turns out that I fixed this bug two months ago in the main branch but then forgot about it. Please update Lingua to 1.3.4 and the error will be gone.