pkoukk / tiktoken-go

go version of tiktoken
MIT License
601 stars 67 forks source link

sortedTokenBytes doesn't seem necessary #38

Open stoneflying opened 10 months ago

stoneflying commented 10 months ago

Thank you for your efforts. I found that in the NewCoreBPE function, the result of the following code does not seem to be used anywhere,

        sortedTokenBytes := make([][]byte, 0, len(encoder))
    for k := range encoder {
        sortedTokenBytes = append(sortedTokenBytes, []byte(k))
    }
    sort.Slice(sortedTokenBytes, func(i, j int) bool {
        return bytes.Compare(sortedTokenBytes[i], sortedTokenBytes[j]) < 0
    })

    return &CoreBPE{
        ......
        sortedTokenBytes:     sortedTokenBytes,
    }, nil

but this sorting operation seems to be very expensive. Is there any consideration here?

pkoukk commented 9 months ago

Yes, this code currently is not necessary. This variable is mainly used when multi-threaded encoding is needed, but this feature has not been implemented yet. I will implement it later, maybe this week.