syonfox / gptoken

JavaScript BPE Encoder Decoder for GPT-2 / GPT-3
MIT License

TS compatibility and Next.js #2

Closed (tg44 closed 1 year ago)

tg44 commented 1 year ago

Hi!

First of all, thanks for the dev time!

I have three problems:

Encoder.js vs encoder.js

There are multiple modules with names that only differ in casing.
This can lead to unexpected behavior when compiling on a filesystem with other case-semantic.
Use equal casing. Compare these module identifiers:
* /Users/x/dev/x/node_modules/@syonfox/gpt-3-encoder/Encoder.js
    Used by 2 module(s), i. e.
    /Users/x/dev/x/node_modules/@syonfox/gpt-3-encoder/index.js
* /Users/x/dev/x/node_modules/@syonfox/gpt-3-encoder/encoder.js
    Used by 2 module(s), i. e.
    /Users/x/dev/x/node_modules/@syonfox/gpt-3-encoder/Encoder.js
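
The warning above fires because two module paths differ only in letter casing. As a rough illustration of that check (not the bundler's actual implementation), the sketch below groups paths by their lowercased form and reports any group with more than one spelling. The function name `findCaseCollisions` is hypothetical, and the paths are shortened versions of the ones in the warning.

```typescript
// Group module paths that are identical except for letter casing.
// On a case-insensitive filesystem every such group resolves to one file.
function findCaseCollisions(paths: string[]): string[][] {
  const groups = new Map<string, Set<string>>();
  for (const p of paths) {
    const key = p.toLowerCase();
    const group = groups.get(key) ?? new Set<string>();
    group.add(p);
    groups.set(key, group);
  }
  // Keep only the groups that contain more than one distinct spelling.
  return Array.from(groups.values())
    .filter((g) => g.size > 1)
    .map((g) => Array.from(g));
}

const collisions = findCaseCollisions([
  "@syonfox/gpt-3-encoder/Encoder.js",
  "@syonfox/gpt-3-encoder/encoder.js",
  "@syonfox/gpt-3-encoder/index.js",
]);
console.log(collisions);
```

Here the two `Encoder.js` / `encoder.js` spellings end up in one group, while `index.js` is left out because nothing else collides with it.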

module

TS2306: File '/Users/x/dev/x/node_modules/@syonfox/gpt-3-encoder/index.d.ts' is not a module.

I get this when I import it like this:

import { countTokens, tokenStats } from "@syonfox/gpt-3-encoder";

I think the module should be declared like this:

declare module "@syonfox/gpt-3-encoder" {

But I have no experience in library development.
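
Until the package ships a proper declaration, a local shim along these lines might work as a stopgap. This is only a sketch: the two function names come from the import above, but their parameter and return types are assumptions, and the file name is hypothetical.

```typescript
// gpt-3-encoder.d.ts (hypothetical file name; keep it free of top-level
// import/export statements so it is treated as an ambient declaration)
declare module "@syonfox/gpt-3-encoder" {
  // Assumed signatures: only the names countTokens and tokenStats
  // are taken from the import in this issue.
  export function countTokens(text: string): number;
  export function tokenStats(input: string): unknown;
}
```

Typing the `tokenStats` result as `unknown` is deliberate: it forces callers to narrow the value instead of relying on a guessed shape.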

token detection in browser

For some reason, when the code runs on the frontend with Next.js, the tokens are not resolved properly, so my tokenStats values are totally useless :(

syonfox commented 1 year ago

One theory: make sure the strings are UTF-8 encoded...
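
One way to test the UTF-8 theory is to look at the bytes a string turns into in each environment. The sketch below uses the standard `TextEncoder` and `TextDecoder` globals, available in browsers and recent Node.js; the helper names `utf8Bytes` and `survivesUtf8RoundTrip` are hypothetical and not part of this library.

```typescript
// Returns the UTF-8 bytes for a string, as the runtime would encode it.
function utf8Bytes(text: string): Uint8Array {
  return new TextEncoder().encode(text);
}

// True when the string comes back unchanged after a UTF-8 round trip.
// Lone surrogates are replaced with U+FFFD while encoding, so they fail.
function survivesUtf8RoundTrip(text: string): boolean {
  return new TextDecoder("utf-8").decode(utf8Bytes(text)) === text;
}

const sample = "h\u00e9llo"; // "héllo" with a precomposed é
console.log(sample.length, utf8Bytes(sample).length); // 5 UTF-16 units, 6 bytes
console.log(survivesUtf8RoundTrip(sample)); // true
console.log(survivesUtf8RoundTrip("\uD83D")); // false (lone surrogate)
```

Logging the byte arrays for the same input on the server and in the browser would show whether the two sides really feed different bytes to the encoder.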

Does it work as "gpt-3-encoder"?

I wonder if there is a way of declaring several names: @syonfox/gpt-3-encoder, GPToken, gpt-3-encoder.

Anyway, I'm looking into it a bit, since I'm playing with Next.js.

syonfox commented 1 year ago

OK, so I have not had too much luck with this, but I have decided to rename the package to gptoken.

Just for sanity.

Check out the GPToken branch or run yarn add gptoken.

Then it should work. If you want, I made a demo_app folder that has some starting code in it and, mainly, includes the project properly:

cd demo_app; npm install; npm start;

tg44 commented 1 year ago

Nice! I just changed "@syonfox/gpt-3-encoder": "^1.4.0-rc5", to "gptoken": "^0.0.1", (and fixed the imports) and now I have token statistics!

Also, the encoder.js vs Encoder.js error is resolved, and the module name is correct! Thanks for your work!