segment-any-text / wtpsplit

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.
MIT License

Huggingface AutoModelForTokenClassification bug #112

Closed: asusdisciple closed this issue 5 months ago

asusdisciple commented 6 months ago

If you load "canine s12l no adapters" with the AutoModelForTokenClassification class and its from_pretrained method from Hugging Face, you get a KeyError: "la-canine". I looked for that key in configuration_auto.py and only found "canine". Should be a quick fix.

bminixhofer commented 6 months ago

Hi!

la-canine is a custom model class registered by wtpsplit.

Try importing wtpsplit first (like here: https://github.com/bminixhofer/wtpsplit#advanced-usage), that should solve it.
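To illustrate why the import matters, here is a toy sketch of a model-type registry that is only extended when some code explicitly registers a new type. It is a simplified stand-in, not the transformers implementation: `CONFIG_REGISTRY`, `register`, and `lookup` are hypothetical names, and the config names are placeholder strings.

```python
# Toy stand-in for a model-type registry; NOT the actual transformers code.
# Only the built-in "canine" type is known at startup.
CONFIG_REGISTRY = {"canine": "CanineConfig"}


def register(model_type, config_name):
    """Add a custom model type, as a package might do when it is imported."""
    CONFIG_REGISTRY[model_type] = config_name


def lookup(model_type):
    """Resolve a model type; unknown types raise KeyError."""
    return CONFIG_REGISTRY[model_type]


# Before registration, the custom type is unknown.
try:
    lookup("la-canine")
    failed_before = False
except KeyError:
    failed_before = True

# Registration (placeholder config name) makes the same lookup succeed.
register("la-canine", "LACanineConfig")
resolved_after = lookup("la-canine")
```

In this sketch the lookup fails until `register` has run, which mirrors the KeyError reported above when the registering module has not been imported.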

ohyooo commented 6 months ago

I've imported wtpsplit, but I still get KeyError: "la-canine".

AutoConfig.register("bert-char", BertCharConfig) does get registered.

ohyooo commented 6 months ago

I figured it out. The models submodule has to be imported explicitly:

import wtpsplit.models
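A minimal sketch of the import order that this thread reports as working. The helper name `load_token_classifier` is hypothetical, and the checkpoint id in the usage comment is an assumption, since the thread only says "canine s12l no adapters". Running it requires the `wtpsplit` and `transformers` packages.

```python
def load_token_classifier(model_name):
    """Load a wtpsplit checkpoint through the Hugging Face auto class.

    `load_token_classifier` is a hypothetical helper, not part of wtpsplit.
    """
    # Import for its side effect: per this thread, it registers the custom
    # "la-canine" model type with the transformers auto classes.
    import wtpsplit.models  # noqa: F401
    from transformers import AutoModelForTokenClassification

    return AutoModelForTokenClassification.from_pretrained(model_name)


# Usage (assumed checkpoint id; downloads weights from the Hugging Face Hub):
# model = load_token_classifier("benjamin/wtp-canine-s-12l-no-adapters")
```

The imports sit inside the function so that registration is guaranteed to happen before `from_pretrained` looks up the model type. Placing `import wtpsplit.models` at the top of a script has the same effect.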

bminixhofer commented 5 months ago

Good catch. Updated in the readme and closing this!