urchade / GLiNER

Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024
https://arxiv.org/abs/2311.08526
Apache License 2.0
948 stars, 82 forks

Returned entities don't provide information #138

Open Xiaomin-HUANG opened 5 days ago

Xiaomin-HUANG commented 5 days ago

Model versions: "knowledgator/gliner-multitask-large-v0.5", "urchade/gliner_multi-v2.1"

Issue: I used these two models to detect the labels ["name_surname", "email", "organization", "phone_number"], but some of the returned entities carry no useful information.

Examples:

              'phone_number': ['numéro', '75', '73', 'numéro de téléphone', 'numéro'] => (I only want the phone number itself, not these words or number fragments; "numéro de téléphone" is French for "phone number")
              'name_surname': ['madame', 'madame foucard', 'madame', 'mr', ...] => (I only want a person's name; "madame" and "mr" are forms of address used in the conversation and carry no useful information)
              'email': ['mail', 'mail'] => (I want the email address instead of the label name)

PS: These unwanted entities, which resemble the label names, have high confidence scores (around 0.95). Is there any method to filter out these undesired entities? Thank you so much.
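Because the scores of these label-name matches are high, a score threshold alone will not remove them. A check on the shape of each predicted span can. Below is a minimal post-processing sketch, assuming the predictions are the list of dicts (with "text" and "label" keys) returned by GLiNER's `predict_entities`. The word lists, the 8-digit phone rule, the email pattern, and the helper names (`filter_entities`, `looks_like_phone`, etc.) are hypothetical choices fitted to the examples above, not part of GLiNER.

```python
import re

# Illustrative word lists (assumptions based on the examples in this issue).
LABEL_WORDS = {"numéro", "numéro de téléphone", "téléphone", "mail", "email", "e-mail"}
HONORIFICS = {"madame", "monsieur", "mademoiselle", "mme", "mr", "mrs", "ms"}


def looks_like_phone(text: str) -> bool:
    # Rough heuristic: at least 8 digits once spaces, dots, etc. are removed.
    return len(re.sub(r"\D", "", text)) >= 8


def looks_like_email(text: str) -> bool:
    # Loose shape check (something@domain.tld), not a full RFC 5322 validator.
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", text) is not None


def looks_like_name(text: str) -> bool:
    # Keep the span only if at least one word is not a form of address,
    # so "madame foucard" passes but "madame" or "mr" alone does not.
    return any(w.lower().strip(".") not in HONORIFICS for w in text.split())


# Per-label checks; labels without an entry (e.g. "organization") pass through.
CHECKS = {
    "phone_number": looks_like_phone,
    "email": looks_like_email,
    "name_surname": looks_like_name,
}


def filter_entities(entities):
    """Return only the predicted spans that pass their label's sanity check.

    Assumes each entity is a dict with "text" and "label" keys, as in the
    output of GLiNER's predict_entities; other keys are kept unchanged.
    """
    kept = []
    for ent in entities:
        text = ent["text"].strip()
        # Drop spans that merely repeat a label word, whatever their score.
        if text.lower() in LABEL_WORDS:
            continue
        check = CHECKS.get(ent["label"])
        if check is None or check(text):
            kept.append(ent)
    return kept


# Usage with made-up predictions shaped like the ones reported above.
preds = [
    {"text": "numéro", "label": "phone_number", "score": 0.95},
    {"text": "75", "label": "phone_number", "score": 0.90},
    {"text": "01 23 45 67 89", "label": "phone_number", "score": 0.90},
    {"text": "madame", "label": "name_surname", "score": 0.95},
    {"text": "madame foucard", "label": "name_surname", "score": 0.90},
    {"text": "mail", "label": "email", "score": 0.95},
]
print([e["text"] for e in filter_entities(preds)])  # ['01 23 45 67 89', 'madame foucard']
```

This is a stopgap: the word lists must be maintained by hand for each language and label set, whereas fine-tuning addresses the cause in the model itself.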

Ingvarstep commented 5 days ago

@Xiaomin-HUANG, I think these are artifacts of the datasets these models were trained on. The best way to fix this is to fine-tune your model. I would recommend this notebook; it contains Gradio interfaces to help you label your data. Given your task, the number of required examples should be small.
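For this issue, labeling mostly means marking only the spans you actually want: the surname without "madame", and the address without the word "mail". A minimal sketch of turning such character-offset labels into training records follows. It assumes GLiNER's word-level training format, a dict with "tokenized_text" and "ner" entries whose [start, end, label] triples use inclusive token indices; the regex word splitter and the `to_gliner_example` helper are hypothetical, so check both against the notebook before training.

```python
import re

# Simple word splitter (an assumption; GLiNER's own splitter may differ).
WORD_RE = re.compile(r"\w+(?:[-_]\w+)*|\S")


def to_gliner_example(text, spans):
    """Convert character-offset labels into a token-level training record.

    spans: list of (char_start, char_end, label) with char_end exclusive.
    Assumed output format: {"tokenized_text": [...], "ner": [[start, end, label], ...]}
    where start and end are inclusive token indices.
    """
    tokens, offsets = [], []
    for m in WORD_RE.finditer(text):
        tokens.append(m.group())
        offsets.append((m.start(), m.end()))
    ner = []
    for c_start, c_end, label in spans:
        # Indices of all tokens that overlap the labeled character range.
        hits = [i for i, (s, e) in enumerate(offsets) if s < c_end and e > c_start]
        if hits:
            ner.append([hits[0], hits[-1], label])
    return {"tokenized_text": tokens, "ner": ner}


# Label only the surname and the address; "madame" and "mail" stay unlabeled.
text = "Bonjour madame Foucard, mon mail est a.foucard@ex.fr"
name = text.index("Foucard")
mail = text.index("a.foucard@ex.fr")
example = to_gliner_example(
    text, [(name, name + len("Foucard"), "name_surname"), (mail, len(text), "email")]
)
print(example["ner"])  # [[2, 2, 'name_surname'], [7, 13, 'email']]
```

Leaving "madame" and "mail" unlabeled should let them act as negative examples for exactly the kind of artifacts reported above.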