sayef / fsner

Few-shot Named Entity Recognition

Multiple classes in same query sentence #3

Open pikaduck opened 2 years ago

pikaduck commented 2 years ago

Can the model predict only one entity per query sentence, or is it possible to predict multiple entities in the same sentence? I want to classify tokens as EntityName and EntityValue. Should the same sentence, with different [E][/E] markings, be used as separate sentences in the list of supports? For example: `supports = [["[E] Name [/E] : Sakshi [E] Age [/E] : 15", "[E] Gender [/E] : Female [E] Work Address [/E] : abcd"], ["Name : [E] Sakshi [/E] Age : [E] 15 [/E]", "Gender : [E] Female [/E] Work Address : [E] abcd [/E]"]]`

I have my data in CoNLL format. If you could give me a heads-up on how to proceed with that, it would be really helpful.

sayef commented 2 years ago

Hi @pikaduck,

Predictions are made for pairs of queries and supports. You can repeat each query n times (n = number of entity types) to get predictions for all types.
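A minimal sketch of that pairing in plain Python, without calling fsner: each query is repeated once per entity type, so row i of the repeated batch lines up with support list i. The helper `repeat_per_type` and the entity-type labels are hypothetical, chosen only to match the question above.

```python
from typing import List, Tuple


def repeat_per_type(queries: List[str], entity_types: List[str]) -> List[List[Tuple[str, str]]]:
    """For each query, build one batch row per entity type.

    Row i of each batch pairs the query with entity type i, mirroring how
    [q] * n lines up with the n support lists (one list per entity type).
    """
    batches = []
    for q in queries:
        rows = [q] * len(entity_types)  # the repeated query batch
        batches.append(list(zip(entity_types, rows)))
    return batches


# Hypothetical entity types matching the question in this thread.
entity_types = ["EntityName", "EntityValue"]
batches = repeat_per_type(["Name : Sakshi Age : 15"], entity_types)
```

The query strings in each batch are what would be tokenized together; the entity-type column is only bookkeeping, so each prediction can be read back against the support list it was paired with.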

Building on the example in the README file, you can do something like this for multi-class prediction:

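```python
# Assumes model, tokenizer, supports, query and device are set up as in the README example.
W_supports = tokenizer.tokenize(supports).to(device)
n = len(supports)  # one support list per entity type

outputs = []
for q in query:
    # Repeat the query once per entity type so it pairs with every support list.
    queries = [q] * n
    W_query = tokenizer.tokenize(queries).to(device)
    start_prob, end_prob = model(W_query, W_supports)
    # Pass the repeated queries (not the full query list) so they match W_query.
    outputs.append(tokenizer.extract_entity_from_scores(queries, W_query, start_prob, end_prob, thresh=0.50))
```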

As for your example, that is not how the library expects the input. Each support example must contain exactly one labelled entity, i.e. one pair of [E] and [/E]. You may need to convert the CoNLL data to the format shown in the README.
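A first step for the CoNLL conversion could be a small parser that recovers entity spans from BIO tags. This is a minimal sketch, assuming the common CoNLL layout (token in the first column, BIO tag in the last column, blank lines between sentences, optional `-DOCSTART-` lines); `read_conll_bio` and `bio_to_spans` are hypothetical helpers, not part of fsner.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start token index, end token index exclusive, entity type)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Turn a sequence of BIO tags into (start, end, type) token spans."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a span at the end
        if tag.startswith("I-") and label == tag[2:]:
            continue  # continuation of the current entity
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]  # a stray I- also starts a new entity
    return spans


def read_conll_bio(text: str) -> List[Tuple[List[str], List[Span]]]:
    """Parse CoNLL-style text into (tokens, entity spans) pairs, one per sentence."""
    sentences = []
    tokens: List[str] = []
    tags: List[str] = []
    for line in text.splitlines() + [""]:  # trailing "" flushes the last sentence
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            if tokens:
                sentences.append((tokens, bio_to_spans(tags)))
                tokens, tags = [], []
            continue
        columns = line.split()
        tokens.append(columns[0])
        tags.append(columns[-1])
    return sentences
```

For example, the lines `Work B-EntityName`, `Address I-EntityName`, `: O`, `abcd B-EntityValue` parse to the tokens `["Work", "Address", ":", "abcd"]` with spans `[(0, 2, "EntityName"), (3, 4, "EntityValue")]`. Each span then still has to be marked with [E] and [/E] on its own copy of the sentence.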

pikaduck commented 2 years ago

I'm converting the CoNLL data into sentences and replicating each query n times to match the number of support lists. So each support can have only one entity tag, is that what you're saying? Would you then suggest repeating the same sentence in the supports as many times as there are entities in it?

On Wed, 20 Oct 2021, 2:04 am Md Saiful Islam Sayef, < @.***> wrote:

sayef commented 2 years ago

Yes to both of your questions.

The model was trained so that each support contains exactly one such label. For another entity type, repeat the sentence in that type's support list, labelling the other entity.
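Applied to the example from the question, this rule means a sentence with k entities appears k times across the support lists, each copy marking exactly one entity. A minimal sketch, assuming sentences are already tokenized with token-level spans (for instance from a BIO-tagged CoNLL file); `build_supports` is a hypothetical helper whose output is a plain list of lists of strings shaped like the README's `supports`, not something fsner provides.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # (start token index, end token index exclusive, entity type)


def build_supports(
    sentences: List[Tuple[List[str], List[Span]]],
    entity_types: List[str],
) -> List[List[str]]:
    """Return one support list per entity type, in the order of entity_types.

    Every entity span produces its own copy of the sentence with exactly one
    [E] ... [/E] pair, placed in the list for that entity's type.
    """
    by_type: Dict[str, List[str]] = defaultdict(list)
    for tokens, spans in sentences:
        for start, end, label in spans:
            marked = tokens[:start] + ["[E]"] + tokens[start:end] + ["[/E]"] + tokens[end:]
            by_type[label].append(" ".join(marked))
    return [by_type[t] for t in entity_types]


# The first sentence from the question, tokenized by whitespace.
tokens = ["Name", ":", "Sakshi", "Age", ":", "15"]
spans = [(0, 1, "EntityName"), (2, 3, "EntityValue"), (3, 4, "EntityName"), (5, 6, "EntityValue")]
supports = build_supports([(tokens, spans)], ["EntityName", "EntityValue"])
```

Here `supports[0]` is `["[E] Name [/E] : Sakshi Age : 15", "Name : Sakshi [E] Age [/E] : 15"]` and `supports[1]` is `["Name : [E] Sakshi [/E] Age : 15", "Name : Sakshi Age : [E] 15 [/E]"]`. Joining tokens with single spaces discards the original spacing, which is harmless here because the README examples also put spaces around [E] and [/E].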

Hope it helps!