Prioritizing Entities. - Githubissues

Hyperclaw79 commented 6 years ago

Do you want to request a feature, report a bug, or ask a question about wit? Feature

What is the current behavior? No priority or ranking among entities.

What is the expected behavior? A way to assign priorities to entities so that there is better control over detection.

Hello HT,
Thanks for reaching out. Can you post on Github? https://github.com/wit-ai/wit/issues
It will help keep track of the request, and maybe other Wit developers will be able to guide you. We don't support entity prioritization, so I'm afraid it won't be possible. A possible workaround would be to use 2 different Wit apps and have your code send the message to the 2nd Wit app (with data) only if nothing was detected by the first app (with song).
Laurent

@l5t In response to your mail: I cannot use 2 wit apps because I'm using this wit app as the custom app for Facebook's built-in NLP.

patapizza commented 6 years ago

Hi @Hyperclaw79,

Can you provide a little bit more context on what you are trying to do?

Hyperclaw79 commented 6 years ago

@patapizza Here's the mail I sent to help@wit.ai:


On Wed, Jan 3, 2018 at 12:36 AM, "Hyperclaw79" <harshith.thota7@gmail.com> wrote: 
First of all, I'd like to thank you for this wonderful open source solution for NLP. I am in love with it.
I am developing a WittyMusicBot which fetches song information. 
And I have a question: Is it possible to assign priorities to an entity? (I don't mean the roles.) 

Small example of my use case:

Let's say I have two inputs:

Information for Numb by Linkin Park
Details about Castle of Glass

In the first case, the detection would be as such:

"Numb by Linkin Park" -> data
"Numb" -> song (composite under data)
"Linkin Park" -> artist (composite under data)
In the second case, the detection would be as such:

"Castle of Glass" -> song
Currently, even the first input gets detected as song instead of as data. So, I would like to make sure that it gets detected as a song only if it is not detected as data beforehand.

I built the song dataset before introducing data. It is lengthy, and fixing it would require editing all the samples that were labeled, now erroneously, before data was introduced.
So, if a simple ranking method could solve this instead of manually editing my song's dataset, it would be appreciated.

If there is a way to achieve this apart from exhaustive training, please help me figure it out.
Thank you in advance. Hoping for a helpful reply.

~HT

blandinw commented 6 years ago

@Hyperclaw79 why are you using composite entities here, as opposed to an artist entity and a song entity?

The data entity you're proposing seems very general and will probably be hard to train, e.g. you want "play [Castle of Glass]" to be song, but "play [Castle of Eminem]" to be data (Eminem made a song called Castle).

You can use the API to update your app programmatically and avoid time-consuming manual modifications.

Closing for now, feel free to comment/reopen if I missed something.

Hyperclaw79 commented 6 years ago

@blandinw

The data entity you're proposing seems very general

It isn't as general as it looks. The data entity gets triggered only when both the song and the artist are present, with by (or 's) as a keyword between them.

The data entity you're proposing seems very general and will probably be hard to train, e.g. you want "play [Castle of Glass]" to be song, but "play [Castle of Eminem]" to be data (Eminem made a song called Castle).

Also, I think you misunderstood, which I can tell from your example. Compare it with this: "play Castle of Glass" will be detected as song, while "play Castle of Glass by Eminem" will be detected as data, within which Castle of Glass will be detected as song and Eminem as artist under the data entity.

You can use the API to update your app programmatically

As I've mentioned before, I am using this as built-in NLP for Facebook, so I can't control it from my end. For now I am using workarounds to detect the by keyword upon receiving the user message, but this messes up the detection of song and artist when they consist of multiple words and are surrounded by irrelevant words like "look for", "details", etc. This is the whole reason why I want to use a trainable NLP.
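To make the limitation concrete, here is a minimal sketch of that kind of keyword workaround. It is not the actual bot code; the function name `split_song_artist` and the exact regular expressions are assumptions made for illustration.

```python
import re

# Hypothetical helper: split a phrase on the "by" or "'s" delimiters.
BY_DELIM = re.compile(r"\s+by\s+", re.IGNORECASE)
POSSESSIVE_DELIM = re.compile(r"'s\s+")


def split_song_artist(phrase):
    """Return (song, artist); artist is None when no delimiter is found."""
    phrase = phrase.strip()
    match = BY_DELIM.search(phrase)
    if match:
        # "<song> by <artist>"
        return phrase[:match.start()], phrase[match.end():]
    match = POSSESSIVE_DELIM.search(phrase)
    if match:
        # "<artist>'s <song>"
        return phrase[match.end():], phrase[:match.start()]
    return phrase, None


clean = split_song_artist("Numb by Linkin Park")
possessive = split_song_artist("Linkin Park's Numb")
song_only = split_song_artist("Castle of Glass")
# Surrounding words leak into the song, which is the problem described above.
noisy = split_song_artist("Give me details for Perfect by Ed Sheeran")
```

The last call returns the whole prefix "Give me details for Perfect" as the song, which is why a plain delimiter split is not enough on its own.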

blandinw commented 6 years ago

Several things here:

1/ I still have the same understanding after reading your comment. What is "play Castle of Eminem" supposed to return? My guess would be that you want it to return "Castle" as a song and "Eminem" as an artist. The issue is that you would also like "play Castle of Glass" to return "Castle of Glass" as a song. This will be hard to train because the two utterances look very similar to someone (or an algorithm) with no pre-existing knowledge of existing songs and artists. You'll probably have to enumerate all songs in your training samples.

I still don't understand the need for a composite entity here, why have the data entity at all?

2/ Are you using a custom token in Built-in NLP? If so, you can still use our HTTP API to make changes. Built-in NLP only does a GET /message call on your behalf, for convenience.
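For reference, the GET /message call that Built-in NLP makes can also be issued by hand. The following is a minimal sketch using only the Python standard library. The token value and the version date are placeholders, and the function name `build_message_request` is hypothetical; the request is built here but not sent.

```python
import urllib.parse
import urllib.request


def build_message_request(query, token, version="20180103"):
    """Build (but do not send) a GET /message request for the given text."""
    params = urllib.parse.urlencode({"v": version, "q": query})
    return urllib.request.Request(
        "https://api.wit.ai/message?" + params,
        headers={"Authorization": "Bearer " + token},
        method="GET",
    )


# "YOUR_SERVER_TOKEN" is a placeholder, not a real token.
req = build_message_request("play Castle of Glass", "YOUR_SERVER_TOKEN")
# Passing req to urllib.request.urlopen would perform the actual call.
```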

Hyperclaw79 commented 6 years ago

What is "play Castle of Eminem" supposed to return? My guess would be that you want it to return "Castle" as a song and "Eminem" as an artist.

No, I strictly plan to limit delimiters to by and 's. On the other hand, of will still be considered part of a song.

why have the data entity at all?

In the case when there is only a song, I want to send the song entity. In the case where there is song <by> artist, I want it to be detected as data, within which it can be further classified. The main reason to use the data entity is to first get hold of the phrase containing by, then split it further to get song and artist. If I use only song, by will sometimes be taken as part of either the song or the artist, which would be wrong.

Built-in NLP only does a GET /message call on your behalf, for convenience.

oh I didn't know about this! Thanks.

you can still use our HTTP API to make changes

Except that I'm not able to make one simple POST request to /samples without getting a generic error like "something went wrong".

patapizza commented 6 years ago

Do you have an example of a POST /samples request that returns a 500 along with the app id? Thanks.

Hyperclaw79 commented 6 years ago

I'm using a Python script to generate the URL and send the request.

import datetime

import requests

headers = {
    'Authorization': 'Bearer P4PPKDLH7LUJJXN2YCOKNXYM37IJGKV5',
    'Content-Type': 'application/json'
}
# generate_json, tupleList and queryList are defined elsewhere in the script.
payload = generate_json(tupleList, queryList)
with open('samples.json', 'w') as f:
    f.write(payload)  # keep a copy of the request body for debugging
url = 'https://api.wit.ai/samples?v={}'.format(datetime.date.today().strftime("%Y%m%d"))
response = requests.post(url, headers=headers, data=payload)

And the sample json looks like this:

[
    {
        "text": "Give me details for Perfect by Ed Sheeran",
        "entities": {
            "data": [
                {
                    "entities": {
                        "song": [
                            {
                                "value": "Perfect",
                                "type": "value"
                            }
                        ],
                        "artist": [
                            {
                                "value": "Ed Sheeran",
                                "type": "value"
                            }
                        ]
                    },
                    "value": "Perfect by Ed Sheeran",
                    "type": "value"
                }
            ]
        }
    }]

patapizza commented 6 years ago

@Hyperclaw79 Please see the docs example. There is no data field. Composite entities need to be specified under subentities, not entities (this is not documented).

-edit-: I see from another issue that data is an entity. In this case it should look like:

{
 "entity":"data",
 "value":"Perfect by Ed Sheeran",
 "subentities":[
  {
    "entity":"song",
    "value":"Perfect"
  },
  {
    "entity":"artist",
    "value":"Ed Sheeran"
  }]
}

Note that you need to specify start and end for each non-trait entity. Indexes for the subentities of a composite entity are relative to the parent entity, not to the full text.
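As an illustration of the note above, here is a minimal sketch that computes start and end for the sample from this thread. It assumes that end is exclusive (start plus the length of the value) and that each sample is an object with text and an entities list; the helper names `span` and `build_sample` are hypothetical.

```python
import json


def span(text, value):
    """Return (start, end) of the first occurrence of value in text; end is exclusive."""
    start = text.find(value)
    if start < 0:
        raise ValueError("{!r} not found in {!r}".format(value, text))
    return start, start + len(value)


def build_sample(text, phrase, song, artist):
    """Build one sample with a composite data entity and its subentities."""
    data_start, data_end = span(text, phrase)
    subentities = []
    for name, value in (("song", song), ("artist", artist)):
        # Subentity indexes are relative to the composite phrase, not the full text.
        sub_start, sub_end = span(phrase, value)
        subentities.append(
            {"entity": name, "value": value, "start": sub_start, "end": sub_end}
        )
    return {
        "text": text,
        "entities": [
            {
                "entity": "data",
                "value": phrase,
                "start": data_start,
                "end": data_end,
                "subentities": subentities,
            }
        ],
    }


sample = build_sample(
    "Give me details for Perfect by Ed Sheeran",
    "Perfect by Ed Sheeran",
    "Perfect",
    "Ed Sheeran",
)
body = json.dumps([sample])  # candidate request body for POST /samples
```

Because `str.find` returns the first match, a value that appears more than once in the text would need its indexes set by hand.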

Hyperclaw79 commented 6 years ago

Ah thanks for this. will try it out.

(this is not documented)

well, that was the problem.

Note that you need to specify start and end for each non-trait entity.

Can you please give me an example of how to use start and end?

Edit: And this works. Thank you.

l5t commented 6 years ago

Example of start and end here: https://github.com/wit-ai/wit-api-only-tutorial#add-date-detection

Harshitharaj06 commented 4 years ago

How can I obtain the details of all the entities once my order is complete?

wit-ai / wit

Prioritizing Entities. #902