shaobohou / pearley

A Probabilistic Earley Parser
zlib License
5 stars 0 forks source link

EXTENSION OF PROBABILISTIC EARLEY PARSER #2

Closed ghost closed 7 years ago

ghost commented 7 years ago

Can you help me to realise an explicit implementation of the method for the generation of the subsequent prefixes for a given input string? That is, for every symbol of the input string, the method must returning a list of possible subsequent symbols with their probability generated by initial grammar. For example:

Grammar Rules 0 = S --> NP VP [1] 1 = NP --> Det N [1] 2 = VP --> VT NP [0.5] 3 = VP --> VI PP [0.5] 4 = PP --> P NP [1] 5 = Det --> a [1] 6 = N --> circle [0.333333] 7 = N --> square [0.333333] 8 = N --> triangle [0.333333] 9 = VT --> touches [1] 10 = VI --> is [1] 11 = P --> above [0.5] 12 = P --> below [0.5]

Input string a circle touches a triangle

The method returns:

1) List("a") = {("circle", 0.333333), ("square", 0.333333), ("triangle", 0.333333)}, where 0.333333 is a probability of "a"

2) List("circle") = {("circle", 0.333333)}, generates itself

3) List("touches") = {("touches", 0.333333)}, generates itself

4) List("a") = like 1)

5) List("triangle") = {("triangle", 0.333333)}, generates itself.

Thanks a lot in advance!

P. S. your works in github are wonderful!!

Sincerely

Dan

shaobohou commented 7 years ago

I don't entirely understand what you are trying to achieve. Why does "circle", "touches" and "triangle" generates itself?

Do you have a reference or a link to a writeup that explains it?

ghost commented 7 years ago

I wanted to say that "touches" doesn't generate anything like consecutive symbol. For example, it the input string is:

"A circle touches a triangle"

The symbols that follow "a" can be circle/triangle/square (everyone belong to input grammar).

I want a method that generates a list that, for every symbol of input string, contains possible symbols, with their probability, and can follow the considered symbol of the input string.

Hope I had explain it in the best words I could.

For the symbol "a" of the input string, the method must return:

List("a") = {(circle, 0.333), (square, 0.333), (triangle, 0.333)} where 0.333 is each probability

Thanks!

Il 30/nov/2016 19:21, Shaobo Hou notifications@github.com ha scritto:

I don't entirely understand what you are trying to achieve. Why does "circle", "touches" and "triangle" generates itself?

Do you have a reference or a link to a writeup that explains it?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/shaobohou/pearley/issues/2#issuecomment-263951911, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AP8flyqYGaBKUPm6T-BYwMtnueCbBxdIks5rDb6sgaJpZM4K972v.

shaobohou commented 7 years ago

But why doesn't "touches" generate any consecutive symbols?

Why does only "a" generate a non-empty list of consecutive symbols?

ghost commented 7 years ago

Because "touches" hasn't an alternative verbs by our initial grammar. I say that "touches" doesn't generate anything because the only verb of our grammar is it, there aren't any production about verb. Insted, about "a", it generate a list in which there are the symbols that CAN follow "a". The idea is: Starting from input string, I want to verify, at first, if it is generated by initial grammar. This step is that you've done in your code. The second step regards the creation of a function (method) that gives an indication of the possible symbols that can follow every symbol of the input string.

Il 01/dic/2016 12:51, Shaobo Hou notifications@github.com ha scritto:

But why doesn't "touches"generate any consecutive symbols?

Why does only "a" generate a non-empty list of consecutive symbols?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/shaobohou/pearley/issues/2#issuecomment-264152863, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AP8fl24yN7zFGGW7J6XdPTPPVsbiNiLKks5rDrTSgaJpZM4K972v.

shaobohou commented 7 years ago

I am afraid I still don't understand.

You say "touches" doesn't generate anything because it is the only verb in the grammar, but by the same logic "a" is the only Det in the grammar so it shouldn't generate anything either.

If "a" can generate a list of followup symbols because the there are production rules involving Det, then the same is also true for "touches" as there are production rules involving VT.

ghost commented 7 years ago

For example, if my grammar is:

S _ 1.0 S --> NP VP 1.0 PP --> P NP 1.0 VP --> V NP 1.0 VP --> VP PP 1.0 P --> with 1.0 V --> saw 1.0 NP --> NP PP 1.0 NP --> N 1.0 N --> astronomers 1.0 N --> ears 1.0 N --> stars 1.0 N --> telescopes

and my input string is:

astronomers saw stars with ears

At first I check if this input is generated by grammar. It’s ok because you code does exactly it. As second step, I would have a method that generate probability of consecutive prefix for a given input string. My “method” must return a result like this:

“astronomers” —> {(saw, 1)} the prefix that can follow symbol “astronomers” is only “saw” with probability “1” (100%)

“saw” —> {(astronomers, 0.25), (ears, 0.25), (stars, 0.25), (telescopes, 0.25)} each of these prefix can follow “saw”, every one with probability of 0.25 (25%)

I hear that the possible prefixes that can follow the precedent symbol are in a list with two argument: (possible prefix, probability prefix). It’s a sort of prediction, or a method that computes/predicts the possible symbol that follow a given symbol of input string. It seems like a prediction state of Earley Parser. Again, for example.

Everything according to our initial grammar. For symbol that can’t predict, isn't generate a list.

Thanks!

Il giorno 01 dic 2016, alle ore 14:25, Shaobo Hou notifications@github.com<mailto:notifications@github.com> ha scritto:

I am afraid I still don't understand.

You say "touches" doesn't generate anything because it is the only verb in the grammar, but by the same logic "a" is the only Det in the grammar so it shouldn't generate anything either.

If "a" can generate a list of followup symbols because the there are production rules involving Det, then the same is also true for "touches" as there are production rules involving VT.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/shaobohou/pearley/issues/2#issuecomment-264172382, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AP8fl5iddF6pUQ1h1YPyCWn13rMuAabtks5rDsrDgaJpZM4K972v.

shaobohou commented 7 years ago

I think you can probably achieve this by examining the predicted states:

https://github.com/shaobohou/pearley/blob/master/src/EarleyParser.h#L259

For each state, you can ask what the next symbol is:

https://github.com/shaobohou/pearley/blob/master/src/EarleyState.h#L31

And if that symbol is one of the terminal symbol then it can be predicted:

https://github.com/shaobohou/pearley/blob/master/src/EarleyParser.h#L49

The probability of each predicted symbol can probably be computed by normalising their "alpha" probabilities:

https://github.com/shaobohou/pearley/blob/master/src/EarleyState.h#L24

ghost commented 7 years ago

Thank you so much!

Il giorno 05 dic 2016, alle ore 15:15, Shaobo Hou notifications@github.com<mailto:notifications@github.com> ha scritto:

I think you can probably achieve this by examining the predicted states:

https://github.com/shaobohou/pearley/blob/master/src/EarleyParser.h#L259

For each state, you can ask what the next symbol is:

https://github.com/shaobohou/pearley/blob/master/src/EarleyState.h#L31

And if that symbol is one of the terminal symbol then it can be predicted:

https://github.com/shaobohou/pearley/blob/master/src/EarleyParser.h#L49

The probability of each predicted symbol can probably be computed by normalising their "alpha" probabilities:

https://github.com/shaobohou/pearley/blob/master/src/EarleyState.h#L24

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/shaobohou/pearley/issues/2#issuecomment-264864224, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AP8fl_ilUJL1vordzSMLnhZwyYdf0kq4ks5rFByAgaJpZM4K972v.