Open ampli opened 4 years ago
Get a more detailed help on a variable as in "!help var".
linkparser> !bad
Display of bad linkages turned on.
linkparser> bird flu was observed in which countries ?
Found 2 linkages (0 had no P.P. violations)
Linkage 1 (bad), cost vector = (UNUSED=0 DIS= 0.20 LEN=12)
"Misuse of preposition13"
+-------------------------------Xp-------------------------------+
+--------------->WV--------------->+ |
+------>Wd------+ | +------Jp-----+ |
| +--AN--+---Ss--+----Pvf---+---MVp--+-JQ-+---Dmc--+ |
| | | | | | | | |
LEFT-WALL bird.n flu.n-u was.v-d observed.v-d in.r which countries.n ?
so there it is: "Misuse of preposition13"
Fixing this requires ... being clever. Usually by finding similar sentences that work, and stealing ideas from those. Simply disabling "Misuse of preposition13" will just increase the number of failures in corpus-basic.
Also -- if the proceedings have an e-mail, please do send them an email and remind them that more modern versions exist ...
bird flu was observed where?
bird flu was observed how?
bird flu was observed when?
when was bird flu observed?
In which countries was bird flu observed?
The first three fail completely; the last two work fine. The first three are "inverted questions". Note how the last two use SI (inverted subject), which suggests that the first three need a new kind of link, maybe "QP" for "inverted question". Something like this:
----->+
| +---
Pvf---+---QP---+-JQ
| |
observed.v-d in.r wh
But then you have to invent something to prevent QP from being used to parse "I saw in which room". Hmmmm. See https://www.abisource.com/projects/link-grammar/dict/section-JQ.html
Oh, OK, so then Pvf- & QP+ would work, it seems. That's because Pv is used for "was verbed" constructions, which are valid for inverted questions, but would not allow "I saw in which". To make it even tighter, use Pvf- & (WV- or CV-) & QP+ so that the participle must be identified as the head-verb.
Maybe instead of inventing a new link QP, there is some existing link we can reuse. Not sure, would have to review the documentation. It's likely that a new link might be needed, since questions are ... very different from normal sentences, and also LG is weaker with questions.
(above comment edited)
LG is weaker with questions
This is a pity, since people try to use it for decoding queries.
Also -- if the proceedings have an e-mail, please do send them an email and remind them that more modern versions exist ...
BTW, about 2 weeks ago I sent a letter on a similar thing to Prof. Ahn, who very recently (Oct 2019) published this paper on a system in which LG is used: A Function as a Service Based Fog Robotic System for Cognitive Robots. (No answer yet.)
It's a pity.
Do you want to try fixing it, or should I?
Do you want to try fixing it, or should I?
I tried just adding MVp to the Misuse of preposition13 rule, and at first glance it looks fine:
--- a/data/en/4.0.dict
+++ b/data/en/4.0.dict
in.r:
<alter-preps>
or ({JQ+} & (J+ or Mgp+ or IN+) & (<prep-main-a> or FM-))
or K-
or (EN- & (Pp- or J-))
or <locative>
or [MVp- & B-]
or (MG- & JG+)
- or <null-prep-qu>;
+ or <null-prep-qu>
+ or (MVp- & JQ+ & J+);
--- a/data/en/4.0.knowledge
+++ b/data/en/4.0.knowledge
- JQ , Mj Wj MX#j , "Misuse of preposition13" ,
+ JQ , Mj Wj MX#j MVp , "Misuse of preposition13" ,
It didn't change the number of errors in corpus-basic.
In corpus-fixes it reduced the number of errors from 379 to 373, because these sentences are now parsed:
Sophy wondered up to what number she should count
Sophy wondered up to what number to count
Sophy wondered up to what number to count to
Sophy wondered up to whose favorite number she should count
Sophy wondered up to whose favorite number to count
Sophy wondered up to whose favorite number to count to
Since they don't include in.r, this is only due to the addition of MVp in the said PP rule.
Summary of errors by corpus:
corpus | now | patched | diff | linkage-limit |
---|---|---|---|---|
basic | 82 | 82 | 0 | 1000 |
fixes | 379 | 373 | -6 | 1000 |
fix-long | 9 | 9 | 0 | 10000 |
failures | 1556 | 1555 | -1 | 1000 |
pandp-union | 2016 | 2007 | -9 | 1000 |
pandp-union | 1998 | 1990 | -8 | 30000 |
With the long-sentences batches I just tried -limit=30000. The pandp-union corpus processing then takes much time and maybe a lower value would be enough (I have more to say about that...).
The difference between the number of "fixed" sentences in pandp-union seems to be due to a different number of "combinatorial explosions" due to the changed rules (but I'm not sure - we can find the differing sentences and investigate them).
So based on these checks maybe this change is fine. However, I guess you will want to investigate:
corpus-fixes.
Minor editing of my previous message (table diff value + missing open parenthesis).
bird flu was observed where? bird flu was observed how? bird flu was observed when?
I tried to fix them by brute force, by adding what seems to be a missing QI+ when Pvf- is present, as hinted by:
linkparser>
Linkage 2, cost vector = (UNUSED=1 DIS= 0.20 LEN=9)
+------------------------Xp-----------------------+
+--------------->WV--------------->+ |
+------>Wd------+ | |
| +--AN--+---Ss--+----Pvf---+ |
| | | | | |
LEFT-WALL bird.n flu.n-u was.v-d observed.v-d [where] ?
Press RETURN for the next linkage.
linkparser>
Linkage 3, cost vector = (UNUSED=1 DIS= 1.10 LEN=10)
+----------------------Xp---------------------+
+-------------->WV-------------->+ |
+------>Wd------+ | |
| +--AN--+-------Ss-------+---QI---+ |
| | | | | |
LEFT-WALL bird.n flu.n-u [was] observed.v-d where ?
Instead of just adding QI+, I added the macro in which it resides. I have no idea if this is better.
predicted.v-d realized.v-d discovered.v-d determined.v-d announced.v-d
mentioned.v-d admitted.v-d recalled.v-d revealed.v-d divulged.v-d
stated.v-d observed.v-d indicated.v-d stammered.v-d bawled.v-d
analysed.v-d analyzed.v-d
assessed.v-d established.v-d evaluated.v-d examined.v-d questioned.v-d
tested.v-d hypothesized.v-d hypothesised.v-d well-established.v-d
envisaged.v-d documented.v-d:
((<verb-sp,pp> & (<vc-predict>)) or
(<verb-and-sp-i-> & ([<vc-predict>]0.2 or ())) or
((<vc-predict>) & <verb-and-sp-i+>) or
<verb-and-sp-t>)
- or (<verb-s-pv> & {THi+})
+ or (<verb-s-pv> & ({THi+} or <vc-predict>))
or <verb-adj>
or <verb-phrase-opener>;
The result is that these sentences get parsed, with no additional errors in the 5 tested corpus batches. E.g.:
+-----------------------Xp----------------------+
+--------------->WV--------------->+ |
+------>Wd------+ | |
| +--AN--+---Ss--+----Pvf---+---QI---+ |
| | | | | | |
LEFT-WALL bird.n flu.n-u was.v-d observed.v-d where ?
Supposing this is correct (I don't know), then still:
<vc-predict>
and not just part of it.
I've been hacking on this; look at my branch "qi". I have not tested for regressions.
Regarding MVp, the page https://www.abisource.com/projects/link-grammar/dict/section-JQ.html gives the example: "*I saw in which room"
So pull req #1051 fixes this but I did not measure regressions. I'm also contemplating changing Misuse of preposition14 so that "You slept with who?" will parse.
And .. in the finest of traditions, the changes to the dict mean that all run-times are now slower by 10% or 20% or something like that ... all of your performance tuning gets blown away by some fairly minor dict changes that one might think would not matter.
Perhaps it's wrong to think of them as "minor" -- {QI+} is now &-ed with lots of common verbs: did, said, and many many others. The total number of expressions is significantly larger, the total number of disjuncts is larger. .. It would be interesting to look at these totals, and the distributions of them, for typical dictionaries, over time.
It would be much simpler, and also be interesting to see how dictionaries from different eras compare on performance, on the current parser.
I remeasured performance, correctly, this time; the performance hit is minor
I fetched your "qi" branch and made some tests.
Regarding MVp, the page https://www.abisource.com/projects/link-grammar/dict/section-JQ.html gives the example: "*I saw in which room"
The problem with my (and your) fix to "bird flu was observed in which countries?" is that now the said example "*I saw in which room" does parse.
It seems to me that the root of the problem is that in the fix we treat "was observed" as a "passive participle", i.e. a verb, and then there is no way to distinguish the different cases (as "saw" is a verb too).
So I propose instead that the role of "was observed" in that sentence is "predicate adjective", and in this role it should use Pa & JQ & J+.
I.e. something is predicated, and on that basis we ask a question: where, when, in which countries, etc.
This way, in the Misuse of preposition13 rule we can require Pa instead of MVp, and this Pa should also be added to Misuse of preposition14.
This proposal doesn't handle the "up to" sentences, so they remain unfixed. I think their fix is different, so we can discuss it later (unless it seems to you related).
To check this proposal I made these changes:
--- a/data/en/4.0.dict
+++ b/data/en/4.0.dict
predicted.v-d realized.v-d discovered.v-d determined.v-d announced.v-d
...
((<verb-sp,pp> & (<vc-predict>)) or
(<verb-and-sp-i-> & ([<vc-predict>]0.2 or ())) or
((<vc-predict>) & <verb-and-sp-i+>) or
<verb-and-sp-t>)
or (<verb-s-pv> & {THi+})
+ or (Pa- & (MVp+ or <vc-predict>))
or <verb-adj>
or <verb-phrase-opener>;
in.r:
<alter-preps>
or ({JQ+} & (J+ or Mgp+ or IN+) & (<prep-main-a> or FM-))
or K-
or (EN- & (Pp- or J-))
or <locative>
or [MVp- & B-]
or (MG- & JG+)
- or <null-prep-qu>;
+ or <null-prep-qu>
+ or (MVp- & JQ+ & J+);
--- a/data/en/4.0.knowledge
+++ b/data/en/4.0.knowledge
- JQ , Mj Wj MX#j , "Misuse of preposition13" ,
- Jw , Mj Wj MX#j , "Misuse of preposition14" ,
+ JQ , Mj Wj MX#j Pa , "Misuse of preposition13" ,
+ Jw , Mj Wj MX#j Pa , "Misuse of preposition14" ,
Results:
...
+-------------------------------Xp-------------------------------+
+---------->WV--------->+ |
+------>Wd------+ | +------Jp-----+ |
| +--AN--+---Ss--+----Pa----+---MVp--+-JQ-+---Dmc--+ |
| | | | | | | | |
LEFT-WALL bird.n flu.n-u was.v-d observed.v-d in.r which countries.n ?
...
+-----------------------Xp----------------------+
+---------->WV--------->+ |
+------>Wd------+ | |
| +--AN--+---Ss--+----Pa----+---QI---+ |
| | | | | | |
LEFT-WALL bird.n flu.n-u was.v-d observed.v-d where ?
And, as needed, "*I saw in which room" doesn't parse:
+---->WV--->+
+->Wd--+Sp*i+-MVp-+------Ju------+
| | | | |
LEFT-WALL I.p saw.w in.r [which] room.n-u
...
!bad
...
"Misuse of preposition13"
+---->WV--->+ +-----Js----+
+->Wd--+Sp*i+-MVp-+-JQ-+-Ds**c+
| | | | | |
LEFT-WALL I.p saw.w in.r which room.s
Corpus error count:
corpus | now | patched | diff | linkage-limit |
---|---|---|---|---|
basic | 82 | 82 | 0 | 1000 |
fixes | 379 | 379 | 0 | 1000 |
fix-long | 9 | 9 | 0 | 10000 |
failures | 1556 | 1554 | -2 | 1000 |
pandp-union | 2016 | 2011 | -5 | 1000 |
slower by 10% or 20% or something like that
Can it be that you tested it on intermediate changes? For me the slowdown of your "qi" branch is only a few percent at most. In any case I have a WIP on improving expression handling and also pruning (both expression and power), so this may allow increasing the dict complexity without much more overhead.
We can also look at that from another angle: Improving the library speed will allow a much more complex dict without being too sluggish.
I remeasured performance, correctly, this time; the performance hit is minor
Only now I see that you addressed that by now...
After you applied PR #1051, we get:
linkparser> Sophy wondered up to what number to count to
Found 28 linkages (28 had no P.P. violations)
Linkage 1, cost vector = (UNUSED=0 DIS= 6.00 LEN=14)
+-------->WV------->+-----MVp-----+-----J-----+---------B---------+
+-->Wd---+---Ss*s---+---MVa--+ +-JQ-+-Ds**c+---R--+--I--+--MVp-+
| | | | | | | | | |
LEFT-WALL Sophy.f wondered.v-d up.e to.r what number.n to.r count.v to.r
Among other things, this seems to me wrong:
Ss*s---+---MVa--+
| |
wondered.v-d up.e
Isn't "up" a modifier of "to" and not of "wondered"?
Compare the symmetric sentence in the context of reverse counting:
Sophy wondered down to what number to count to
Clearly "down" here is not a verb modifier.
Can the problem be solved by attaching "up" etc. to "to" using Mj?
BTW, I also don't think 'up to' here is an idiom, because instead of "up" I can think of some other words (down, approximately, nearly, exactly).
Compare to that:
linkparser> Sophy wondered right to which one she should stand on the stage
Found 218 linkages (8 had no P.P. violations)
Linkage 1, cost vector = (UNUSED=0 DIS= 0.53 LEN=34)
+-------------------------MVp-------------------------+
| +--------------------Mp--------------------+
| | +-------------CV------------>+ |
| | +------Cs-----+ | |
+------->CPx--------+ | +----Js---+ | | +---Js---+
+-->Wa---+ +---SIsj---+---Mj--+-JQ-+-Ds-+ +--Ss--+---I---+ | +Ds**c+
| | | | | | | | | | | | |
LEFT-WALL Sophy.f wondered.q-d right.n-u to.r which one she should.v stand.v on the stage.n
I saw in which room
This is actually ambiguous. On the surface, it seems like an absurd sentence, but it's a plausible reply to the question: "Did you see in which room they held bingo night?" Anyway, your proposal can be simplified to:
--- a/data/en/4.0.knowledge
+++ b/data/en/4.0.knowledge
@@ -217,8 +217,8 @@ CONTAINS_ONE_RULES:
Mj , Jw JQ , "Incorrect relative10" ,
MX#j , Jw JQ , "Incorrect relative11" ,
Wj , Jw JQ , "Misuse of preposition12" ,
- JQ , Mj Wj MX#j MVp , "Misuse of preposition13" ,
- Jw , Mj Wj MX#j , "Misuse of preposition14" ,
+ JQ , Mj Wj MX#j Pv , "Misuse of preposition13" ,
+ Jw , Mj Wj MX#j Pv , "Misuse of preposition14" ,
B#j , Jr , "Incorrect relative15" ,
Jr , B#j , "Incorrect relative16" ,
; The two below prevent "How big?" and "How quickly?"
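For readers less familiar with these rules: each line above is read here as "if a linkage contains a link matching the first column, it must also contain a link matching at least one name in the second column; otherwise the message in the third column is reported". The Python sketch below only illustrates that reading on the two diagrams shown earlier in the thread. It is a simplification under stated assumptions and not the library's implementation: domains are ignored, and label_matches is a hypothetical helper that assumes a rule name matches any link label it is a prefix of, with # standing for any single character.

```python
# Simplified illustration of a "contains one" post-processing rule.
# Assumptions (not taken from the link-grammar source code):
#   - domains are ignored; the rule is applied to the whole linkage;
#   - a rule name matches a link label if it is a prefix of that label,
#     where '#' in the rule name matches any single character.

def label_matches(rule_name, label):
    """Return True if rule_name matches label under the simplified scheme."""
    if len(rule_name) > len(label):
        return False
    return all(r == "#" or r == c for r, c in zip(rule_name, label))


def violated_rules(link_labels, rules):
    """Return the messages of all rules violated by the given link labels.

    rules is a list of (trigger, alternatives, message) tuples.
    """
    messages = []
    for trigger, alternatives, message in rules:
        triggered = any(label_matches(trigger, lab) for lab in link_labels)
        satisfied = any(
            label_matches(alt, lab)
            for alt in alternatives
            for lab in link_labels
        )
        if triggered and not satisfied:
            messages.append(message)
    return messages


# The rule before any change, and the variant that additionally accepts Pv.
ORIGINAL = [("JQ", ["Mj", "Wj", "MX#j"], "Misuse of preposition13")]
WITH_PV = [("JQ", ["Mj", "Wj", "MX#j", "Pv"], "Misuse of preposition13")]

# Link labels read off the two diagrams shown earlier in the thread.
bird_flu = ["Xp", "WV", "Wd", "AN", "Ss", "Pvf", "MVp", "JQ", "Jp", "Dmc"]
i_saw = ["WV", "Wd", "Sp*i", "MVp", "JQ", "Js", "Ds**c"]

print(violated_rules(bird_flu, ORIGINAL))
print(violated_rules(bird_flu, WITH_PV))
print(violated_rules(i_saw, WITH_PV))
```

Under these assumptions the Pvf link in the "bird flu" linkage satisfies the Pv alternative, while "I saw in which room" has no such link and is still flagged.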
Also, yes, the Sophy sentences are broken
I think "up to" could be an idiom, here, because:
*Sophy wondered exactly to what number to count to
Sophy wondered exactly what number to count to
How high did it go?
Up to what mark did it reach?
Exactly what mark did it reach?
up to where did it go?
Up to how many gallons were lost?
down to which floor did it drop?
down to what depravities did he sink?
The above are easily fixed by up_to down_to: EW+;
A proper fix for the others requires link-crossing. This is best illustrated by pondering the sentence: "Sophy wondered [up to] whose favorite number she should count to" and then realizing that [up to] needs to modify "number" not "whose". Unfortunately, this is not possible without link-crossing.
There is a work-around for link-crossing, but it is hacky: I did it once, here: see Jj and Jk at bottom of page at https://www.abisource.com/projects/link-grammar/dict/section-J.html
Doing such a hack in the dozen-plus cases where it is needed is painful and ugly. I would rather be able to say "link X can cross link Y or Z once". I don't think two crossings are ever needed. I don't think that allowing anything to cross anything is generally allowed. The README has accumulated a bunch of these...
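To make the "no link-crossing" constraint concrete: if each link is written as a pair of word positions, two links cross when exactly one end of the second lies strictly between the ends of the first. The following is a minimal Python sketch of that test; the function names and the example positions are hypothetical and are not taken from the parser.

```python
from itertools import combinations


def links_cross(a, b):
    """Return True if links a and b, given as (word, word) index pairs, cross.

    Links that share an endpoint, or where one is nested inside the
    other, do not cross.
    """
    (l1, r1), (l2, r2) = sorted([tuple(sorted(a)), tuple(sorted(b))])
    return l1 < l2 < r1 < r2


def crossing_pairs(links):
    """Return every pair of links in the list that cross each other."""
    return [(a, b) for a, b in combinations(links, 2) if links_cross(a, b)]


# Hypothetical word positions: (0, 3) and (2, 5) cross, (1, 2) is nested.
example = [(0, 3), (1, 2), (2, 5)]
print(crossing_pairs(example))
```

A rule such as "link X can cross link Y or Z once" would amount to allowing a limited, named set of such pairs instead of none.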
I would rather be able to say "link X can cross link Y or Z once".
Will it be fine to do the hack automatically on dict read according to such definitions?
Will it be fine to do the hack automatically on dict read according to such definitions?
Hmm. That's an interesting idea. Yeah, maybe I like it. So we need several things:
1) Some way to write down "link X can cross link Y" in the dictionary.
2) By analogy to Jj and Jk, your hack to auto-generate Xj and Xk and then auto-add Xj- & Y & Xk+
Yeah, I like that. The tricky part to 2 is to put the subscripts in a slot which is unused. Maybe we could put them in the "first" slot, like h and d for head/dependent, except they're cross-from-left and cross-from-right, so maybe l and r, and the ascii diagrammer could use parens to print them! Like so:
+------------------+
| +--)|(--------+
| | | |
He had been allowed to eat a cake by Sophy that she had
so the parens make a little "tunnel" where the link crosses, and the logic for that would be just like drawing the arrow-heads for h and d arrows. So yeah, that seems slick...
To be clear: you would auto-add lX- & Y & rX+
...it might even be possible to do this with an m4 macro hack. Ugh.
The example in https://www.abisource.com/projects/link-grammar/dict/section-J.html is actually complex, because, there, the J link crosses two others: it crosses both the I and the VJlpi links...
So for this example, Yikes... it's yucky. In the current dict, it's I- & Jj- & VJlpi- & VJrpi+ & Jk+ and so it's not obvious that J is crossing I and VJlpi but is not crossing VJrpi+ ... what a mess.
The render would be
+-------I---------+
| +--VJlpi----+
| | +-Js--)|(----------Js----------+
| +-MVp+ +--VJrpi-+--MVp-+---Js--+
| | | | | | |
... to.r look.v at and.j-v listen.v to.r everything
and the notation would be I- & lJs- & VJlpi- & VJrpi+ & rJs+
Instead of l and r maybe s and r because l and 1 and I all look alike too much. s is for Latin sinister.
Or p and q as a visually mirror-symmetric pair. Or w and v. Or e and a
Hmm. except for p and q, it appears that the Latin alphabet was explicitly designed to avoid mirror-symmetric letters. Interesting. This is also the case for cyrillic and greek. ... interesting ...
When I look at !!and-j-v, I see several other VJlx- & VJrx+ constructs, and even a very similar one to the one that has the Jj Jk device: ({Xd-} & hVJlpi- & {N+} & {TO+} & hVJrpi+).
Why don't they also need this device, especially the last one that also includes hVJlpi- & hVJrpi+?
Another question: I don't like the complication of using the UC front position. Is there something bad in using a bool mark in the Connector struct?
The Jj-Jk device is "recent" (well, OK maybe over a year old now) and is used in only one place (OK, now maybe two), and was created as an experiment to see how well it works (how convenient or confusing it is, how much trouble it causes vs. how much trouble it saves...) It was never deployed on a wide-scale basis. The post above (https://github.com/opencog/link-grammar/issues/1050#issuecomment-557770152) is the newest/best way I can think of for making it fully generic and "obvious".
the UC front position
I don't understand the question. The goal of fronting UC is to have a notation in 4.0.dict to indicate that "opposites connect". Maybe this could be moved so that it comes after the +/- connector-dir. Or allows additional symbols besides +/- ...
The goal of fronting UC is to have a notation in 4.0.dict to indicate that "opposites connect".
What is special about their matching rules?
For me it seems as if regular Js- and Js+ connectors are fine, and only the code that draws the diagram needs to know they denote a cross-link (and hence my suggested bool mark).
and only the code that draws the diagram needs to know
And how will it know this? it's not just J that might cross, it could be .. A or S or a dozen others.
And how will it know this? it's not just J that might cross, it could be .. A or S or a dozen others.
My idea is that these connectors (in your example A, S and others) that serve as "bypass" connectors will be marked in their connector struct. Questions: Why is there any need to explicitly make these marks in the connector string? Is there any code, besides the diagram drawing code, that needs to be aware that there is anything special here?
in their connector struct.
I'm not concerned with how they are handled in the C code. Finding a good representation for 4.0.dict is my primary concern.
Is there any code that needs to be aware
Presumably, "most" applications of LG are interested in the dependency diagram, in the abstract, as a graph, and now, as a non-planar graph. So there will need to be a step that says "a hah, here's something that in LG looks like two links, but its really only just one." The app itself could figure that out, or we could provide that extra step ourselves, in the LG api. In addition, the app might want to know which links cross.
The only real problem with this is that there are very few, approaching zero apps of LG, at least, that are public, that anyone talks about. Every now and then I get hints of proprietary apps, but they never seem heavily vested. So all this is very hypothetical.
Yes, other parts of opencog use LG, but ... not very well, not very robustly, not very deeply.
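The "looks like two links, but it's really only just one" step mentioned above could be sketched roughly as follows. This is only an illustration in Python, not part of the LG API: it assumes the hypothetical l/r prefix notation proposed earlier in this thread (lJs and rJs meeting at the word being crossed), represents links as (left word, right word, label) tuples, and uses a made-up function name.

```python
def merge_half_links(links):
    """Join the two halves of each crossing link into one logical link.

    links is a list of (left_word, right_word, label) tuples. A label
    such as "lJs" is taken to be the left half and "rJs" the right half
    of a single Js link; the two halves meet at a shared pivot word.
    Returns (plain_links, merged_links), where each merged link is
    (left_word, right_word, base_label, pivot_word).
    """
    left_halves = {}   # (pivot word, base label) -> far-left word
    right_halves = {}  # (pivot word, base label) -> far-right word
    plain = []
    for left, right, label in links:
        if label[:1] == "l" and label[1:2].isupper():
            left_halves[(right, label[1:])] = left
        elif label[:1] == "r" and label[1:2].isupper():
            right_halves[(left, label[1:])] = right
        else:
            plain.append((left, right, label))

    merged = []
    for (pivot, base), left in left_halves.items():
        right = right_halves.pop((pivot, base), None)
        if right is None:
            raise ValueError(f"left half of {base} at word {pivot} has no right half")
        merged.append((left, right, base, pivot))
    if right_halves:
        raise ValueError("right half without a matching left half")
    return plain, merged


# Word positions for "to look at and listen to everything" (0..6);
# only some of the links from the rendering above are listed.
links = [
    (1, 3, "VJlpi"),
    (2, 3, "lJs"),
    (3, 4, "VJrpi"),
    (3, 6, "rJs"),
    (5, 6, "Js"),
]
plain, merged = merge_half_links(links)
print(plain)
print(merged)
```

Keeping the pivot word in the result is one way for an application to also learn where the crossing happens.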
I'm not concerned with how they are handled in the C code. Finding a good representation for 4.0.dict is my primary concern.
But the whole idea is that the special "bypass" connectors don't appear at all in 4.0.dict, as they are only inserted later by the LG library code. So how can they be represented there?
What is to be represented there is something like:
<XLINK>: Js+ & VJrpi-; % Connection from Js may cross the rest of connectors.
(For now it seems to me no need to specify the less deeper connectors too that it also would cross, unless this may lead to incorrect parses. I also don't know if an exact connector match should be done for VJrpi- or "easy-match".)
If you mean their representation in the printing of the actual expression which is used (or its disjuncts), then it really doesn't matter, from a programming standpoint, in which representation they are displayed, and indeed the most convenient representation should be used.
and the notation would be I- & lJs- & VJlpi- & VJrpi+ & rJs+ Instead of l and r maybe s and r because l and 1 and I all look alike too much. s is for Latin sinister. Or p and q as a visually mirror-symmetric pair. Or w and v . Or e and a
I still didn't understand if the LG library code should make any special interpretation of these leading LC letters (supposing it already knows these are "bypass" connectors - after all the LG library code knows what it added). For now it seems to me these special letters don't play any role in the connector matching algorithm (unlike h/d and the rest of the letters in the connector string), and even not in the drawing algorithm (since it is already known which connectors are the "bypass" ones).
I guess I'm not clear enough in my questions and proposals, or that I didn't understand something (or both). I will try to make a real implementation and see if it works fine, but still answers to the above would help.
What also would help me are additional diagrams of the desired results. E.g. for some of the "Sophy" sentences (only links from words that have cross-links are needed).
+------------------+
| +--)|(--------+
| | | |
He had been allowed to eat a cake by Sophy that she had
When I try to draw this using the current parse, I get:
+-------------------Bs------------------+
| |
+----------Mvp---------)|(---+ |
| | | |
----->+------IV---->+----Os---+---)|(---R--------+---------CV-------->+----
-Pv---+---TO---+-I*t+ +Ds**c+-Mp-+-Js-+ +--Cr-+--Ss-+---PP---+--Ox
| | | | | | | | | | |
allowed.v-d to.r eat.v a cake.s by Sophy.f that.j-r she had.v-d made.v-d
I.e. 2 links crossings. What did I miss?
And this current linkage seems to need 3 link crossings:
Linkage 2, cost vector = (UNUSED=0 DIS= 0.50 LEN=39)
+------------------------------MVa---------------------
| +-------------------Bs------------------+
+------IV---->+----Os---+---------R--------+---------CV-------->+
-Pv---+---TO---+-I*t+ +Ds**c+-Mp-+-Js-+ +--Cr-+--Ss-+---PP---+--Ox
| | | | | | | | | | |
allowed.v-d to.r eat.v a cake.s by Sophy.f that.j-r she had.v-d made.v-d
Maybe this MVa, and another one from "allowed" to "specially" at linkage 13, are incorrect?
But the whole idea is that the special "bypass" connectors don't appear at all in 4.0.dict
Oww. I forgot that is what we're talking about. I have to run out now, but will re-read and rethink.
For now it seems to me no need to specify the less deeper connectors too that it also would cross, unless this may lead to incorrect parses.
Given that even our simplest examples seem to need to cross multiple links, this seems like a reasonable assumption. But really, we'll have to test and see.
I also don't know if an exact connector match should be done for VJrpi- or "easy-match".
It should be a regular match.
I still didn't understand if the LG library code should make any special interpretation of these leading LC letters. For now it seems to me these special letters don't play any role in the connector matching algorithm.
Sorry for generating confusion about this before. To answer this question directly: in the prototype, I used two different subtypes, Jj and Jk, so that Jj formed the left half of the underpass link, and Jk formed the right half of the link. This seemed like the right way to do it; and I'm not sure what would have happened if I'd just used a single subtype, Jx for example. I'll try a brief experiment now ...
... done. I collapsed Jj and Jk into just Jx, and nothing seemed to change, but I was sloppy, so might be wrong...
I.e. 2 links crossings. What did I miss?
Nothing; that's correct.
this current linkage seems to need 3 link crossings:
It seems to, but if we can block it from happening, that's fine, because it's wrong/implausible.
I updated the above diagram with what seems the correct disjunct of "allowed":
+-------------------Bs------------------+
+-------------MVp------)|(---+ |
| +----Os---+---)|(---R--------+---------CV-------->+----
Pv---+---MVi--+--I-+ +Ds**c+-Mp-+-Js-+ +--Cr-+--Ss-+---PP---+--Ox
| | | | | | | | | | |
allowed.v-d to.r eat.v a cake.s by Sophy.f that.j-r she had.v-d made.v-d
But in any case there is a fundamental problem in the "bypass" connector idea here, as the word "by" cannot be connected twice to the word "cake" (Mp and MVp).
So I have no idea how to implement this and at the same time to preserve the link:
+-Mp-+
| |
cake.s by
Or maybe it is wrong and should be omitted in any case? See linkage 2.
Linkage 1, cost vector = (UNUSED=0 DIS=-0.51 LEN=17)
+------MVp-----+
+------------>WV------------>+------IV---->+----Os---+ |
+->Wd--+-Ss-+--PPf--+---Pv---+---TO---+-I*t+ +Ds**c+-Mp-+-Js-+
| | | | | | | | | | |
LEFT-WALL he had.v-d been.v allowed.v-d to.r eat.v a cake.s by Sophy.f
Press RETURN for the next linkage.
linkparser>
Linkage 2, cost vector = (UNUSED=0 DIS= 0.10 LEN=17)
+------MVp-----+
+------------>WV------------>+------IV---->+----Os---+ |
+->Wd--+-Ss-+--PPf--+---Pv---+---TO---+-I*t+ +Ds**c+ +-Js-+
| | | | | | | | | | |
LEFT-WALL he had.v-d been.v allowed.v-d to.r eat.v a cake.s by Sophy.f
preserve the link
Yes, there are some rare cases where one might like to have two different links connecting the same pair of words. This is one of them.
Or maybe it is wrong
It's not wrong, but it's also not exactly right. cake --Mp-- by Sophy implies that Sophy made the cake (which later turns out to be true... but we don't know that yet); in the first half of this sentence, the correct parse is that it's "allowed by Sophy": so really the correct parse has allowed --MV-- by Sophy. So in this case, dropping the Mp link is not wrong, and in fact, it's more correct to kill the Mp link.
In my WIP, I started with:
<fxlink>: [MVp+]-1.65 & R+;
Comments:
- for this example, but with cost 0).
- <fxlink> to distinguish the macro from a regex label (so that we can, for example, write a dict analyzer that will, among other things, warn on regex labels without regexes). Another name, or a totally different label format (e.g. @fxlink), can be used.
<fxlink>.something: [MVp+]-1.65 & R+;
<fxlink>.anotherthing: [MVs+]-1.65 & R+;
This may help with error messages. However, for now I just allowed multiple <fxlink> labels.
<fxlink>.something: [MVp+ or MVs+]-1.65 & R+;
However, insertion of MVp before any R+ causes 2x slowness on the long batches due to the added disjuncts (a big portion of them is not getting pruned). Since the observed results only include crossing of both R and Bs, Bsp or Bsw links, it seems this would be faster:
<fxlink>: [MVp+]-1.65 & R+ & Bs+;
(I haven't yet completed its implementation, which is much more complex than inserting before a single connector, so I don't have a slowness-factor result yet, but I guess the result will be faster.)
BTW, this idea of defining link crossing has a problem that it cannot force the regular connector match rules. E.g. if you would like (just for the example) MV to cross, then you can get a bad link like ---MVp---)|(---MVs. I don't know how to overcome that.
Another problem I had to overcome is preventing label crossing in the diagram. For example, in this kind of crossing
+-------------------Bs------------------+
+-------------MVp------)|(---+ |
| +----Os---+---)|(---R--------+---------CV-------->+----
Pv---+---MVi--+--I-+ +Ds**c+-Mp-+-Js-+ +--Cr-+--Ss-+---PP---+--Ox
| | | | | | | | | | |
allowed.v-d to.r eat.v a cake.s by Sophy.f that.j-r she had.v-d made.v-d
the label on a vertical cross link R can be overwritten by `)|('. The label can be moved, but it is complex to ensure that there will be enough room for that. Instead, I chose another solution: Only allow crossing vertical lines, so the current printout in my WIP is:
+--------------------Bs-------------------+
+----------R---------+ |
+----------MVp---------)|(-MVp-+ | |
+------------>WV------------>+ +----Os---+ | +---------CV-------->+-----MVa----+
+->Wd--+-Ss-+--PPf--+---Pv---+---MVi--+--I-+ +Ds**c+ +-Js-+ +--Cr-+--Ss-+---PP---+--Ox-+ |
| | | | | | | | | | | | | | | | |
LEFT-WALL he had.v-d been.v allowed.v-d to.r eat.v a cake.s by Sophy.f that.j-r she had.v-d made.v-d him specially
It is still able to generate horizontal line crossings if no other choice because I didn't remove the code that does it (I don't have such examples for now).
The +----------MVp---------)|(-MVp-+ printout can be modified to be (not implemented)
+---------------MVp----)|(-----+
(one label in the center of the link line) but: 1. I'm not sure it is better; 2. A complex logic would be needed (but it is still straightforward). So I leave it as is for now.
(My previous post above has been edited -- as usual -- so it is better to read it on the web.) I got the following linkages, please check if they make sense:
linkparser> Onward went the cavalry, spurred to extraordinary exertion by the fact that provisions began to run short.
+----------------------------------------------------Xc---------------------------------------------------+
| +----------------------------Bsd----------------------------+ |
| +------------R-----------+ | |
+---------------MVp--------------)|(-MVp--+ | | |
+------>WV------>+---->SIs----+----MXsp----+ +-----------Ju-----------+ +---Jp---+ +----------CV-------->+-----IV---->+ |
+-->Wp-->+<-PFb<-+ +-Ds**c+ +--Xd--+---MVp--+ +-------A------+ | +D*u*c+ +----Cr----+---Sp*t---+---TO--+-I*t+--MVa-+ |
| | | | | | | | | | | | | | | | | | | |
LEFT-WALL onward went.v-d the cavalry.n , spurred.v-d to.r extraordinary.a exertion.n-u by the fact.n that.j-r provisions.n began.v-d to.r run.v short.e .
...
+----------------------------------------------------Xc---------------------------------------------------+
| +----------------------------Bsw----------------------------+ |
| +------------R-----------+ | |
+---------------MVp--------------)|(-MVp--+ | | |
+------>WV------>+---->SIs----+----MXsp----+ +-----------Ju-----------+ +---Jp---+ +----------CV-------->+-----IV---->+ |
+-->Wp-->+<-PFb<-+ +-Ds**c+ +--Xd--+---MVp--+ +-------A------+ | +D*u*c+ +----Cr----+---Sp*t---+---TO--+-I*t+--MVa-+ |
| | | | | | | | | | | | | | | | | | | |
LEFT-WALL onward went.v-d the cavalry.n , spurred.v-d to.r extraordinary.a exertion.n-u by the fact.n that.j-r provisions.n began.v-d to.r run.v short.e .
...
+----------------------------------------------------Xc---------------------------------------------------+
| +-----------------------------Mv----------------------------+ |
| +--------------------------Bs--------------------------+ | |
| +------------R-----------+ | | |
+---------------MVp--------------)|(-MVp--+ | | | |
+---->SIs----+----MXsp----+ +-----------Ju-----------+ +---Jp---+ +----------CV-------->+ | | |
+-->Wp-->+<-PFd<-+ +-Ds**c+ +--Xd--+---MVp--+ +-------A------+ | +D*u*c+ +----Cr----+---Sp*t---+--MVp--+ +--MVa-+ |
| | | | | | | | | | | | | | | | | | | |
LEFT-WALL onward went.v-d the cavalry.n , spurred.v-d to.r extraordinary.a exertion.n-u by the fact.n that.j-r provisions.n began.v-d to.r run.v short.e .
A proper fix for the others requires link-crossing. This is best illustrated by pondering the sentence: "Sophy wondered [up to] whose favorite number she should count to" and then realizing that [up to] needs to modify "number" not "whose". Unfortunately, this is not possible without link-crossing.
Which link should be allowed to cross which other link in that case? I would like to add these sentences as tests to my WIP.
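To make "link-crossing" concrete for the tests, here is a minimal sketch of a crossing check. It assumes a linkage is just a list of (left word index, right word index) pairs; the function names are mine, not anything from the link-grammar code, and the word positions in the usage note are my reading of the cavalry diagram above.

```python
def links_cross(a, b):
    """Return True if two links, given as (word index, word index) pairs, cross."""
    # Normalise each link to (left, right), then order the two links by
    # their left end, so that l1 <= l2 below.
    (l1, r1), (l2, r2) = sorted((tuple(sorted(a)), tuple(sorted(b))))
    # They cross only if the second link starts strictly inside the first
    # and ends strictly outside it. Nested links and links that merely
    # share a word do not cross.
    return l1 < l2 < r1 < r2


def crossing_pairs(links):
    """List every pair of crossing links in a linkage."""
    return [(p, q) for i, p in enumerate(links) for q in links[i + 1:]
            if links_cross(p, q)]
```

With LEFT-WALL counted as word 0, an MVp link from spurred (6) to by (10) and an R link from exertion (9) to that (13) are reported as crossing, which is the `)|(` spot in the diagrams.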
For the sentences:
I want to look at and listen to everything.
We currently get:
+-------------------------------Xp-------------------------------+
| +----------IV-------->+ |
| | +------I*t------+ |
+---->WV---->+ | +<-VJlpi<-+-----------Jk----------+ |
+->Wd--+-Sp*i+--TO-+ +-MVp+-Jj-+->VJrpi>+--MVp-+---Js--+ |
| | | | | | | | | | |
LEFT-WALL I.p want.v to.r look.v at and.j-v listen.v to.r everything .
In order to get the cross link, I added:
We also need to think about the undesired effects of such cross links on postprocessing, since it would treat each of the fake-cross-link segments as having a real link label.
Changing Js to @Js turned out to be a bad solution (at least w/o further changes), since I now get the following parse:
linkparser> A picture of dogs are in the yard
Found 4 linkages (4 had no P.P. violations)
Linkage 1, cost vector = (UNUSED=0 DIS= 1.05 LEN=14)
+------------Js------------+
+---->Wa----+ | +----Js---+
| +Ds**c+--Mf--+ +-Spx-+--Pp-+ +Ds**c+
| | | | | | | | |
LEFT-WALL a picture.n of dogs.n are.v in.r the yard.n
Now this sentence parses, although the diagram is still drawn badly (to be fixed):
+-------------------------------------------------------------------Xp-------------------------------------------------------------------+
| +-------------IV----------->+ +--Js-------------Js----------+ |
+------------------>WV----------------->+ | +--------I*t--------+----->VJrpi---->+ | |
+-------->Wd---------+-------Ss*s-------+ +-----CV---->+ | +<----VJlpi<---+ +<--VJlpi<-+ | |
| +-----G-----+ +---E---+---TH---+-Cet-+--Ss--+---TO--+ +--MVp--+ | +-MVp+ +->VJrpi>+--MVp-+---Js--+ |
| | | | | | | | | | | | | | | | | | |
LEFT-WALL Shel[!] Silverstein[!] once.e said.v-d that.j-c he wanted.v-d to.r go.v everywhere ,.j look.v at and.j-v listen.v to.r everything .
I'm slightly confused by the statements about <fxlink> - presumably, this is something that gets added to individual words, on an as-needed basis, and is not something globally applied, right?
Things like @Js- are difficult; there need to be additional constraints that only allow multiple Js if there are VJ's in the sentences, and only if they're connecting. Forcing stuff like this quickly gets convoluted and tricky.
I got the following linkages: Onward went the cavalry, spurred to extraordinary exertion by the fact that provisions began to run short.
They are all wrong; there should not be any connection between "exertion" and "that". This is explained in https://www.abisource.com/projects/link-grammar/dict/section-B.html -- so "the dog I had chased was black" -- in this case "I had chased" is modifying "dog" with a B link. But "provisions began to run short" is not modifying "exertion".
In the Sophy example, "she had made" is a B-modifier of "cake", so there, the B is correct.
I'm slightly confused by the statements about <fxlink> - presumably, this is something that gets added to individual words, on an as-needed basis, and is not something globally applied, right?
It is done according to your specification in https://github.com/opencog/link-grammar/issues/1050#issuecomment-557770152:
- Some way to write down "link X can cross link Y" in the dictionary.
- by analogy to Jj and Jk, your hack to auto generate Xj and Xk and then auto-add Xj- & Y & Xk+
(See the whole specification there.)
So the <fxlink> definitions are globally applied to the whole dictionary.
I used:
<fxlink>: MVp+ & R+;
to say that MVp is allowed to cross every R. But this is unnecessarily permissive, so I intended to try:
<fxlink>: MVp+ & R+ & Bs+;
to say that MVp is allowed to cross R+ & Bs+ only when both are crossed at once.
(I'm still in the middle of writing a more complex connector sequence matcher to allow a more flexible <fxlink> syntax.)
Things like @Js- are difficult; there need to be additional constraints that only allow multiple Js if there are VJ's in the sentences, and only if they're connecting.
If you can specify these additional constraints exactly, then maybe I will be able to enforce them automatically.
They are all wrong; there should not be any connection between "exertion" and "that".
So it turns out that <fxlink>: MVp+ & R+ is doing the right thing for
He had been allowed to eat a cake by Sophy that she had made him specially
but not for:
Onward went the cavalry, spurred to extraordinary exertion by the fact that provisions began to run short.
even though the exact same links are allowed to cross.
Hence my question is how to limit the cross link specification so it will not generate wrong cross links like that.
is doing the right thing for Sophy but not for exertion.
Yes. These two sentences "obviously" differ in that, for the Sophy sentence, cake has O- & R+ & B+ -- that is, a rule which says "direct objects can have relative modifiers". By contrast, exertion has J- & R+ & B+ so maybe this should be disallowed?
Some experimentation shows that this nonsense sentence does get a parse: "Onward went the cavalry, spurred to extraordinary exertion that provisions began to run short." -- again with the troublesome J- & R+ & B+ disjunct. But this one makes sense: "Onward went the cavalry, eating so much that provisions began to run short." and it uses the O- & R+ & B+ disjunct. Also this sentence makes sense, and has a good parse: "Onward went the cavalry, showing such extraordinary gluttony that provisions began to run short."
I cannot think of any sentences that require J- & R+ & B+ so maybe removing this from the dict is the correct thing to do? That requires another experiment ... remove it, and see what fails (if anything). I'll try that experiment now.
In general, I don't really like the global context, because, in general, whether something is allowed or not depends on the local context. The above provides an example of local context: was the (R+ & B+) in a disjunct with J- or with O- or with something else?
To answer your other question about VJ: this requires enforcing long-range order, and long-range order can be enforced with additional subscripts. For example, one possible fix is to create J***v+ so that v prevents connections to anything that doesn't have a J***v- & VJ+ -- do this by making all other J have J***x so that the x blocks the connection. (There might be other solutions. Some of the post-processing rules do this kind of enforcement, but in a different way. I tend to not like post-processing.)
This is a complicated example; there are simpler examples in the dict, the most obvious being the singular-plural distinction: so Js, Os, Ss, SIs are never mixed into disjuncts that have Jp, Op, Sp, SIp in them, thus forcing long-range agreement on singular-plural, across many different connector types, and thus across longer spans of links.
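The subscript trick can be checked with a small matcher. This is only a sketch of the matching rule as I understand it from the dictionary documentation: the uppercase parts must be identical, lowercase subscripts are compared position by position, `*` matches anything, and a shorter subscript is padded with `*`. It ignores the +/- direction, head/dependent markers and the `@` multi-connector prefix, and `connectors_match` is a made-up name, not a library function.

```python
def connectors_match(left, right):
    """Simplified test of whether two connector names could link."""
    def split(name):
        # The uppercase prefix is the link type; the rest is the subscript.
        i = 0
        while i < len(name) and name[i].isupper():
            i += 1
        return name[:i], name[i:]

    base_l, sub_l = split(left)
    base_r, sub_r = split(right)
    if base_l != base_r:
        return False
    # Pad the shorter subscript with wildcards, then compare per position.
    width = max(len(sub_l), len(sub_r))
    for a, b in zip(sub_l.ljust(width, "*"), sub_r.ljust(width, "*")):
        if a != "*" and b != "*" and a != b:
            return False
    return True
```

Under this rule `connectors_match("J***v", "Js")` is true, so a bare Js would still attach to J***v+; only after the other J connectors are rewritten to carry the x, as in `connectors_match("J***v", "Js**x")`, is the connection blocked. The singular-plural case is the same mechanism in one position: Js does not match Jp.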
I'll try that experiment now.
Heh. Already noted as a problem: https://github.com/opencog/link-grammar/blob/f012bfb1ee4111ae784f832daa55e5097cbe378e/data/en/4.0.dict.m4#L132-L141 which suggests that a good fix won't be easy to find...
In general, I don't really like the global context, because, in general, whether something is allowed or not depends on the local context. The above provides an example of local context: was the (R+ & B+) in a disjunct with J- or with O- or with something else?
I used the following rule:
<fxlink>: [MVp+]-1.65 & R+ & Bs+;
which means "insert MVp before any R+ & Bs+".
In principle I can extend this to something like:
<fxlink>: MVp+ & J- & R+ & Bs+;
where the second term (J-) means "not containing this term in the opposite jet".
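To pin down what the extended rule is meant to express, here is a toy sketch. It is not my actual implementation: a disjunct is modelled as two plain lists of connector names (the left jet and the right jet), names are compared literally with no subscript matching, and `has_run` and `fxlink_applies` are hypothetical helpers invented for this illustration.

```python
def has_run(jet, run):
    """True if the connectors in `run` appear consecutively in `jet`."""
    n = len(run)
    return any(tuple(jet[i:i + n]) == tuple(run)
               for i in range(len(jet) - n + 1))


def fxlink_applies(left_jet, right_jet, run=("R+", "Bs+"), forbidden="J-"):
    """Decide whether the cross-link rule may fire on a toy disjunct.

    The rule fires only if the right jet contains the run R+ & Bs+ and
    the opposite (left) jet does not contain the forbidden connector.
    """
    return has_run(right_jet, run) and forbidden not in left_jet
```

For an abbreviated cake-like disjunct, `fxlink_applies(["O-"], ["R+", "Bs+"])` is true, while for the exertion-like one, `fxlink_applies(["J-"], ["R+", "Bs+"])` is false, which is the local-context distinction discussed above.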
In my current implementation, adding this would be awkward, since I used "connector stream editing", meaning that I insert the needed connectors on the fly (in this case MVp-) while reading connectors from the dict file. (In addition, I also use disjunct editing to add MVp+ as a shallow connector on the opposite jet, because this cannot be done, in general, by dict editing.)
I used "connector stream editing" because it makes the insertion once per macro, regardless of how many times the macro is used. Manipulating whole expressions needs many times more matching/insertion operations.
So in order to implement more restrictive rules I can change my implementation to disjunct editing only (a simple change, but with more overhead since it is done per sentence and not mostly on dict read).
BTW, using this rule (as the only rule) generates a lot of disjuncts that are not getting pruned.
This causes a slowdown of several percent on the basic and fixes batch benchmarks, but ~30% on the failures benchmark! So using several such rules would cause a significant slowdown. Maybe adding a special option for "allow cross links" would be a solution for that.
I will also take a second look at my old pruning WIPs, in which I encountered fundamental problems, since I think that by now I have found solutions to these problems. I hope that more aggressive pruning will alleviate the slowdown caused by increased dict complexity.
For now I will just submit the unrelated improvements that I made in the code that I touched in this WIP.
In an unrelated search I encountered page 358 of "Intelligent Information and Database Systems: 8th Asian Conference ..., Part 2".
This conference was in 2016, but judging by their benchmark time they seem to have used the original CMU version (a common thing). The problem is the same:
(They ended up using another parser.)
On the other hand, "in which countries?" does parse. Here in.r uses its disjunct Wj- & JQ+ & J+ to attach to "which countries", so as a test I tried adding the disjunct MVp- & JQ+ & J+. It didn't work, and the question is why. Fixing this as needed may also be interesting.