syllog1sm / ccg

A Combinatory Categorial Grammar library.

Using this library #1

Open silentrob opened 9 years ago

silentrob commented 9 years ago

I'm trying to get my head around what this project actually does. I'm mostly interested in how you parsed the C&C tools markedup file, added a grammar, and parsed logical forms.

syllog1sm commented 9 years ago

Here's one way to understand what this library does.

You can create CCG derivations using a PCFG if you model the CCG categories as arbitrary strings (e.g. Fowler and Penn use the Berkeley parser). For instance, you can model s --> np s\np (backward application) as a normal production rule.
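A minimal sketch of that idea, assuming a toy derivation encoded as nested tuples (the tree encoding and the function names are hypothetical, not this library's API): every category string is treated as an opaque nonterminal, productions are read off the tree, and rule probabilities come from relative-frequency estimation, exactly as for an ordinary treebank PCFG.

```python
from collections import Counter

# Hypothetical toy derivation for "John saw Mary": (category, children) for
# internal nodes, (category, word) for leaves. Categories are plain strings.
DERIVATION = ("S[dcl]", [
    ("NP", "John"),
    ("S[dcl]\\NP", [
        ("(S[dcl]\\NP)/NP", "saw"),
        ("NP", "Mary"),
    ]),
])


def count_productions(node, counts):
    """Read CFG productions off a derivation, treating categories as opaque labels."""
    category, children = node
    if isinstance(children, str):
        counts[(category, (children,))] += 1  # lexical production: category --> word
        return
    counts[(category, tuple(child[0] for child in children))] += 1
    for child in children:
        count_productions(child, counts)


def relative_frequencies(counts):
    """Maximum-likelihood estimate: P(rhs | lhs) = count(lhs --> rhs) / count(lhs)."""
    totals = Counter()
    for (lhs, _rhs), n in counts.items():
        totals[lhs] += n
    return {rule: n / totals[rule[0]] for rule, n in counts.items()}


counts = Counter()
count_productions(DERIVATION, counts)
probs = relative_frequencies(counts)
for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} --> {' '.join(rhs)}   p={p:.2f}")
```

The parser never looks inside a category, which is why an off-the-shelf PCFG parser can produce CCG-shaped trees; recovering dependencies is the part that needs category structure.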

This library lets you load a markedup file, which associates the category strings with annotated categories, so you can recover the right dependencies.
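A rough sketch of reading such a file, assuming a simplified layout: an unindented plain category, followed by an indented line holding the argument count and the annotated category. The real C&C markedup format carries more per entry, the sample entries and `#` comment handling here are illustrative assumptions, and the helper names are hypothetical (this is not `ccg.lexicon.load`).

```python
import re

# Illustrative entries in a simplified markedup layout (an assumption):
# an unindented plain category, then an indented "<arity> <annotated category>" line.
SAMPLE = """\
# comment lines are skipped
(S[dcl]\\NP)/NP
  2 ((S[dcl]{_}\\NP{Y}<1>){_}/NP{Z}<2>){_}

N/N
  1 (N{Y}/N{Y}<1>){_}
"""

# {Y}<1> means: this argument's head is bound to variable Y and fills dependency slot 1.
SLOT = re.compile(r"\{(\w)\*?\}<(\d+)>")


def parse_markedup(text):
    """Map each plain category string to (arity, annotated category)."""
    entries = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        if not line[0].isspace():
            current = stripped                      # plain category, e.g. N/N
        elif current is not None and current not in entries:
            arity, annotated = stripped.split(None, 1)
            entries[current] = (int(arity), annotated)
        # any further indented lines for the same entry are ignored in this sketch
    return entries


def dependency_slots(annotated):
    """List (head variable, slot number) pairs in the order they appear."""
    return [(var, int(slot)) for var, slot in SLOT.findall(annotated)]


lexicon = parse_markedup(SAMPLE)
for plain, (arity, annotated) in lexicon.items():
    print(plain, arity, dependency_slots(annotated))
```

The point of the annotation is visible in the `N/N` entry: the result `N{Y}` shares its head variable with the argument, so a modifier passes its noun's head upward instead of becoming the head itself.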

But to actually use the logical forms, you have to pair the lexical categories with a semantic representation. This library assumes a dependency-based semantics and uses formats compatible with the C&C parser and CCGbank. It implements all the CCG rules and lets you recover all the right dependencies.
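To make "dependency-based semantics" concrete, here is a minimal sketch of how a combinatory rule can fill a numbered dependency slot. It assumes hypothetical data structures (not this library's), handles only forward and backward application with atomic arguments, and skips composition, type-raising, and head-passing through modifiers.

```python
from dataclasses import dataclass
from typing import Tuple

# An argument still to be consumed: (slash, atomic category, dependency slot).
Arg = Tuple[str, str, int]
# A filled dependency: (head word, head's lexical category, slot, filler word).
Dep = Tuple[str, str, int, str]


@dataclass(frozen=True)
class Sign:
    result: str             # category left once every argument is consumed
    args: Tuple[Arg, ...]   # pending arguments, outermost first
    head: str               # lexical head word
    lexcat: str             # the head's lexical category string


def apply(functor: Sign, argument: Sign, slash: str) -> Tuple[Sign, Dep]:
    """Forward ('/') or backward ('\\') application that records the filled slot."""
    if not functor.args:
        raise ValueError("functor has no arguments left")
    want_slash, want_cat, slot = functor.args[0]
    if want_slash != slash or argument.args or argument.result != want_cat:
        raise ValueError("categories do not combine")
    dep = (functor.head, functor.lexcat, slot, argument.head)
    return Sign(functor.result, functor.args[1:], functor.head, functor.lexcat), dep


saw = Sign("S[dcl]", (("/", "NP", 2), ("\\", "NP", 1)), "saw", "(S[dcl]\\NP)/NP")
john = Sign("NP", (), "John", "NP")
mary = Sign("NP", (), "Mary", "NP")

vp, object_dep = apply(saw, mary, "/")         # (S[dcl]\NP)/NP  NP  =>  S[dcl]\NP
sentence, subject_dep = apply(vp, john, "\\")  # NP  S[dcl]\NP  =>  S[dcl]
print(object_dep)   # ('saw', '(S[dcl]\\NP)/NP', 2, 'Mary')
print(subject_dep)  # ('saw', '(S[dcl]\\NP)/NP', 1, 'John')
```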

The markedup file is read by ccg.lexicon.load, which loads it as global state. Not how I'd design it now, but that's how it works.

If you tell me your specific task, I can probably help you do it. I wrote this during my research with James Curran, which was about giving CCGbank better semantics and getting the parser to use the revised data.

silentrob commented 9 years ago

Thanks for the quick reply. I'm slowly getting my head around it.

As for my use case: I want to convert sentences into logical forms (for semantic reasoning) using the simply typed lambda calculus.

I'm also wanting to do as much as I can using Node.js, because JavaScript is fun :)

So my thought was to wrap the C&C supertagger and parser (using the pre-baked model) in a Node addon (a C extension) to generate CCG-tagged sentences, and then use your work here as the basis for parsing the production rules. I will need to bring in another map for the actual lambda expressions to be folded in with the rules.

This approach is similar to the work being done out of UW with SPF http://yoavartzi.com//pub/afz-tutorial.acl.2013.pdf
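A minimal sketch of what a category-to-lambda map folded in with the rules could look like, assuming Python closures as a stand-in for simply typed lambda terms (calling a closure plays the role of beta-reduction, and no types are checked). The lexicon entries and function names are hypothetical; this is neither this library's API nor SPF's. The rule functions follow the standard CCG semantics for application, forward composition, and type-raising.

```python
# Hypothetical lexicon: (word, category) -> semantics.
LEXICON = {
    ("John", "NP"): "john",
    ("Mary", "NP"): "mary",
    ("saw", "(S\\NP)/NP"): lambda y: lambda x: f"saw({x},{y})",
}


def forward_apply(f, a):      # X/Y : f   Y : a    =>  X : f a
    return f(a)


def backward_apply(a, f):     # Y : a   X\Y : f    =>  X : f a
    return f(a)


def forward_compose(f, g):    # X/Y : f   Y/Z : g  =>  X/Z : λz. f (g z)
    return lambda z: f(g(z))


def type_raise(a):            # X : a  =>  T/(T\X) : λf. f a
    return lambda f: f(a)


john = LEXICON[("John", "NP")]
mary = LEXICON[("Mary", "NP")]
saw = LEXICON[("saw", "(S\\NP)/NP")]

# Canonical derivation: (saw Mary) first, then the subject.
canonical = backward_apply(john, forward_apply(saw, mary))
# Incremental derivation: type-raise John, compose with saw, then apply to Mary.
incremental = forward_apply(forward_compose(type_raise(john), saw), mary)
print(canonical, incremental)   # saw(john,mary) saw(john,mary)
```

Both derivations produce the same logical form, which is the property that lets CCG's flexible constituency coexist with a single semantics.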

syllog1sm commented 9 years ago

Right, I see.

So, first I should admit that I never gained fluency in lambda calculus as a semantics for CCG, even though this is the form Mark Steedman uses in his book and what many of the papers use. I can't immediately work through a derivation for an arbitrary sentence with lambda semantics.

The library uses a dependency semantics, which is what the main wide-coverage implementations use (CCGbank, C&C, and the OpenCCG-based generation system Mike White's group built).

So you're going to have to transform the dependency semantics into lambda terms to do what you want. This will be easy for many cases, but CCGbank's grammar has lots of hacky stuff inherited from the Penn Treebank.
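For the easy cases, a lexical entry whose arguments only fill numbered dependency slots maps directly onto a lambda term: bind one variable per argument in the order the functor consumes them, and order the predicate's arguments by slot number. A minimal sketch under that assumption (the function name and output notation are hypothetical):

```python
def lambda_term(predicate, slots_in_application_order):
    """Build a lambda term for a lexical entry whose arguments just fill numbered slots.

    slots_in_application_order lists slot numbers in the order the functor consumes
    its arguments (outermost argument first); predicate arguments follow slot order.
    """
    if not slots_in_application_order:
        return predicate                       # saturated category, e.g. NP "john"
    binders = "".join(f"λx{slot}." for slot in slots_in_application_order)
    args = ",".join(f"x{slot}" for slot in sorted(slots_in_application_order))
    return f"{binders}{predicate}({args})"


# (S[dcl]\NP)/NP: the object (slot 2) is consumed first, then the subject (slot 1).
print(lambda_term("saw", [2, 1]))   # λx2.λx1.saw(x1,x2)
```

This breaks down quickly for the harder categories. A modifier such as `N{Y}/N{Y}<1>` shares its head with its argument, so the naive result `λx1.red(x1)` loses the noun entirely, whereas the intended term is closer to `λP.λx.P(x) ∧ red(x)`. Cases like that, multiplied across CCGbank's lexicon, are where most of the effort goes.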

To give you a rough difficulty estimate, I'd expect this to take a very good PhD student most of a year.

I suggest you use the library as a reference and mostly plan to reimplement. A lot of the library is messy, and it doesn't do exactly what you need, so you'll probably want to own the code.

(Sent by email in reply to Rob Ellis's comment of Thursday, May 7, 2015.)

silentrob commented 9 years ago

Thanks for the feedback and difficulty estimate.

I will need to re-implement the parser anyway because of the license. I'm just looking for a few shortcuts up front while I extend my understanding of how the parser and tagger work.

Perhaps I should track down some PhD students to help out, or at least review the process. I might be in over my head.

syllog1sm commented 9 years ago

Is this a commercial project, or something you're doing for kicks?

In the long-term, I'd like to switch my commercial parser, spaCy, to CCG. I have a lot of other priorities first, though.

(Sent by email in reply to Rob Ellis's comment of Friday, May 8, 2015.)

silentrob commented 9 years ago

The project is not commercial right now, but it could get wound into a startup at some point, so the license needs to be a little more liberal (MIT, ISC, BSD, Apache 2, LGPL3).

For today, this work is just for kicks and experimental development, but I think the approach is sound and could lead to something greater in the NLU space.

syllog1sm commented 9 years ago

(Sent by email in reply to Rob Ellis's comment of Friday, May 8, 2015, 6:00 PM.)

Cool.

I guess my main tip would be to check out some of the most recent CCG parsing work, particularly by Mike Lewis, as well as the CCG module of ZPar.

These end up using the CCG implementation in C&C, but you could use the implementation in this library instead.


silentrob commented 9 years ago

Thanks for all your feedback. I just realized you are behind spaCy! I took a look at it a few months back and was super impressed. I helped put together https://github.com/NaturalNode/natural a few years back, and more recently I've been working on superscriptjs.com, a dialogue engine for chatbots. I'm now looking to bridge the gap and add NLU/NLG, starting on the NLU side.

62mkv commented 3 years ago

Hi guys. I came across a Lambek calculus PDF recently and fell down a rabbit hole, so here I am, looking for a way to get started with "how to define a grammar for a particular language and execute it against some set of sentences". But I could not find any "how to get started" guide or anything like that. So the questions I would ask are:

To add more context: I have a database with all the (annotated) lexemes and forms for the Estonian language, similar to what you can see in the Wikidata Lexemes space. But applying those to actual texts leads to ambiguities, since certain different forms, and even forms of different lexemes, can have the same representation. So I am looking for a tool that I can feed a grammar and a set of possible forms, and see whether they constitute a valid sentence as defined by that grammar.
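The check described here (several candidate readings per token, and the question of whether any combination forms a sentence) has the shape of a CKY-style chart recognizer over CCG categories. Below is a toy sketch under loud assumptions: only forward and backward application, atomic categories compared as plain strings with no feature unification, and a hypothetical English example standing in for Estonian data. A full toolkit such as OpenCCG handles far more than this.

```python
def strip_outer(cat):
    """Remove parentheses that wrap the whole category, e.g. '((S\\NP)/NP)'."""
    while cat.startswith("(") and cat.endswith(")"):
        depth = 0
        for i, ch in enumerate(cat):
            depth += (ch == "(") - (ch == ")")
            if depth == 0 and i < len(cat) - 1:
                return cat          # first '(' closes early: parens don't wrap everything
        cat = cat[1:-1]
    return cat


def parse(cat):
    """Parse a category string into an atom or a (result, slash, argument) tuple."""
    cat = strip_outer(cat)
    depth = 0
    for i in range(len(cat) - 1, -1, -1):   # rightmost top-level slash (left-associative)
        ch = cat[i]
        if ch == ")":
            depth += 1
        elif ch == "(":
            depth -= 1
        elif ch in "/\\" and depth == 0:
            return (parse(cat[:i]), ch, parse(cat[i + 1:]))
    return cat


def combine(left, right):
    """Categories derivable from two adjacent categories by application."""
    results = []
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        results.append(left[0])             # forward application:  X/Y  Y  => X
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        results.append(right[0])            # backward application: Y  X\Y => X
    return results


def recognizes(candidates, goal="S"):
    """True if some choice of one category per token derives the goal category."""
    n = len(candidates)
    if n == 0:
        return False
    chart = {(i, i + 1): {parse(c) for c in cats} for i, cats in enumerate(candidates)}
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            cell = set()
            for mid in range(start + 1, end):
                for left in chart[(start, mid)]:
                    for right in chart[(mid, end)]:
                        cell.update(combine(left, right))
            chart[(start, end)] = cell
    return parse(goal) in chart[(0, n)]


# "saw" is ambiguous between a noun (N) and a transitive verb.
print(recognizes([{"NP"}, {"N", "(S\\NP)/NP"}, {"NP"}]))   # John saw Mary
print(recognizes([{"NP"}, {"NP"}, {"N", "(S\\NP)/NP"}]))   # John Mary saw
```

The first call succeeds because the verb reading of "saw" combines with both neighbours; the second fails because no reading lets the tokens combine in that order.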

62mkv commented 3 years ago

This (https://github.com/OpenCCG/openccg) looks like something more or less suited to my needs. I will give it a try! Thanks.