valeriobasile / learningbyreading

Learning by Reading pipeline of NLP and Entity Linking tools
GNU General Public License v2.0

changelog and version #12

Closed prpfialho closed 7 years ago

prpfialho commented 7 years ago

Hi,

Do you have the changelog for C&C/Boxer version 2614? When I check the version, I get:

./bin/candc --version
candc v1.00

(probably because the .svn folder, which holds the versioning information, is missing)

Sincerely, Pedro

valeriobasile commented 7 years ago

Hi Pedro, I'm afraid I don't have that. I took a snapshot of the C&C tools SVN and copied its contents into this repository, not long before their server went down. This should be the last version that was available, or one of the latest revisions. Is there a particular piece of information you are looking for?

prpfialho commented 7 years ago

I wanted to know the features of the version I'm using. I have the changelog up to revision 2611 (attached here). candctrunk.txt

But I'd like to know how the pipeline differs between a DRS (box) and a DRG (triples), and how AMRs are generated.

(any details on the various values of the "--semantics" option are welcome too)

valeriobasile commented 7 years ago

I can answer for the DRG part, since I was involved in that extension. For the rest, you should contact the developer of Boxer, this guy: http://www.rug.nl/staff/johan.bos/

Boxer produces lambda-expressions from the syntactic constituents of the CCG tree output by the C&C parser. When producing a DRG, there are two additional steps with respect to the standard DRS: 1) discourse units are reified, that is, an id is generated and attached to each discourse unit and, consequently, to the predicates it contains; 2) Boxer keeps track of which span of text has generated which predicate, to produce the alignment with the surface form. So the DRG is a direct translation of a DRS: it is more verbose (to allow for the alignment with the text), but the two contain the same information.
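
To make the two steps more concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not Boxer's code or its actual DRG output format: the `Box` and `Condition` classes, the `drs_to_drg` function, the edge labels ("referent", "condition", "argument") and the id scheme are all made up for this example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Condition:
    predicate: str   # e.g. "man"
    referent: str    # discourse referent the predicate applies to, e.g. "x1"
    tokens: tuple    # indices of the tokens that generated this predicate


@dataclass
class Box:
    referents: list
    conditions: list


def drs_to_drg(boxes):
    """Flatten boxes into (source, label, target, token_span) edges.

    Hypothetical format, for illustration only.
    """
    box_ids = count(1)
    edges = []
    for box in boxes:
        # Step 1: reify the discourse unit by generating an id for it.
        unit = f"k{next(box_ids)}"
        for ref in box.referents:
            edges.append((unit, "referent", ref, ()))
        for n, cond in enumerate(box.conditions, start=1):
            # The unit id is propagated to the predicates inside the box.
            node = f"{unit}:c{n}:{cond.predicate}"
            # Step 2: keep the token span that generated the predicate.
            edges.append((unit, "condition", node, cond.tokens))
            edges.append((node, "argument", cond.referent, ()))
    return edges


# Toy DRS for "A man walks" (hand-written, not produced by Boxer).
tokens = ["A", "man", "walks"]
drs = [
    Box(
        referents=["x1", "e1"],
        conditions=[
            Condition("man", "x1", (1,)),
            Condition("walk", "e1", (2,)),
        ],
    )
]

edges = drs_to_drg(drs)
for source, label, target, span in edges:
    surface = " ".join(tokens[i] for i in span)
    print(source, label, target, f"[{surface}]")
```

The edge list carries the same referents and predicates as the box, plus the generated ids and the token spans, which is why it ends up more verbose than the DRS it was built from.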