ufal / treex

Treex NLP framework

Wild attributes set to anodes are not retained after parsing in the NL analysis #12

Closed michnov closed 9 years ago

michnov commented 9 years ago
echo "Wat is een Web cache?" | \
treex -Lnl \
Read::Sentences \
W2A::NL::Tokenize \
Util::Eval anode='$.wild->{test} = "ok";' \
Util::Eval anode='print($.wild->{test}); print "\n";' \
A2P::NL::ParseAlpino \
Util::Eval zone='$.remove_tree("a");' \
P2A::NL::Alpino \
Util::Eval anode='print($.wild->{test});'

returns

TREEX-INFO:     3.754:  Document 1/1 noname loaded from -
TREEX-INFO:     3.754:  Applying block 1/8 Treex::Block::Util::SetGlobal
TREEX-INFO:     3.754:  Applying block 2/8 Treex::Block::W2A::NL::Tokenize
TREEX-INFO:     3.767:  Applying block 3/8 Treex::Block::Util::Eval
TREEX-INFO:     3.768:  Applying block 4/8 Treex::Block::Util::Eval
ok
ok
ok
ok
ok
ok
TREEX-INFO:     3.769:  Applying block 5/8 Treex::Block::A2P::NL::ParseAlpino
TREEX-INFO:     5.266:  ALPINO: hdrug: process 26327 on host sol6 (datime(2015,8,25,16,0,53))
TREEX-INFO:     6.304:  Applying block 6/8 Treex::Block::Util::Eval
TREEX-INFO:     6.307:  Applying block 7/8 Treex::Block::P2A::NL::Alpino
TREEX-INFO:     6.384:  Applying block 8/8 Treex::Block::Util::Eval
Use of uninitialized value in print at (eval 919) line 1, <GEN14> line 25.
Use of uninitialized value in print at (eval 920) line 1, <GEN14> line 25.
Use of uninitialized value in print at (eval 921) line 1, <GEN14> line 25.
Use of uninitialized value in print at (eval 922) line 1, <GEN14> line 25.
Use of uninitialized value in print at (eval 923) line 1, <GEN14> line 25.
Use of uninitialized value in print at (eval 924) line 1, <GEN14> line 25.
TREEX-INFO:     6.386:  Applying process_end
TREEX-INFO:     6.387:  Processed 1 document
TREEX-INFO:     6.387:  Processed 1 document
TREEX-INFO:     6.387:  Running the scenario took 3 seconds

tuetschek commented 9 years ago

Yes, they are not retained, because these are basically completely different nodes, and it is not even guaranteed that there will be the same number of them as before parsing. During Alpino parsing, the whole tree is basically rebuilt from the Alpino output.

This will not be an easy fix. Is this crucial in any way? Can't you first parse and then set the wild attributes?

tuetschek commented 9 years ago

PS: Actually I forgot that it's even more complicated – the Alpino parse is loaded into a p-tree, which is then converted into an a-tree (that replaces the previous, flat a-tree).

martinpopel commented 9 years ago

The package Treex::Tool::PhraseParser::Alpino is stored in the file https://github.com/ufal/treex/blob/08549e4210c0432b32f7375ff9e13a0974168bc3/lib/Treex/Tool/Alpino/Parser.pm#L1 Is that on purpose?

I am not sure if the interface of this module is optimal ($parser->parse_zones($zones_rf)), but let's say it is OK (or legacy).

Maybe it could try to copy any wild attributes from the a-nodes to the newly created p-nodes on a best-effort basis (in case there is no 1-to-1 correspondence between the a-nodes and the p-node terminals).
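
To illustrate what such best-effort copying might look like, here is a minimal standalone sketch. It is not Treex code: nodes are modelled as plain hashes, the helper name copy_wild_best_effort and the look-ahead window are hypothetical, and the retokenized input ("Web_cache") is made up for the example rather than taken from real Alpino output.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Treex: copy the 'wild' hash from the old
# (pre-parse) nodes to new terminals with the same word form, scanning both
# lists left to right. Nodes are plain hashes { form => ..., wild => {...} }.
sub copy_wild_best_effort {
    my ($old_nodes, $new_nodes, $window) = @_;
    $window //= 3;    # how far to look ahead for a matching form (assumption)
    my $copied = 0;
    my $i      = 0;   # index of the first old node not yet aligned

    for my $new (@$new_nodes) {
        # Search a small window of old nodes for the same form.
        my $last = $i + $window;
        $last = $#$old_nodes if $last > $#$old_nodes;
        for my $j ($i .. $last) {
            next if $old_nodes->[$j]{form} ne $new->{form};
            # Shallow copy, so the new node does not share the old hash.
            $new->{wild} = { %{ $old_nodes->[$j]{wild} // {} } };
            $copied++;
            $i = $j + 1;    # never align an old node twice
            last;
        }
        # No match within the window: this terminal gets no wild data.
    }
    return $copied;
}

# Old flat a-nodes, each carrying a wild attribute set before parsing.
my @old = map { +{ form => $_, wild => { test => 'ok' } } }
    qw(Wat is een Web cache ?);

# Made-up parser output in which two tokens were merged into one terminal.
my @new = map { +{ form => $_ } } qw(Wat is een Web_cache ?);

my $n = copy_wild_best_effort(\@old, \@new);
print "$n of ", scalar(@new), " terminals received wild attributes\n";
# prints: 4 of 5 terminals received wild attributes
```

The merged terminal is left without wild attributes, which is the "best-effort" part: where the parser changes the tokenization, something has to decide whether to drop, merge, or duplicate the old values.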

tuetschek commented 9 years ago

Oops – the package name is a mistake. It should be Treex::Tool::Alpino::Parser. I will fix this (unless you have a different view).

The interface of this module is adapted from some other parser, I do not remember which one anymore. What would be the optimal way?

It definitely could try to copy the wild attributes – this is the "non-easy" fix ;-).

michnov commented 9 years ago

It's needed for gazetteers, particularly for W2A::GazeteerMatch, which in its current implementation writes into wild attributes and must be run right after tokenization, definitely before parsing (https://github.com/ufal/treex/blob/gazeteer/lib/Treex/Scen/Analysis/NL.pm#L31).

michnov commented 9 years ago

Copying wild attributes from the p-tree to the newly created a-tree is easy and already done (https://github.com/ufal/treex/blob/master/lib/Treex/Block/P2A/NL/Alpino.pm#L205). The problem is how to copy the attributes from the old a-tree to the p-tree created by Alpino (https://github.com/ufal/treex/blob/master/lib/Treex/Tool/Alpino/Parser.pm).