taku910 / crfpp

CRF++: Yet Another CRF toolkit

What features are generated for out-of-sentence positions? #27

Closed: garfieldnate closed this issue 7 years ago

garfieldnate commented 8 years ago

The documentation has the following example:

Input: Data

He        PRP  B-NP
reckons   VBZ  B-VP
the       DT   B-NP << CURRENT TOKEN
current   JJ   I-NP
account   NN   I-NP

template          expanded feature
%x[0,0]           the
%x[0,1]           DT
%x[-1,0]          reckons
%x[-2,1]          PRP
%x[0,0]/%x[0,1]   the/DT
ABC%x[0,1]123     ABCDT123

The documentation does not state what features are generated for out-of-bounds positions. For example, what features are generated in the following situations?

He        PRP  B-NP << CURRENT TOKEN
reckons   VBZ  B-VP
the       DT   B-NP
current   JJ   I-NP
account   NN   I-NP

template          expanded feature
%x[-1,0]          ???

He        PRP  B-NP
reckons   VBZ  B-VP
the       DT   B-NP
current   JJ   I-NP
account   NN   I-NP << CURRENT TOKEN

template          expanded feature
%x[1,0]           ???

Or are these features perhaps not generated at all? I need to know if I should be adding my own BOS/EOS tokens.

versusvoid commented 7 years ago

To whom it may concern: no, you do not need to add your own. BOS/EOS tokens are added inside the library. See FeatureIndex::getIndex in feature.cpp.
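
For readers who want to see the idea concretely, here is a minimal Python sketch of template expansion with boundary handling. It is not the CRF++ code. The marker names (`_B-1`, `_B+1`, and so on) and the context limit of 8 are assumptions based on my reading of feature.cpp, so please check them against the source of the version you use. The names `lookup` and `expand` are hypothetical helpers made up for this illustration, and the label column is left out because templates only address feature columns.

```python
import re

# ASSUMPTION: marker names and the context limit mirror my reading of
# feature.cpp; verify them against the CRF++ version you actually use.
MAX_CONTEXT = 8
BOS = [f"_B-{i}" for i in range(1, MAX_CONTEXT + 1)]  # positions before the sentence
EOS = [f"_B+{i}" for i in range(1, MAX_CONTEXT + 1)]  # positions after the sentence

# Matches one template macro such as %x[-2,1].
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def lookup(sentence, pos, row, col):
    """Resolve %x[row,col] relative to the current token at index `pos`."""
    if abs(row) > MAX_CONTEXT:
        raise ValueError(f"row offset {row} exceeds the assumed limit {MAX_CONTEXT}")
    idx = pos + row
    if idx < 0:
        # One step before the first token maps to BOS[0], two steps to BOS[1], ...
        return BOS[-idx - 1]
    if idx >= len(sentence):
        # One step past the last token maps to EOS[0], two steps to EOS[1], ...
        return EOS[idx - len(sentence)]
    return sentence[idx][col]


def expand(template, sentence, pos):
    """Replace every %x[row,col] macro in `template` with the value it refers to."""
    return MACRO.sub(
        lambda m: lookup(sentence, pos, int(m.group(1)), int(m.group(2))),
        template,
    )


# Feature columns only (token, POS tag); the label column is omitted.
sentence = [
    ("He", "PRP"),
    ("reckons", "VBZ"),
    ("the", "DT"),
    ("current", "JJ"),
    ("account", "NN"),
]

print(expand("%x[0,0]/%x[0,1]", sentence, 2))  # in-bounds case from the documentation
print(expand("%x[-1,0]", sentence, 0))         # current token is the first token
print(expand("%x[1,0]", sentence, 4))          # current token is the last token
```

Under these assumptions the two `???` cases from the question expand to a begin-of-sentence marker and an end-of-sentence marker respectively, so the features are still generated and no manual BOS/EOS rows are needed in the input data.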

garfieldnate commented 7 years ago

Thanks!