stickeritis / sticker

Succeeded by SyntaxDot: https://github.com/tensordot/syntaxdot

Improve error messages of the dependency encoders #173

Closed by danieldk 4 years ago

danieldk commented 4 years ago

The error messages now include the sentence for which processing failed, showing the token in brackets.
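As a rough illustration of the bracketed format, here is a minimal Python sketch. It is not sticker's actual (Rust) implementation; the function name `highlight_token` is hypothetical, and the spacing inside the brackets is copied from the error output quoted later in this thread.

```python
def highlight_token(forms, index):
    """Join the token forms of a sentence, wrapping the token at
    `index` (0-based) in brackets so that it stands out."""
    return " ".join(
        f"[ {form} ]" if i == index else form
        for i, form in enumerate(forms)
    )


# Example sentence taken from the discussion below.
sentence = ["Diese", "Probleme", "lösen", "Studenten", "schnell", "."]
print(highlight_token(sentence, 1))
```

Printing the whole sentence rather than only the offending token makes it much easier to find the problematic line in a large CoNLL file.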

DiveFish commented 4 years ago

When removing the POS tag or the head of a CoNLL token, the error message I get in training is "Cannot read batch: cannot parse as integer field: DET", where DET is the dependency relation of the token. Am I missing something in my test case?

danieldk commented 4 years ago

> When removing the POS tag or the head of a CoNLL token, the error message I get in training is "Cannot read batch: cannot parse as integer field: DET", where DET is the dependency relation of the token. Am I missing something in my test case?

Did you replace the POS tag with an underscore (_)? This error suggests that the fields have shifted and that the dependency relation is now in the head index column.

DiveFish commented 4 years ago

> This error suggests that the fields have shifted and that the dependency relation is now in the head index column.

Yes, that is exactly what happens.

> Did you replace the POS tag with an underscore (_)?

I tried that as well, replacing either the POS tag or the head with an underscore, but then the program simply finishes without having processed the sentence. There is not even an error message indicating invalid input.

danieldk commented 4 years ago

On Fri, Nov 15, 2019, at 08:21, Patricia Fischer wrote:

>> This error suggests that the fields have shifted and that the dependency relation is now in the head index column.

> Yes, that is exactly what happens.

>> Did you replace the POS tag with an underscore (_)?

> I tried that as well, replacing either the POS tag or the head with an underscore, but then the program simply finishes without having processed the sentence. There is not even an error message indicating invalid input.

Could you send that file (or perhaps just the file with that sentence)?

Removing a field without replacing it with an underscore is definitely wrong, since it shifts the columns.
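The column shift can be reproduced with a minimal Python sketch. This is not sticker's actual reader (which is written in Rust); `parse_token` and `CONLLX_FIELDS` are hypothetical names, and the lemma, tags, and head index in the sample lines are made up for illustration. The only assumption about the format is the standard ten-column, tab-separated CoNLL-X layout.

```python
# Column order of a CoNLL-X token line (ten tab-separated fields).
CONLLX_FIELDS = [
    "id", "form", "lemma", "cpos", "pos",
    "feats", "head", "deprel", "phead", "pdeprel",
]


def parse_token(line):
    """Split a token line into named fields and parse the head column.

    An underscore marks an absent head; any other value must be an
    integer, so a non-numeric value raises ValueError.
    """
    values = line.rstrip("\n").split("\t")
    token = dict(zip(CONLLX_FIELDS, values))
    head = token.get("head", "_")
    token["head"] = None if head == "_" else int(head)
    return token


# POS tag replaced by an underscore: the columns stay aligned.
underscored = "1\tDiese\tdieser\tART\t_\t_\t2\tDET\t_\t_"

# POS tag deleted together with its tab: every later column moves one
# position to the left, so the dependency relation lands in the head column.
shifted = "1\tDiese\tdieser\tART\t_\t2\tDET\t_\t_"

print(parse_token(underscored)["head"])

try:
    parse_token(shifted)
except ValueError as err:
    print("cannot parse head column as integer:", err)
```

In the shifted line the head index ends up in the features column and DET ends up in the head column, which is why the parser complains about an integer field.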

danieldk commented 4 years ago

> Could you send that file (or perhaps just the file with that sentence)? Removing a field without replacing it with an underscore is definitely wrong, since it shifts the columns.

Thanks for the example! Note that you removed the coarse-grained POS tag. Coarse-grained tags are not used at all. In this case you can also remove the fine-grained tag, since 'Diese' is not the head of any token. Try removing the tag of 'lösen' and you would get:

Cannot collect sentence: Head of token 'Probleme' does not have a part-of-speech:

Diese [ Probleme ] lösen Studenten schnell .

DiveFish commented 4 years ago

This still produces the plain error message "Error tagging sentences: Token without a tag: lösen" (tagging) or "Cannot read batch: Token without a tag: lösen" (training), even after checking that I am on the right branch.

danieldk commented 4 years ago

> This still produces the plain error message "Error tagging sentences: Token without a tag: lösen" (tagging) or "Cannot read batch: Token without a tag: lösen" (training), even after checking that I am on the right branch.

Do you use tag embeddings in your configuration? If so, then this is a different error: when tag embeddings are used, every token must have a tag.