yzhangcs / parser

:rocket: State-of-the-art parsers for natural language.
https://parser.yzhang.site/
MIT License

Attribute error when calling hasattr() #23

Closed JingyaXun closed 4 years ago

JingyaXun commented 4 years ago

I ran into the following error when initiating training:

To use data.metrics please install scikit-learn. See https://scikit-learn.org/stable/index.html
Set the max num of threads to 16
Set the seed for generating random numbers to 1
Set the device with ID 0 visible
Override the default configs with parsed arguments
----------------+--------------------------
Param           |           Value          
----------------+--------------------------
bert_model      |      bert-base-cased     
n_embed         |            100           
n_char_embed    |            50            
n_bert_layers   |             4            
embed_dropout   |           0.33           
n_lstm_hidden   |            400           
n_lstm_layers   |             3            
lstm_dropout    |           0.33           
n_mlp_arc       |            500           
n_mlp_rel       |            100           
mlp_dropout     |           0.33           
lr              |           0.002          
mu              |            0.9           
nu              |            0.9           
epsilon         |           1e-12          
clip            |            5.0           
decay           |           0.75           
decay_steps     |           5000           
batch_size      |           5000           
epochs          |            500           
patience        |            100           
min_freq        |             2            
fix_len         |            20            
mode            |           train          
buckets         |            32            
punct           |           False          
ftrain          | /home/jingya/biaffine-parser/data/test_en/train.conllu
fdev            | /home/jingya/biaffine-parser/data/test_en/dev.conllu
ftest           | /home/jingya/biaffine-parser/data/test_en/test.conllu
fembed          | /home/jingya/biaffine-parser/data/glove/glove.6B.100d.txt
unk             |            unk           
conf            |        config.ini        
file            |       exp/ptb.char       
preprocess      |           True           
device          |           cuda           
seed            |             1            
threads         |            16            
tree            |           False          
feat            |           char           
fields          |    exp/ptb.char/fields   
model           |    exp/ptb.char/model    
----------------+--------------------------

Run the subcommand in mode train
Preprocess the data
Traceback (most recent call last):
  File "run.py", line 58, in <module>
    cmd(args)
  File "/home/jingya/biaffine-parser/parser/cmds/train.py", line 40, in __call__
    super(Train, self).__call__(args)
  File "/home/jingya/biaffine-parser/parser/cmds/cmd.py", line 49, in __call__
    self.WORD.build(train, args.min_freq, embed)
  File "/home/jingya/biaffine-parser/parser/utils/field.py", line 73, in build
    counter = Counter(token for sequence in sequences
  File "/home/weijian/anaconda3/envs/biaffine/lib/python3.7/collections/__init__.py", line 566, in __init__
    self.update(*args, **kwds)
  File "/home/weijian/anaconda3/envs/biaffine/lib/python3.7/collections/__init__.py", line 653, in update
    _count_elements(self, iterable)
  File "/home/jingya/biaffine-parser/parser/utils/field.py", line 73, in <genexpr>
    counter = Counter(token for sequence in sequences
  File "/home/jingya/biaffine-parser/parser/utils/corpus.py", line 59, in __getattr__
    raise AttributeError
AttributeError

After some investigation, it seems the error is caused by __getattr__() in corpus.py:

def __getattr__(self, name):
    if not hasattr(self.sentences[0], name):
        raise AttributeError

When I print out name, I get "words", which is set on line 25 of cmd.py. However, self.sentences is a list of Sentence objects, and by definition a Sentence object has no attribute "words", which is the cause of the AttributeError.

Any idea how I can fix this?
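
For anyone debugging the same traceback, here is a minimal sketch of the failure mode. The Sentence and Corpus classes below are hypothetical stand-ins, not the repository's code; the return statement is an assumed continuation of the snippet above, and the error message is an addition that makes a bare AttributeError easier to read.

```python
class Sentence:
    """Hypothetical stand-in: holds only the attributes it is given."""

    def __init__(self, **columns):
        for name, value in columns.items():
            setattr(self, name, value)


class Corpus:
    """Hypothetical stand-in for the corpus class quoted above."""

    def __init__(self, sentences):
        self.sentences = sentences

    def __getattr__(self, name):
        # Same guard as in the quoted snippet, but the error names the
        # missing attribute and lists what the first sentence does have.
        if not hasattr(self.sentences[0], name):
            found = sorted(vars(self.sentences[0]))
            raise AttributeError(
                f"no attribute {name!r}; first sentence has {found}")
        # Assumed continuation: collect that column from every sentence.
        return [getattr(sentence, name) for sentence in self.sentences]


good = Corpus([Sentence(words=("Show", "All"))])
bad = Corpus([Sentence(ids=("1", "2"))])

print(good.words)
try:
    bad.words
except AttributeError as err:
    print("AttributeError:", err)
```

Listing the attributes that the first sentence actually carries shows whether the loader registered the expected columns at all.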

yzhangcs commented 4 years ago

Did you correctly create all the fields, e.g., WORD and FEAT? As a preprocessing step, you must execute the block here.

JingyaXun commented 4 years ago

I presume all fields are created correctly, as I did not make any changes to the cloned code. I made sure that "preprocess" is set to True when I ran the script.

yzhangcs commented 4 years ago

The problem might be the data format. Can I have a look at some sample instances? Indeed, the Sentence object does not define the attribute words itself, but it will be registered in Corpus.load. You can check there.

JingyaXun commented 4 years ago

The data I used is from Universal Dependencies 2.5, downloaded from here.

Here is a snapshot of the training data:

head -n 50 train.conllu 
# newdoc docid = 1
# sent_id = en_lines-ud-train-doc1-1
# text = Show All
1   Show    show    VERB    IMP Mood=Imp|VerbForm=Fin   0   root    _   _
2   All all PRON    TOT-PL  Case=Nom    1   obj _   _

# sent_id = en_lines-ud-train-doc1-2
# text = About ANSI SQL query mode
1   About   about   ADP _   _   5   case    _   _
2   ANSI    ANSI    PROPN   SG-NOM  Number=Sing 5   compound    _   _
3   SQL SQL PROPN   SG-NOM  Number=Sing 2   flat    _   _
4   query   query   NOUN    SG-NOM  Number=Sing 5   compound    _   _
5   mode    mode    NOUN    _   Number=Sing 0   root    _   _

# sent_id = en_lines-ud-train-doc1-3
# text = Some of the content in this topic may not be applicable to some languages.
1   Some    some    PRON    IND Case=Nom    11  nsubj   _   _
2   of  of  ADP _   _   4   case    _   _
3   the the DET DEF Definite=Def|PronType=Art   4   det _   _
4   content content NOUN    SG-NOM  Number=Sing 1   nmod    _   _
5   in  in  ADP _   _   7   case    _   _
6   this    this    DET DEM-SG  Number=Sing|PronType=Dem    7   det _   _
7   topic   topic   NOUN    SG-NOM  Number=Sing 4   nmod    _   _
8   may may AUX PRES-AUX    VerbForm=Fin    11  aux _   _
9   not not PART    NEG _   11  advmod  _   _
10  be  be  AUX INF VerbForm=Inf    11  cop _   _
11  applicable  applicable  ADJ POS Degree=Pos  0   root    _   _
12  to  to  ADP _   _   14  case    _   _

yzhangcs commented 4 years ago

Remove the comment lines. Lines starting with irregular indices like [0-9]-[0-9] (multiword token ranges) should also be handled. Furthermore, in the French and Russian treebanks (and perhaps those of other languages as well), spaces inside words (like 100 000) are not removed, which is also incompatible with my code.
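
The cleanup described above can be sketched as a small filter. This is one possible approach under stated assumptions, not a script from the repository: the function name clean_conllu is hypothetical, rows with multiword ranges ("1-2") or empty-node IDs ("1.1") are simply dropped, and spaces inside the FORM and LEMMA columns are deleted rather than replaced with another character.

```python
import re

# Token IDs that are plain integers; ranges ("1-2") and empty nodes
# ("1.1") do not match.
PLAIN_ID = re.compile(r"[0-9]+")


def clean_conllu(lines):
    """Yield blank separator lines and rows with plain integer IDs only."""
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            yield ""  # keep sentence boundaries
            continue
        if line.startswith("#"):
            continue  # drop comment lines
        columns = line.split("\t")
        if not PLAIN_ID.fullmatch(columns[0]):
            continue  # drop rows such as "1-2" or "1.1"
        # Assumption: delete spaces inside FORM and LEMMA (e.g. "100 000").
        for i in (1, 2):
            if i < len(columns):
                columns[i] = columns[i].replace(" ", "")
        yield "\t".join(columns)


sample = [
    "# sent_id = en_lines-ud-train-doc1-1\n",
    "# text = Show All\n",
    "1\tShow\tshow\tVERB\tIMP\tMood=Imp|VerbForm=Fin\t0\troot\t_\t_\n",
    "2\tAll\tall\tPRON\tTOT-PL\tCase=Nom\t1\tobj\t_\t_\n",
    "\n",
]
cleaned = list(clean_conllu(sample))
print("\n".join(cleaned))
```

To clean a whole treebank file, the same generator can be fed an open file object and its output written line by line to a new file, leaving the original download untouched.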

JingyaXun commented 4 years ago

Removing the comment lines does solve my issue. Thank you so much for your help.
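
A plausible explanation for why the comment lines mattered, sketched under an assumption that this thread does not confirm: if the loader splits every line of a sentence on tabs and transposes the rows into columns with zip, a comment line (which contains no tabs) truncates the result to a single column, so the word column is never registered. The helper to_columns below is hypothetical.

```python
comment = "# text = Show All"
rows = [
    "1\tShow\tshow\tVERB\tIMP\tMood=Imp|VerbForm=Fin\t0\troot\t_\t_",
    "2\tAll\tall\tPRON\tTOT-PL\tCase=Nom\t1\tobj\t_\t_",
]


def to_columns(lines):
    """Split each line on tabs, then transpose rows into columns."""
    # zip stops at the shortest row, so one tab-free line limits
    # the result to a single column.
    return list(zip(*[line.split("\t") for line in lines]))


with_comment = to_columns([comment] + rows)
without_comment = to_columns(rows)

print(len(with_comment))     # columns recovered with the comment line
print(len(without_comment))  # columns recovered without it
print(without_comment[1])    # the word column
```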

yzhangcs commented 4 years ago

:)