Empty output file - Githubissues

atayuksel commented 5 years ago

Hi, I tried to use the pre-trained model as explained in the Readme. However, the output files are empty. I cloned the project to ~/ and downloaded cwlm_lstm_crf.json and cwlm_lstm_crf.model to the same directory. I'm using Python 3.6.7, Ubuntu 18.04 and PyTorch 1.1. I saved the input file described in Readme.txt as test.tsv:

The
severe
anemia
(
hemoglobin
1
.
2
g
/
dl
)
appeared
to
be
the
primary
etiologic
factor
.

The command I ran is:

python3 seq_wc.py --load_arg cwlm_lstm_crf.json --load_check_point cwlm_lstm_crf.model --input_file /home/atakan/Multi-BioNER-master/test.tsv --output_file output --gpu 0

The output on the terminal is:

loading dictionary
loading model
/home/atakan/.local/lib/python3.6/site-packages/torch/nn/modules/rnn.py:54: UserWarning: dropout option adds dropout after all but last recurrent layer, so non-zero dropout expects num_layers greater than 1, but got dropout=0.5 and num_layers=1
  "num_layers={}".format(dropout, num_layers))
loading corpus
annotating the entity type 0
annotating the entity type 1
annotating the entity type 2
annotating the entity type 3
annotating the entity type 4

The output files are empty. What could be the problem?

yuzhimanhua commented 5 years ago

Hi,

Thank you for pointing out this bug! I found that seq_wc.py always misses the LAST sentence in the testing file. So when you put only one sentence in your testing file, there will be nothing in the results...

I will try to fix it ASAP. If you would like a simple and quick fix, please just "pad" one random sentence at the end of your testing file.

Thanks, Yu
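
For anyone still on the older version, the padding workaround could be scripted roughly as follows. This is only a minimal sketch: it assumes the input has one token per line with a blank line between sentences, and the helper name pad_with_dummy_sentence and the dummy tokens are hypothetical, not part of Multi-BioNER.

```python
from pathlib import Path


def pad_with_dummy_sentence(src, dst, dummy_tokens=("This", "is", "padding", ".")):
    """Copy src to dst and append one throwaway sentence at the end.

    Assumption: one token per line, sentences separated by a blank line.
    """
    # Drop trailing newlines so exactly one blank line precedes the padding.
    text = Path(src).read_text(encoding="utf-8").rstrip("\n")
    padded = text + "\n\n" + "\n".join(dummy_tokens) + "\n"
    Path(dst).write_text(padded, encoding="utf-8")
    return dst
```

The padded file would then be passed to --input_file in place of the original, and the annotation of the dummy sentence (if any appears) can be ignored.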

yuzhimanhua commented 5 years ago

Hi,

I have fixed the bug. You do not need to pad an extra sentence now if you clone the newest version.

Thanks, Yu

atayuksel commented 5 years ago

Hi,

Thanks for the quick response. I downloaded the latest version and ran the pre-trained model, but I ran into another error:

Traceback (most recent call last):
  File "seq_wc.py", line 86, in <module>
    predictor.output_batch(ner_model, feature, fout, idx)
  File "/home/atakan/Multi-BioNER-master/model/predictor.py", line 142, in output_batch
    fout.write(self.decode_str(features[ind2], l) + '\n')
  File "/home/atakan/Multi-BioNER-master/model/predictor.py", line 46, in decode_l
    return '\n'.join(map(lambda t: t[0] + ' '+ self.r_l_map[t[1]], zip(feature, label)))
  File "/home/atakan/Multi-BioNER-master/model/predictor.py", line 46, in <lambda>
    return '\n'.join(map(lambda t: t[0] + ' '+ self.r_l_map[t[1]], zip(feature, label)))
KeyError: tensor(0)

I had seen this error in a previous issue and fixed it with the change described there:

def decode_l(self, feature, label):
    # .item() converts each label tensor to a plain Python int before the dictionary lookup
    return '\n'.join(map(lambda t: t[0] + ' ' + self.r_l_map[t[1].item()], zip(feature, label)))

Now it works perfectly. The error may be caused by the PyTorch version (I use PyTorch 1.1), so an if condition that checks the version might be helpful.

By the way, thanks for the help again :) Atakan
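
A likely explanation for the KeyError is that, in recent PyTorch versions, iterating over a tensor yields zero-dimensional tensors rather than Python ints, and those do not match the integer keys of the label dictionary. Instead of checking the version, the lookup could accept both forms. The following is a minimal sketch under that assumption; label_key and decode_labels are hypothetical names, not functions from the repository.

```python
def label_key(index):
    """Return a value usable as a dictionary key.

    Zero-dimensional tensors expose .item(); plain Python ints do not,
    so they are returned unchanged.
    """
    return index.item() if hasattr(index, "item") else index


def decode_labels(tokens, label_indices, r_l_map):
    """Join each token with its label name, one pair per line."""
    return "\n".join(
        tok + " " + r_l_map[label_key(idx)]
        for tok, idx in zip(tokens, label_indices)
    )
```

With this approach the same code path should work whether the labels arrive as ints or as scalar tensors, so no explicit version check is needed.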

yuzhimanhua / Multi-BioNER

Empty output file #5