pschwllr / MolecularTransformer


"Length of values does not match length of index" for beam_size >1 #5

Closed Sum02dean closed 4 years ago

Sum02dean commented 4 years ago

Hi, I am running score_predictions.py, but the operation exits with the error below.


mol_transformer/bin/python score_predictions.py -targets data/raw/tgt-test.txt -predictions experiments/results/raw_results/predictions_raw_model_step_129000_on_raw_test.txt
Traceback (most recent call last):
  File "score_predictions.py", line 73, in <module>
    main(opt)
  File "score_predictions.py", line 38, in main
    test_df['prediction_{}'.format(i + 1)] = preds
  File "/home/user/miniconda3/envs/mol_transformer/lib/python3.6/site-packages/pandas/core/frame.py", line 2938, in __setitem__
    self._set_item(key, value)
  File "/home/user/miniconda3/envs/mol_transformer/lib/python3.6/site-packages/pandas/core/frame.py", line 3000, in _set_item
    value = self._sanitize_column(key, value)
  File "/home/user/miniconda3/envs/mol_transformer/lib/python3.6/site-packages/pandas/core/frame.py", line 3636, in _sanitize_column
    value = sanitize_index(value, self.index, copy=False)
  File "/home/user/miniconda3/envs/mol_transformer/lib/python3.6/site-packages/pandas/core/internals/construction.py", line 611, in sanitize_index
    raise ValueError("Length of values does not match length of index")

I have checked the src-test.txt, tgt-test.txt, and predictions.txt files and they all contain the same number of observations. The script runs fine if I pass -beam_size 1 but fails with any other integer.
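For context, the pandas error itself is easy to reproduce in isolation: with a beam width greater than 1, the predictions file holds several lines per ground-truth reaction, so the list being assigned is longer than the DataFrame index. A minimal sketch (with made-up SMILES strings, not the repo's data):

```python
import pandas as pd

# 2 ground-truth reactions, but 4 prediction lines (as if beam_size/n_best = 2)
targets = ["CCO", "CCN"]
predictions = ["CCO", "CC", "CCN", "CN"]

test_df = pd.DataFrame({"target": targets})
try:
    # 4 values assigned against a 2-row index -> ValueError
    test_df["prediction_1"] = predictions
except ValueError as e:
    print("assignment failed:", e)
```

The exact message wording varies across pandas versions, but the cause is the same length mismatch as in the traceback above.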

Sum02dean commented 4 years ago

I think I got it! :)

Sum02dean commented 4 years ago

Reopened. I passed -beam_size 10 to both the translate.py call and the score_predictions.py call, thinking this would work, but I still get the same error.

pschwllr commented 4 years ago

You'll find all the options in: https://github.com/pschwllr/MolecularTransformer/blob/master/onmt/opts.py

Or when you do: python translate.py --help

If you set -beam_size 10, you still have to set -n_best to the number of outputs you want per prediction. For example, -n_best 3 would lead to an output like:

reaction1 top1
reaction1 top2
reaction1 top3
reaction2 top1
reaction2 top2
reaction2 top3

In score_predictions.py you then have to pass -beam_size 3, so that it always considers 3 prediction lines per ground-truth reaction. I probably should have called that option -n_best in score_predictions.py as well.

It might be a bit confusing at first, but the -beam_size parameter in the scoring script should match the number of outputs per ground-truth reaction.
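The grouping logic can be sketched like this (a simplified illustration, not the repo's exact code; the list contents and variable names here are made up): with n_best lines per reaction, every n_best-th line starting at offset i belongs to rank i+1.

```python
# Flat predictions file layout, as in the example above:
# n_best consecutive lines per ground-truth reaction.
preds_all = [
    "reaction1 top1", "reaction1 top2", "reaction1 top3",
    "reaction2 top1", "reaction2 top2", "reaction2 top3",
]
n_best = 3  # must match the value passed as -beam_size to score_predictions.py

columns = {}
for i in range(n_best):
    # Slice out every n_best-th line starting at offset i:
    # these are the rank-(i+1) predictions, one per reaction.
    columns["prediction_{}".format(i + 1)] = preds_all[i::n_best]

print(columns["prediction_1"])  # ['reaction1 top1', 'reaction2 top1']
```

Each resulting column has exactly one entry per ground-truth reaction, which is why the DataFrame assignment succeeds once the scoring script's -beam_size matches -n_best from translation.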

Sum02dean commented 4 years ago

Hi Philippe,

Thank you for the clear and concise explanation, the script works fine now.

P.S. I also removed '-fast' from the translate args, since it seemed to conflict with -n_best > 1.

Best, Dean