ratishsp / mlb-ie

Information extraction scripts for the MLB dataset

Adding example inputs and expected outputs #3

Open danieldeutsch opened 2 years ago

danieldeutsch commented 2 years ago

Hi Ratish,

Sebastian Gehrmann mentioned supporting your dataset-specific evaluation in the GEM metrics library. To do that, I will convert this codebase to a Docker container, add it to my Repro library, and then add it to the GEM library.

Could you provide some example inputs and expected outputs for this code? It will make it much easier for me to make sure that I've faithfully Dockerized your code.

Thanks!

danieldeutsch commented 2 years ago

I also don't know much about Lua, so any other specific information about the runtime would be very helpful (e.g., which versions of Lua and Python, which specific Python libraries, etc.).

ratishsp commented 2 years ago

Hi Daniel, thanks for helping! The Torch version is 7. I followed the steps at http://torch.ch/docs/getting-started.html to install Lua Torch.

ratishsp commented 2 years ago

The list of instructions for evaluation is at https://github.com/ratishsp/data2text-macro-plan-py/blob/main/README_MLB.md#evaluation

ratishsp commented 2 years ago

The input to Step 2,

    python add_segment_marker.py -input_file $GEN/$IDENTIFIER-beam5_gens.txt -output_file \
        $GEN/$IDENTIFIER-segment-beam5_gens.txt

is test_gold.txt.

ratishsp commented 2 years ago

For the command

    python mlb_data_utils.py -mode prep_gen_data -gen_fi $GEN/$IDENTIFIER-segment-beam5_gens.txt \
        -dict_pfx "$IE_ROOT/data/mlb-ie" -output_fi $DOC_GEN/transform_gen/$IDENTIFIER-beam5_gens.h5 \
        -input_path "$IE_ROOT/json" \
        -ordinal_inning_map_file $GEN/$IDENTIFIER-inning-map-beam5_gens.txt \
        -test

the dict_pfx files and the input_path JSON files are at https://drive.google.com/drive/folders/1q9xpjIBkF7YOerXE6eSiSDq158kjw8Nn

Note that this Python command requires Python 2.7.
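
For anyone wrapping this step in a test harness, the command can be assembled programmatically before handing it to a Python 2.7 interpreter. The following is only a minimal sketch and not part of this repository: the function name `build_prep_gen_command`, the interpreter name `python2.7`, and the example paths are assumptions, while the flags are copied from the command quoted in this comment. The helper itself is written for Python 3.8+ and only builds the argument list; it does not run anything.

```python
import shlex


def build_prep_gen_command(ie_root, gen, doc_gen, identifier, python_bin="python2.7"):
    """Build the argv list for the prep_gen_data step.

    python_bin is an assumed interpreter name; adjust it to wherever
    Python 2.7 lives in your environment.
    """
    return [
        python_bin, "mlb_data_utils.py",
        "-mode", "prep_gen_data",
        "-gen_fi", f"{gen}/{identifier}-segment-beam5_gens.txt",
        "-dict_pfx", f"{ie_root}/data/mlb-ie",
        "-output_fi", f"{doc_gen}/transform_gen/{identifier}-beam5_gens.h5",
        "-input_path", f"{ie_root}/json",
        "-ordinal_inning_map_file", f"{gen}/{identifier}-inning-map-beam5_gens.txt",
        "-test",
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_prep_gen_command("/opt/mlb-ie", "/work/gen", "/work/doc_gen", "test_mlb")
    # Print a shell-quoted version so it can be inspected or pasted into a Dockerfile.
    print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the directory that contains `mlb_data_utils.py`.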

ratishsp commented 2 years ago

The output of the command

    th extractor.lua -gpuid 0 -datafile $IE_ROOT/data/mlb-ie.h5 \
        -preddata $DOC_GEN/transform_gen/$IDENTIFIER-beam5_gens.h5 -dict_pfx \
        "$IE_ROOT/data/mlb-ie" -just_eval -ignore_idx 14 -test

is https://github.com/ratishsp/mlb-ie/blob/master/test_mlb-beam5_gens.h5-tuples.txt
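
One way to check that a containerized run reproduces this reference output is a line-by-line diff of the two tuples files. The following is a minimal sketch under stated assumptions: the function name `diff_tuple_files` is hypothetical, both files are assumed to be plain UTF-8 text, and nothing here is part of the repository itself.

```python
import difflib


def diff_tuple_files(expected_path, actual_path):
    """Return unified-diff lines between two tuples files.

    An empty list means the files have identical lines.
    """
    with open(expected_path, encoding="utf-8") as f:
        expected = f.read().splitlines()
    with open(actual_path, encoding="utf-8") as f:
        actual = f.read().splitlines()
    return list(
        difflib.unified_diff(
            expected,
            actual,
            fromfile=expected_path,
            tofile=actual_path,
            lineterm="",
        )
    )
```

Calling it with the downloaded reference file and the file produced inside the container, then printing any returned lines, shows exactly where the two runs diverge. Small differences may come from GPU or library nondeterminism rather than from the Dockerization, so a non-empty diff is a prompt to investigate, not proof of a bug.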

ratishsp commented 2 years ago

Hope it helps!