ratishsp / mlb-ie

Information extraction scripts for the MLB dataset

extractor.lua gives me an error. #1

Closed · sanghyuk-choi closed this issue 3 years ago

sanghyuk-choi commented 3 years ago

I've downloaded mlb-convie3-ep9-91-75 and mlb-blstmie2-ep10-92-75 from Google Drive, but when I run extractor.lua I get the error below. I think it's a hyperparameter issue.

```
torch/install/bin/luajit: extractor.lua:564: bad argument #1 to 'copy' (sizes do not match at .../torch/extra/cutorch/lib/THC/THCTensorCopy.cu:31)
stack traceback:
  [C]: in function 'copy'
  extractor.lua:564: in function 'main'
  extractor.lua:651: in main chunk
  [C]: in function 'dofile'
  .../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
  [C]: at 0x00405d50
```
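For anyone debugging the same mismatch: a size error in `copy` usually means the downloaded checkpoint was trained with different vocabulary/label sizes than the locally built data. A minimal sketch for inspecting the local files (assuming `h5py` is installed; the dataset names inside mlb-ie.h5 depend on the preprocessing, so they are discovered rather than assumed, and the one-entry-per-line format of the .dict/.labels files is also an assumption):

```python
# Print the shapes of everything in the locally built mlb-ie.h5 so they
# can be compared against what the downloaded checkpoint expects.
import h5py

with h5py.File("mlb-ie.h5", "r") as f:
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)

# Assuming the .dict and .labels files hold one entry per line, their
# line counts should match the checkpoint's vocab/label sizes.
for path in ("mlb-ie.dict", "mlb-ie.labels"):
    with open(path, encoding="utf-8") as fh:
        print(path, sum(1 for _ in fh), "entries")
```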

ratishsp commented 3 years ago

Hi, I have shared the processed mlb-ie.h5, mlb-ie.dict, and mlb-ie.labels files at https://drive.google.com/drive/folders/1n8hljpAyYNQV7Ut5fSSCfV1rasofNxYd?usp=sharing. Hope it helps.

sanghyuk-choi commented 3 years ago

Thank you very much, it worked. I followed the instructions and got a dataset of exactly the same size (a split of 22,821/1,739/1,744 instances), but I don't know why my mlb-ie.xxx files are different from yours.

Also, can I get the gold results using this model, just to confirm that mine matches yours? For example, Gold is reported as RG% 96.11 / #17.31 on the RotoWire dataset, while mine is RG% 83.78 on the MLB dataset (gold). I think something is wrong...


My test dataset (1,744 instances) starts with "WASHINGTON ( AP ) - - Ryan Zimmerman raised a fist when he rounded first , ..." and its md5sum is 8aea61fac0d5141c43d15c6df60b5cf0.
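(For reference, a cross-platform way to compute the same checksum in Python, equivalent to running md5sum; the filename of the test split here is hypothetical:)

```python
# Compute the md5 checksum of a file in chunks, so large dataset files
# do not need to fit in memory.
import hashlib

def md5sum(path, chunk_size=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

print(md5sum("test.txt"))  # hypothetical name for the test split file
```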

ratishsp commented 3 years ago

Hi, my test dataset also starts with the same line; however, the md5sum is different. It could be that some webpages underwent modifications after I created the dataset, but the changes should be minor. I remember getting upwards of 90% accuracy on the MLB gold test dataset. One way to debug the issue would be to compare your generated test_mlb-beam5_gens.h5-tuples.txt with https://github.com/ratishsp/mlb-ie/blob/master/test_mlb-beam5_gens.h5-tuples.txt. Ideally the difference should be minimal, if any.
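A quick way to quantify that difference (a minimal sketch; it treats each line of the tuples files as an independent record, ignoring order and duplicate counts, and the filename of the downloaded repo copy is hypothetical):

```python
# Compare two extracted-tuples files line by line and report how many
# records appear in only one of them.
def load_lines(path):
    with open(path, encoding="utf-8") as f:
        return set(line.strip() for line in f if line.strip())

mine = load_lines("test_mlb-beam5_gens.h5-tuples.txt")
reference = load_lines("reference_tuples.txt")  # the copy from the repo

print("only in mine:", len(mine - reference))
print("only in reference:", len(reference - mine))
print("shared:", len(mine & reference))
```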

sanghyuk-choi commented 3 years ago

Hi, I've tried creating the dataset again in case I had made a mistake, but it produced the same test set file. While creating it, two questions came up (they actually relate more to mlb-data-scripts than to this repo):

  1. html-output-cleaned/*20080611_Diamondbacks-Mets_3-5 is not on the list, so running creating_combined_dataset.py prints html-output-cleaned/*20080611_Diamondbacks-Mets_3-5 not found. I ran it twice, once including that file and once excluding it; it seems to affect the training set but not the test set.

  2. While executing mlb_data_utils.py, I got some messages such as u'three' may not proceed u'two' and magnitude u'hundred' must be preceded by a number. Is anything wrong with that? Note that I'm using text2num.py from https://github.com/harvardnlp/data2text (see the sketch after this list).
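For context, those messages come from the spelled-out-number parser. A minimal sketch of how text2num is typically wrapped (assuming the text2num(s) function and NumberException class exported by the text2num.py in https://github.com/harvardnlp/data2text; the exact message wording varies between variants of that script):

```python
# Wrap text2num so that unparseable spelled-out numbers are skipped
# rather than aborting preprocessing.
from text2num import text2num, NumberException

for phrase in ("three two", "hundred", "twenty one"):
    try:
        print(phrase, "->", text2num(phrase))
    except NumberException as e:
        # e.g. complaints like u'three' may not proceed u'two'; these
        # usually just mean the span is not a well-formed number (some
        # variants silently combine such spans instead of raising).
        print(phrase, "-> skipped:", e)
```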

https://www.diffchecker.com/IhXDOQ0X shows the difference in test_mlb-beam5_gens.h5-tuples.txt.

This is all I did, and I think 83.78% on the gold test dataset leaves something to be desired for an exact evaluation. So, if possible, could you share your dataset, or help me obtain the right evaluation model?

I really appreciate your kindness. Thank you.

ratishsp commented 3 years ago

Hi, I think the lower RG precision on the gold dataset may be because of the ie_json/test.json file that is given as input. The steps to create ie_json are described at https://github.com/ratishsp/mlb-ie#creating-ie-json:

python ie_json_creation.py -input_folder "../data/mlb/json" -output_folder "../data/mlb/ie_json" -type [train|valid|test] -scoring

For evaluating the model outputs, you can retain the flag scoring. However, when evaluating the gold test dataset, you will need to create the ie_json/test.json by dropping the flag scoring. Let's call such a json file ie_json/test_all_pbyp.json.
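Concretely, that amounts to running ie_json_creation.py twice for the test split, once with and once without -scoring (assuming the output of the second run is then renamed to test_all_pbyp.json; the exact output naming depends on the script):

python ie_json_creation.py -input_folder "../data/mlb/json" -output_folder "../data/mlb/ie_json" -type test -scoring
python ie_json_creation.py -input_folder "../data/mlb/json" -output_folder "../data/mlb/ie_json" -type test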

You can then run the steps in https://github.com/ratishsp/mlb-ie#evaluating-generated-summaries for the gold test dataset. With this, the RG precision of the gold test dataset should be higher.

Hope it helps.

sanghyuk-choi commented 3 years ago

Hi, I ran it again with the scoring flag dropped, and it shows 92.1% RG precision on the gold test dataset. That is not as good as on the RotoWire test set, but it's probably because of the modifications to the MLB pages after you created the dataset, as you mentioned. This makes it hard to compare my results against other models', but I think it's fair enough for comparisons among my own. If it's not possible to use the same dataset due to copyright or other issues, I think this is the best I can do. Thank you for your kindness, and if you don't have any further feedback, I'll close the issue.

ratishsp commented 3 years ago

Hi Sanghyuk, glad that the RG precision improved! Indeed, the precision is lower than on RotoWire; however, as mentioned in the paper, IE for the MLB dataset is relatively more difficult. Regarding the changes in the webpages, one way to fix this could be to use Wayback Machine archive URLs, similar to those used in https://github.com/EdinburghNLP/XSum/tree/master/XSum-Dataset. Please feel free to make a PR that does that. You can also email me to discuss details.
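For reference, the Wayback Machine exposes a public availability API that could drive such a fix. A minimal sketch (assuming `requests` is installed; the snapshot-selection policy of taking the capture nearest the game date is just an assumption):

```python
# Look up the Wayback Machine snapshot of a page closest to a given
# date, using archive.org's availability API.
import requests

def wayback_url(url, timestamp="20080611"):
    resp = requests.get(
        "http://archive.org/wayback/available",
        params={"url": url, "timestamp": timestamp},
        timeout=30,
    )
    closest = resp.json().get("archived_snapshots", {}).get("closest", {})
    return closest.get("url")  # None if no snapshot exists

print(wayback_url("example.com/some-game-recap"))  # hypothetical page
```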

ratishsp commented 3 years ago

Hi Sanghyuk, I have shared the dataset at https://drive.google.com/drive/folders/1G4iIE-02icAU2-5skvLlTEPWDQQj1ss4?usp=sharing and also linked it from https://github.com/ratishsp/mlb-data-scripts. Hope it helps.