Data Directory used when running test_phrase_grammar.py

yikangshen / Ordered-Neurons

Code for the paper "Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks"

https://arxiv.org/pdf/1810.09536.pdf

BSD 3-Clause "New" or "Revised" License


Data Directory used when running test_phrase_grammar.py #24

Closed: YianZhang closed this issue 4 years ago

YianZhang commented 4 years ago

Hi Yikang and other Contributors,

Thank you for making the source code public! I am trying to reproduce your results, but I am not sure what path to pass as the --data command-line argument of test_phrase_grammar.py. I downloaded the PTB data and am currently passing treebank_3/parsed/mrg as the data argument, but it does not work.

The listings under treebank_3/parsed/mrg: atis brown readme.mrg swbd wsj

The listings under treebank_3/parsed/mrg/wsj:

00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 MERGE.LOG

Thank you for your time! Ian

yikangshen commented 4 years ago

Hi Ian, you need to copy the wsj folder to ~/nltk_data/corpora/ptb/WSJ.
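For anyone following along, the copy step can be sketched in Python as below. This is only an illustration: the helper name `copy_wsj` is hypothetical, and the source path assumes the Treebank download sits at treebank_3/parsed/mrg/wsj relative to the working directory, as in the listing above.

```python
import shutil
from pathlib import Path


def copy_wsj(src_wsj: Path, nltk_data: Path) -> Path:
    """Copy a Treebank `wsj` folder to <nltk_data>/corpora/ptb/WSJ."""
    dest = nltk_data / "corpora" / "ptb" / "WSJ"
    # Create corpora/ptb if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # dirs_exist_ok (Python 3.8+) lets the copy be re-run safely.
    shutil.copytree(src_wsj, dest, dirs_exist_ok=True)
    return dest


if __name__ == "__main__":
    # Assumed download location; adjust to where treebank_3 actually lives.
    src = Path("treebank_3/parsed/mrg/wsj")
    if src.is_dir():
        print(copy_wsj(src, Path.home() / "nltk_data"))
```

The destination is built from `Path.home()`, so it matches the ~/nltk_data location mentioned in the reply.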

YianZhang commented 4 years ago

Hi Yikang,

Thanks for the response! I figured that out. However, what is args.data in test_phrase_grammar used for?

Thanks, Ian

yikangshen commented 4 years ago

It points to the dictionary that the model actually uses.

YianZhang commented 4 years ago

> It points to the dictionary that the model actually uses.

Thanks for the response! Do you mean "directory" or "dictionary"?

Best, Ian

yikangshen commented 4 years ago

Dictionary

yikangshen commented 4 years ago

While testing parsing F1, the model still needs to load the dictionary from the training corpus.

YianZhang commented 4 years ago

Thanks for your prompt response!

After carefully checking your code, I believe the dictionary is loaded from a fixed path: https://github.com/yikangshen/Ordered-Neurons/blob/46d63cde024802eaf1eb7cc896431329014dd869/test_phrase_grammar.py#L279-L282

And args.data is used as the directory of the test data: https://github.com/yikangshen/Ordered-Neurons/blob/46d63cde024802eaf1eb7cc896431329014dd869/test_phrase_grammar.py#L293

Am I correct?

Thanks again for your help! I would appreciate it if you could also check my other issue, #25. As far as I know, this problem also confuses other researchers.

Best, Ian

shawntan commented 4 years ago

The code assumes the cached dataset is already in the directory; it will be there if the training script was run before test_phrase_grammar.py.

But yes, you are correct.
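The pattern described here (the training script builds and caches the corpus, and the test script only reads the cache) can be sketched roughly as follows. This is not the repository's code: the function names, the cache file naming, and the use of `pickle` are all assumptions made for illustration.

```python
import hashlib
import pickle
from pathlib import Path


def cache_file(corpus_path: str, cache_dir: Path) -> Path:
    # Hypothetical naming scheme: one cache file per corpus path.
    digest = hashlib.md5(corpus_path.encode()).hexdigest()
    return cache_dir / f"corpus.{digest}.data"


def build_and_cache(corpus_path: str, cache_dir: Path, build):
    """Training side: build the corpus once and write it to the cache."""
    corpus = build(corpus_path)
    with open(cache_file(corpus_path, cache_dir), "wb") as f:
        pickle.dump(corpus, f)
    return corpus


def load_cached(corpus_path: str, cache_dir: Path):
    """Test side: only read the cache; fail clearly if training never ran."""
    fn = cache_file(corpus_path, cache_dir)
    if not fn.exists():
        raise FileNotFoundError(f"{fn} not found; run the training script first")
    with open(fn, "rb") as f:
        return pickle.load(f)
```

With a layout like this, running the test script before the training script fails at the cache lookup, which is consistent with the behavior discussed in this thread.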

YianZhang commented 4 years ago

Thanks a lot!