yikangshen / Ordered-Neurons

Code for the paper "Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks"
https://arxiv.org/pdf/1810.09536.pdf
BSD 3-Clause "New" or "Revised" License

Did you use the test data during training in the Unsupervised Parsing experiment? #18

Closed by jiaxin96 5 years ago

jiaxin96 commented 5 years ago

On reviewing the following code, I find that the training data contains the test data. Is this correct?

https://github.com/yikangshen/Ordered-Neurons/blob/46d63cde024802eaf1eb7cc896431329014dd869/data_ptb.py#L25

for id in file_ids:
    if 'WSJ/00/WSJ_0000.MRG' <= id <= 'WSJ/24/WSJ_2499.MRG':
        train_file_ids.append(id)
    if 'WSJ/22/WSJ_2200.MRG' <= id <= 'WSJ/22/WSJ_2299.MRG':
        valid_file_ids.append(id)
    if 'WSJ/23/WSJ_2300.MRG' <= id <= 'WSJ/23/WSJ_2399.MRG':
        test_file_ids.append(id)
    # elif 'WSJ/00/WSJ_0000.MRG' <= id <= 'WSJ/01/WSJ_0199.MRG' or 'WSJ/24/WSJ_2400.MRG' <= id <= 'WSJ/24/WSJ_2499.MRG':
    #     rest_file_ids.append(id)
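
To make the overlap concrete, here is a minimal sketch that applies the same string comparisons to three made-up file IDs. The `split_file_ids` helper and the sample IDs are hypothetical and only mirror the logic quoted above; they are not taken from the repository.

```python
def split_file_ids(file_ids):
    """Assign file IDs to splits with the same three independent `if` checks."""
    train, valid, test = [], [], []
    for file_id in file_ids:
        # Sections 00-24: this range also covers sections 22 and 23.
        if 'WSJ/00/WSJ_0000.MRG' <= file_id <= 'WSJ/24/WSJ_2499.MRG':
            train.append(file_id)
        # Section 22.
        if 'WSJ/22/WSJ_2200.MRG' <= file_id <= 'WSJ/22/WSJ_2299.MRG':
            valid.append(file_id)
        # Section 23.
        if 'WSJ/23/WSJ_2300.MRG' <= file_id <= 'WSJ/23/WSJ_2399.MRG':
            test.append(file_id)
    return train, valid, test


# Hypothetical sample IDs, one each from sections 02, 22 and 23.
sample_ids = [
    'WSJ/02/WSJ_0200.MRG',
    'WSJ/22/WSJ_2201.MRG',
    'WSJ/23/WSJ_2301.MRG',
]

train, valid, test = split_file_ids(sample_ids)
train_test_overlap = set(train) & set(test)
train_valid_overlap = set(train) & set(valid)
print(sorted(train_test_overlap))
print(sorted(train_valid_overlap))
```

Because the checks are plain `if` statements rather than `if`/`elif`, any file in section 22 or 23 lands in both the training list and its own validation or test list.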

shawntan commented 5 years ago

The distance values are extracted from models that were already trained on the PTB language modeling training set. There is no additional training involved when performing unsupervised parsing in our setup.
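
As a rough illustration of why no further training is needed, here is a minimal sketch of turning a sequence of distance values into a binary tree by recursively splitting at the largest distance. The `build_tree` function, the example sentence, and the distance values are all made up for illustration; this is not the repository's evaluation code, and it assumes one distance per adjacent word pair.

```python
def build_tree(words, distances):
    """Greedy top-down binary tree from distances between adjacent words.

    distances[i] is an assumed distance between words[i] and words[i + 1],
    so len(distances) must equal len(words) - 1.
    """
    assert len(distances) == len(words) - 1
    if len(words) == 1:
        return words[0]
    # Split at the largest distance (first one wins on ties).
    split = max(range(len(distances)), key=distances.__getitem__)
    left = build_tree(words[:split + 1], distances[:split])
    right = build_tree(words[split + 1:], distances[split + 1:])
    return [left, right]


# Hypothetical sentence and distance values, not model output.
words = ['the', 'cat', 'sat', 'down']
distances = [1.0, 3.0, 2.0]

tree = build_tree(words, distances)
print(tree)  # [['the', 'cat'], ['sat', 'down']]
```

The procedure only reads the distances and has no parameters to fit, so parsing is an inference-time step on top of a model trained for language modeling.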