Did you use the test data during training in the Unsupervised Parsing experiment?

yikangshen / Ordered-Neurons

Code for the paper "Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks"

BSD 3-Clause "New" or "Revised" License


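On reviewing the following code, I found that the training split contains the test data. The train range spans sections 00 through 24, so it also includes section 22 (validation) and section 23 (test). Is this correct?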

https://github.com/yikangshen/Ordered-Neurons/blob/46d63cde024802eaf1eb7cc896431329014dd869/data_ptb.py#L25

for id in file_ids:
    if 'WSJ/00/WSJ_0000.MRG' <= id <= 'WSJ/24/WSJ_2499.MRG':
        train_file_ids.append(id)
    if 'WSJ/22/WSJ_2200.MRG' <= id <= 'WSJ/22/WSJ_2299.MRG':
        valid_file_ids.append(id)
    if 'WSJ/23/WSJ_2300.MRG' <= id <= 'WSJ/23/WSJ_2399.MRG':
        test_file_ids.append(id)
    # elif 'WSJ/00/WSJ_0000.MRG' <= id <= 'WSJ/01/WSJ_0199.MRG' or 'WSJ/24/WSJ_2400.MRG' <= id <= 'WSJ/24/WSJ_2499.MRG':
    #     rest_file_ids.append(id)
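Because the three tests above are independent `if` statements rather than an `if`/`elif` chain, every file in sections 22 and 23 matches the train range as well. The sketch below reproduces that check on synthetic file IDs. It assumes IDs follow the zero-padded `WSJ/SS/WSJ_SSNN.MRG` format seen in the string literals above (which is what makes the lexicographic `<=` comparisons work), and it uses a made-up count of 100 files per section. `split_disjoint` is a hypothetical alternative, not the repository's code: it uses the conventional WSJ parsing split (02-21 train, 22 valid, 23 test) and sends sections 00, 01 and 24 to a `rest` bucket, mirroring the commented-out `rest_file_ids` branch.

```python
from typing import Dict, List


def make_file_ids() -> List[str]:
    """Hypothetical stand-in for the corpus file list: 100 files per section 00-24.

    Real section sizes differ; only the zero-padded ID format matters here.
    """
    return [
        f"WSJ/{sec:02d}/WSJ_{sec:02d}{num:02d}.MRG"
        for sec in range(25)
        for num in range(100)
    ]


def split_as_in_issue(file_ids: List[str]) -> Dict[str, List[str]]:
    """Reproduces the three independent `if` tests quoted in the issue."""
    splits: Dict[str, List[str]] = {"train": [], "valid": [], "test": []}
    for id_ in file_ids:  # `id_` avoids shadowing the builtin `id`
        if "WSJ/00/WSJ_0000.MRG" <= id_ <= "WSJ/24/WSJ_2499.MRG":
            splits["train"].append(id_)
        if "WSJ/22/WSJ_2200.MRG" <= id_ <= "WSJ/22/WSJ_2299.MRG":
            splits["valid"].append(id_)
        if "WSJ/23/WSJ_2300.MRG" <= id_ <= "WSJ/23/WSJ_2399.MRG":
            splits["test"].append(id_)
    return splits


def split_disjoint(file_ids: List[str]) -> Dict[str, List[str]]:
    """Hypothetical disjoint split: 02-21 train, 22 valid, 23 test, the rest set aside."""
    splits: Dict[str, List[str]] = {"train": [], "valid": [], "test": [], "rest": []}
    for id_ in file_ids:
        # The `elif` chain guarantees each file lands in exactly one bucket.
        if "WSJ/02/WSJ_0200.MRG" <= id_ <= "WSJ/21/WSJ_2199.MRG":
            splits["train"].append(id_)
        elif "WSJ/22/WSJ_2200.MRG" <= id_ <= "WSJ/22/WSJ_2299.MRG":
            splits["valid"].append(id_)
        elif "WSJ/23/WSJ_2300.MRG" <= id_ <= "WSJ/23/WSJ_2399.MRG":
            splits["test"].append(id_)
        else:
            splits["rest"].append(id_)
    return splits


if __name__ == "__main__":
    ids = make_file_ids()
    quoted = split_as_in_issue(ids)
    leaked = set(quoted["test"]) & set(quoted["train"])
    print(len(leaked), "of", len(quoted["test"]), "test files are also in train")
    # prints "100 of 100 test files are also in train"
    fixed = split_disjoint(ids)
    print(len(set(fixed["test"]) & set(fixed["train"])), "overlap after the disjoint split")
```

With the quoted conditions, every synthetic test file also appears in the training list, while the `elif` version leaves no overlap between train and test.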


Did you use the test data during training in the Unsupervised Parsing experiment? #18