wjko2 / Domain-Agnostic-Sentence-Specificity-Prediction


How to call training and testing on new data? #4

Closed lucas0 closed 5 years ago

lucas0 commented 5 years ago

Your instructions say to change s1['test']['path'] and s1['unlab']['path'], but there are two other variables (target['test']['path'] and targetv['test']['path']) that hold the target values depending on the data being used to train/test the model.

What should we assign to those? From your instructions it seems we should use the target values for twitter/yelp/movies with our own dataset.

How did you arrive at these values?

My data2.py and train.py/test.py modifications look like this:

    # data2.py
    if dat=='my_data':
        s1['test']['path'] = os.path.join(data_path, 'my_data_test.txt')
        s1['unlab']['path'] = 'dataset/pdtb/my_data_unlabeled.txt'
        target['test']['path'] = os.path.join(data_path, 'twitterl.txt')
        targetv['test']['path'] = 'dataset/pdtb/tlv.txt'

    # train.py / test.py
    if params.test_data=="my_data":
        _, xst = getFeatures(os.path.join(params.nlipath, 'my_data_test.txt'))
        _, xsu = getFeatures('dataset/pdtb/my_data_unlabeled.txt')

And then the calls are:

python train.py --gpu_id 0 --test_data my_data

python test.py --gpu_id 0 --test_data my_data

This is working, but I'm not comfortable using the twitter target['test']['path'] and targetv['test']['path']. Is there any way I could generate those values for my data?

Thanks a lot for the work and making the code available. This will help me a lot in my work.

Best, Lucas.

wjko2 commented 5 years ago

This system is designed to adapt to new domains using only the sentences from the new domain, and does not require specificity labels from the new domain. The target['test']['path'] file is used only for evaluation and does not affect the predictions, so you don't need to generate those labels. The released ones were annotated on MTurk.
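Since the target file is only read at evaluation time, one workaround (not part of the repo, just a sketch) is to generate a placeholder label file with one dummy score per test sentence so the evaluation code runs without crashing; the reported numbers are then meaningless and should be ignored. This assumes the label file format is one numeric value per line, which is an unverified guess:

```python
# Hypothetical helper: write a dummy label for every non-empty line of the
# test file. Assumes (unverified) that the target file is one score per line.
def write_placeholder_labels(test_path, label_path, dummy=0.5):
    with open(test_path, encoding="utf-8") as f:
        n = sum(1 for line in f if line.strip())
    with open(label_path, "w", encoding="utf-8") as f:
        for _ in range(n):
            f.write(f"{dummy}\n")
    return n

# Tiny made-up demo input, standing in for my_data_test.txt:
with open("my_data_test.txt", "w", encoding="utf-8") as f:
    f.write("First test sentence.\nSecond test sentence.\n")

n = write_placeholder_labels("my_data_test.txt", "my_data_labels.txt")
print(n)  # 2
```

You would then point targetv['test']['path'] at the generated file; again, only the predictions are trustworthy, not the evaluation metrics.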

lucas0 commented 5 years ago

Thank you very much. Are those calls the right way to do it for a new domain?

python train.py --gpu_id 0 --test_data my_data

python test.py --gpu_id 0 --test_data my_data

Also, how would the model work for longer sentences? Is it ok to put a paragraph with multiple sentences as a single input? Would the varying length of the paragraphs be an issue?

wjko2 commented 5 years ago

Yes, that's the right way. It is OK to use multiple sentences as a single input. I have only run this on the released datasets and some dialog datasets, so I am not sure how it will work if the inputs are very long.
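If very long paragraphs do turn out to degrade predictions, one option (my suggestion, not something from this repo) is to split each paragraph into sentences, score them individually with the trained model, and aggregate, e.g. by averaging. Here `score_sentence` is a hypothetical stand-in for whatever per-sentence prediction call you use, and the regex splitter is deliberately naive:

```python
import re

def split_sentences(paragraph):
    # Naive splitter on sentence-ending punctuation; swap in nltk or spaCy
    # for anything beyond a quick experiment.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]

def paragraph_specificity(paragraph, score_sentence):
    # Average the per-sentence scores; other aggregations (max, median)
    # may suit some use cases better.
    sents = split_sentences(paragraph)
    scores = [score_sentence(s) for s in sents]
    return sum(scores) / len(scores)

# Demo with a dummy scorer (word count as a stand-in for the real model):
demo = paragraph_specificity(
    "Short one. This second sentence is considerably longer and more detailed.",
    lambda s: len(s.split()),
)
print(demo)  # 5.5
```

This also sidesteps the varying-length issue, since the model only ever sees single sentences.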