Closed · ramya-raghu25 closed this issue 3 years ago
The code resides in the processing folder. The readme there explains it.
You can run create_reap_data.sh to generate the ground truth reap data. More details are in the readme.
I wanted to know how you generate sample_test_sow_reap.txt and sample_test_gt_reap.txt, which you give as input to create_sow_data.sh and create_reap_data.sh. You have mentioned that you use the Stanford NLP parser to generate this data, but it's unclear how: java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,parse -preserveLines -ssplit.eolonly true -outputFormat text -file sample_test_baseline.txt
I would also like to know where the Stanford CoreNLP parser folder and sample_test_baseline.tok are located.
sample_test_baseline.txt is the custom dataset that is used. It contains paraphrase pairs in the following format: sentence1 paraphrase1 [blank line] sentence2 paraphrase2 [blank line] ....
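For a custom dataset, the file might look like the sketch below. The sentences are made up for illustration, and the one-sentence-per-line layout is assumed from the -ssplit.eolonly flag in the CoreNLP command above (each line is treated as one sentence, with a blank line separating pairs).

```bash
# Hypothetical contents for sample_test_baseline.txt: sentence and paraphrase
# each on their own line, pairs separated by a blank line.
cat > sample_test_baseline.txt << 'EOF'
the mayor announced the new policy yesterday .
yesterday , the new policy was announced by the mayor .

he finished the report before the deadline .
before the deadline , the report was finished by him .
EOF
```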
Please download the Stanford CoreNLP module from https://nlp.stanford.edu/software/. The above java command is run with this sample_test_baseline.txt as input. Please follow their documentation to set up the parser; the command needs to be run from the parser root directory.
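Roughly, the parsing step looks like the sketch below. The archive name is a placeholder and depends on the CoreNLP release you download; the java command itself is the one quoted above.

```bash
# Sketch of the parsing step (archive name is a placeholder for whatever
# release you download from https://nlp.stanford.edu/software/).
unzip stanford-corenlp-x.x.x.zip
cd stanford-corenlp-x.x.x        # run from the parser root directory

java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLP \
  -annotators tokenize,ssplit,pos,parse \
  -preserveLines -ssplit.eolonly true \
  -outputFormat text \
  -file /path/to/sample_test_baseline.txt
# By default, CoreNLP writes the annotated parse to
# sample_test_baseline.txt.out in the working directory.
```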
This will generate the sample_test_sow_reap.txt file that is required as input for create_sow_data.sh and create_reap_data.sh.
The sample_test_gt_reap.txt file is one of the intermediate outputs of create_reap_data.sh. It will be stored in the intermediate folder that you specify.
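Putting the thread together, the processing step might look like the sketch below. The paths and the way arguments are passed are placeholders (check the readme in the processing folder for the actual interface); the file names are the ones mentioned in this thread.

```bash
# Rough sketch only: per this thread, the scripts consume the CoreNLP-parsed
# data (sample_test_sow_reap.txt) and write their outputs, including
# intermediates such as sample_test_gt_reap.txt, into the intermediate folder
# you specify.  Exact arguments/variables are documented in the readme.
cd processing
bash create_sow_data.sh    # prepares the sow data
bash create_reap_data.sh   # generates the ground truth reap data
```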
Hope this helps!
Hi @tagoyal,
Really nice work! Trying to use your algorithm for a custom dataset. Would it be possible for you to release get_ground_truth_alignments.py for REAP?
-Ramya