Closed sallyqus closed 4 years ago
Hmm, that's unexpected. Can you send me a sample for your dataset (just a couple of examples) so that I can test it on my end?
Hi, thanks for the quick response. Yeah, it is a bit weird.
For sure, here are some example sentences without parsing and tokenization:
a person is in a kitchen holding some dishes.
a person walks to a doorway.
a person is fixing a light.
person they open another door.
person turns on the light.
person turns a light on above the door.
a person replaces a lightbulb above a doorway.
a person is throwing clothes into a laundry basket.
person throws clothes in bucket.
person walks through the doorway.
FYI, I used exactly the same command and the same versions of the tokenizer and parser.
Can this model be used without ground truth? Can I set every second sentence (the ground truth) in each pair to be empty? Or will it work if I set the ground truth to the same sentence as the input? Any suggestions?
The generate_paraphrases_sow_reap.py script expects an input file with an even number of sentences (input sentence followed by ground truth). My guess is that your input file has an odd number of sentences, and the error occurs when it tries to parse the last, empty line.
The model can be used without the ground truth. But instead of setting every second sentence in the pair to be empty, set the ground truth to be the same as the input sentence.
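For reference, here is a minimal sketch of preparing such an input file. The filenames and helper name are hypothetical, not part of the repo; the idea is simply to duplicate each sentence so the ground-truth slot is filled and the line count stays even:

```python
# Build a SOW-REAP-style input file where each sentence serves as its own
# ground truth, yielding the even line count the generation script expects.
# write_paired_input and the filenames are illustrative placeholders.

def write_paired_input(sentences, out_path):
    with open(out_path, "w") as f:
        for sent in sentences:
            sent = sent.strip()
            if not sent:
                continue  # skip blank lines; a trailing one triggers the parse error
            f.write(sent + "\n")  # input sentence
            f.write(sent + "\n")  # ground truth = same sentence

sentences = ["a person walks to a doorway.", "person turns on the light."]
write_paired_input(sentences, "paired_input.txt")
```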
Hi Tanya. Thank you for getting back here.
What I was doing before I raised this issue and talked with you two days ago was exactly that: using the same input sentence as the ground truth. But the output quality is not very good.
Does this operation (using the same input sentence as the ground truth) have any impact on the output?
Using the same input as the ground truth does not impact the output; the ground truth isn't used during the generation process at all.
In the training data used to train the SOW-REAP model, we filtered the data to retain only sentences longer than 8 words. Most of the examples I see here are shorter, so there is a natural mismatch between the training data and your test data.
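This length mismatch is easy to check on a custom dataset. Below is a small sketch of such a filter, assuming simple whitespace tokenization (the actual preprocessing pipeline may tokenize differently):

```python
# Split sentences into those matching the SOW-REAP training distribution
# (longer than 8 words) and shorter ones. Whitespace splitting is an
# approximation of the real tokenizer.

def split_by_length(sentences, min_words=8):
    long_enough, too_short = [], []
    for sent in sentences:
        if len(sent.split()) > min_words:
            long_enough.append(sent)
        else:
            too_short.append(sent)
    return long_enough, too_short

examples = [
    "a person is in a kitchen holding some dishes.",  # 9 words: kept
    "person turns on the light.",                     # 5 words: too short
]
kept, short = split_by_length(examples)
```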
For the sentence lengths we were considering, we decided to incorporate up to 3 syntactic reorderings of subtrees within each sentence-level reordering. For shorter sentences, I would suggest reducing this number to 1 or 2, as those work better.
Thank you for your advice. Will take it. 👍
Hi, thanks for the great work. I am trying to use it on my custom data for a video grounding project.
I am sure that I have done the tokenization and parsing correctly. Any hints about this bug?
Could you help with these two problems? Any help is greatly appreciated.
Best, Sally