Open raspberryice opened 6 years ago
@raspberryice There are some problems running the original models with run.sh. The salientFast part reads an input file that does not exist (top-token-phrase.txt), which leads to an empty result in the end. What is this file supposed to be? Can we use some existing file in its place? I am not sure what this file should contain.
top_token_phrase.txt should be the output of running the segmentation algorithm written in C.
java -jar Tokenizer.jar -m test -i ../data/stopwords.txt -o tmp/tokenized_stopwords.txt -t tmp/token_mapping.txt -c N -thread 15
./bin/segphrase_train --verbose --thread 15 --max_len 20
Can you try running this line only?
It can run, but './bin/segphrase_train' does not exist and 'quality_phrases.txt' doesn't seem to be generated, so top_token_phrase.txt cannot be copied ('cp') from 'quality_phrases.txt'.
Have you tried running make all under the cseg folder?
It turns out that it was a g++ version problem. It runs successfully now.
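For anyone who hits the same empty-output symptom, a small pre-flight check can confirm that the segmentation step actually produced its files before the salientFast part runs. This is only a sketch: the helper name `missing_outputs` and the example paths are assumptions rather than part of the repository's scripts, so adjust them to wherever run.sh writes its files.

```python
from pathlib import Path


def missing_outputs(paths):
    """Return the paths that are absent or empty (hypothetical helper)."""
    problems = []
    for p in map(Path, paths):
        # A zero-byte file is treated like a missing one, because an empty
        # input file also leads to an empty result downstream.
        if not p.is_file() or p.stat().st_size == 0:
            problems.append(str(p))
    return problems


if __name__ == "__main__":
    # Example paths only; the real locations depend on run.sh.
    expected = [
        "bin/segphrase_train",
        "quality_phrases.txt",
        "top_token_phrase.txt",
    ]
    for path in missing_outputs(expected):
        print(f"missing or empty: {path}")
```

If the binary is reported missing, rebuilding under the cseg folder as suggested above is the first thing to try.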
We have tried two wildcard replacement methods: 1) remove the adj before a tagged noun; 2) replace the adj before a tagged noun with 'WILDCARD', and do the same for goodpattern. We then ran the model and saw more events being extracted for certain entities, but fewer for some others due to the removal of the adj. We may run more tests on the current strategy or try other strategies, and we will write up our attempts in the final report. As for the conclusion right now, more events are extracted, which meets the project expectation to some degree.
If you added the wildcard replacement, the patterns with adj and without adj should be merged, so there should only be more extractions. Can you give me an example where fewer patterns are extracted? P.S. Have you started working on the final report? 7 pages is a lot to write using the ACM template.
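The merging expectation in the comment above can be illustrated with a small sketch: if every pattern is mapped to a form with the adjective dropped, and the extractions of all patterns sharing that form are unioned, each merged pattern ends up with at least as many extractions as any of its variants. The adjective list, the example patterns, and the extraction tuples are made up for illustration; this is not the MetaPAD code.

```python
from collections import defaultdict

# Toy adjective list; a real run would rely on POS tags instead (assumption).
ADJECTIVES = {"local", "new", "former"}


def drop_adjectives(pattern):
    """Remove adjectives from a space-separated meta pattern."""
    return " ".join(t for t in pattern.split() if t not in ADJECTIVES)


def merge_extractions(extractions_by_pattern):
    """Union the extraction sets of patterns sharing an adjective-free form."""
    merged = defaultdict(set)
    for pattern, extractions in extractions_by_pattern.items():
        merged[drop_adjectives(pattern)] |= set(extractions)
    return dict(merged)


# Invented data: two variants of one pattern with overlapping extractions.
raw = {
    "$Date local $Case": {("Nov 2018", "case A")},
    "$Date $Case": {("Nov 2018", "case A"), ("Dec 2018", "case B")},
}
merged = merge_extractions(raw)
print(merged)
```

Because a union never shrinks a set, a drop in extractions after merging would point to a problem elsewhere, for example in how the adjectives are detected.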
I revised the goodpatterns.txt file. Perhaps you will need to update new_goodpatterns.txt and new_goodpatterns_wildcard.txt accordingly.
That’s probably due to the way we added the “WILDCARD”. We expected the results to be merged, so for each desired tagged noun we checked the immediately preceding word. If it was an adj, we replaced it with WILDCARD; if it was not an adj, we manually inserted a WILDCARD so that the pattern could be matched and merged with the other cases.
The number of MPs under the same entity is certainly reduced. Consider two meta patterns, $Date local $Case and $Case $Date: both become WILDCARD $Case WILDCARD $Date in the end due to our implementation. As for the number of events, I now think there is no way that fewer events should be extracted under each entity. I presume there is a bug where we missed some adj, so they were not tagged.
The total number of MPs for different entities has increased significantly, since many MPs now appear due to the WILDCARD processing. We will first compare the positive results and then fix the bug.
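The replace-or-insert rule described in this comment can be sketched as follows. This is a minimal illustration under stated assumptions, not the team's actual implementation: tagged nouns are taken to be tokens starting with `$`, adjectives come from a toy list instead of a POS tagger, and `add_wildcards` is a hypothetical name. The second example pattern, `$Date $Case`, is an illustrative adjective-free variant chosen for this sketch.

```python
# Toy adjective list; a real run would rely on POS tags instead (assumption).
ADJECTIVES = {"local", "new", "former"}
WILDCARD = "WILDCARD"


def add_wildcards(pattern):
    """Put exactly one WILDCARD in front of every tagged noun.

    If the word right before the tagged noun is an adjective, it is
    replaced by WILDCARD; otherwise a WILDCARD is inserted.
    """
    out = []
    for token in pattern.split():
        if token.startswith("$"):
            if out and out[-1] in ADJECTIVES:
                out[-1] = WILDCARD  # replace the preceding adjective
            else:
                out.append(WILDCARD)  # no adjective there, so insert one
        out.append(token)
    return " ".join(out)


# Both lines print: WILDCARD $Date WILDCARD $Case
print(add_wildcards("$Date local $Case"))
print(add_wildcards("$Date $Case"))
```

Since both variants normalize to the same string, they count as a single meta pattern afterwards, which is consistent with fewer distinct MPs under one entity even when no extraction is lost.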
Yes, we are writing the report, and thank you so much for your concern!
On Nov 21, 2018, at 15:49, Zoey Li notifications@github.com wrote:
If you added the wildcard replacement, the patterns with adj and without adj should be merged, so there should only be more extractions. Can you give me an example where fewer patterns are extracted? P.S. Have you started working on the final report? 7 pages is a lot to write using the ACM template.
View it on GitHub: https://github.com/raspberryice/MetaPAD/issues/2#issuecomment-440821415
For the report, you should work towards: 1 page intro, 0.5 page related work, 0.5 page problem formulation, 2 pages methodology, 3 pages experiments (including figures, case studies, analysis).
Will you be part of the TA’s evaluation team?
On Nov 21, 2018, at 23:17, Zoey Li notifications@github.com wrote:
For the report, you should work towards: 1 page intro, 0.5 page related work, 0.5 page problem formulation, 2 pages methodology, 3 pages experiments (including figures, case studies, analysis).
No, I'm not part of the evaluation team but I've talked to the instructor about the project.
Before the segmentation step, allow some words in the metapattern to be replaced by the wildcard symbol.