Closed Chacha-Chen closed 4 years ago
Hello,
The code you ran uses our own data files, which we couldn't share directly due to Twitter's policies.
Could you use the version under the shared_task
folder? I provide a cleaned version for the LR baseline there, but you will have to prepare the data files yourself.
Thanks, Shi
Hi Shi,
Thanks for the quick reply.
Could you provide more instructions on how to prepare my own data files?
It seems that the xxx.pkl files are generated by data_preprocessing.py from xxx.jsonl files.
I am confused about where I need to download my own files from.
Thanks.
Hello,
I have a README.md file here: https://github.com/viczong/extract_COVID19_events_from_Twitter/tree/master/shared_task.
Thanks,
Hi Shi,
Is data_processing.py provided as a script to generate the data files for LR and BERT by converting the provided jsonl files?
I checked your README.md. Sorry, but I am confused about the required input jsonl file for data_processing.py, given your provided annotated data and my downloaded tweets.
For example, the provided xxx.jsonl files do not have the keys consensus_annotation
and candidate_chunks_with_id.
Thanks.
Hello Chacha,
Currently, to run the baseline model for the shared task, you only need the files under the shared_task
folder (i.e., you don't need data_processing.py from the model
folder). If you just format your data as in https://github.com/viczong/extract_COVID19_events_from_Twitter/tree/master/shared_task, you will not need the consensus_annotation
and candidate_chunks_with_id
fields. Each instance looks like this:
```python
[('Tom Hanks and his wife have both tested positive for the Coronavirus.',
  'Tom Hanks',
  '<Q_TARGET> and his wife have both tested positive for the Coronavirus .',
  ['Tom Hanks', 'his wife'],
  1)]
```
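As an illustrative sketch (not the repo's actual code), one way to build an instance in that (text, target chunk, masked text, candidate chunks, label) format is to replace the target chunk's tokens with the `<Q_TARGET>` placeholder; the helper name and the space-joined tokenization are assumptions:

```python
# Hypothetical helper (not from the repo): build one baseline instance in the
# (text, target_chunk, masked_text, candidate_chunks, label) format.
def build_instance(text, tokens, chunk, candidate_chunks, label):
    chunk_tokens = chunk.split()
    masked = []
    i = 0
    while i < len(tokens):
        # Replace the target chunk's token span with the placeholder.
        if tokens[i:i + len(chunk_tokens)] == chunk_tokens:
            masked.append("<Q_TARGET>")
            i += len(chunk_tokens)
        else:
            masked.append(tokens[i])
            i += 1
    return (text, chunk, " ".join(masked), candidate_chunks, label)

text = "Tom Hanks and his wife have both tested positive for the Coronavirus."
tokens = ["Tom", "Hanks", "and", "his", "wife", "have", "both",
          "tested", "positive", "for", "the", "Coronavirus", "."]
print(build_instance(text, tokens, "Tom Hanks", ["Tom Hanks", "his wife"], 1))
```

This reproduces the example instance above for the "Tom Hanks" target chunk.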
We don't provide a script for data pre-processing for the shared task.
(Yes, data_processing.py is what we use to generate the data files for LR and BERT, but it is written to handle our own data files. We haven't had time to fully clean up the code to deal with different input formats.)
I hope it helps.
Thanks, Shi
Got it. Thanks.
Hi Shi,
For the submission, do we need to provide part1.response as well, or are only the part2.xxx.response files needed?
Thanks.
Hi Chacha,
I think our current plan is to evaluate on slot filling questions (part2.xxx.response).
Thanks, Shi
Hi Shi, thanks for the quick reply. One quick follow-up question.
The final model takes the full text as input and outputs the corresponding part2.responses. Am I understanding correctly? Will the part1.responses be provided alongside the full text?
Thanks.
Hi Chacha,
Yes, you are correct. I think we will provide the full text along with the candidate choices, and the model then makes predictions for those slot filling questions by selecting from the provided candidate choices. I don't think we will provide the part1.responses.
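To make the selection setup concrete, here is a toy sketch (not the repo's model): slot filling reduces to scoring each provided candidate chunk and keeping those above a threshold. The function name, scorer, and threshold are all assumptions for illustration.

```python
# Hypothetical illustration: slot filling as candidate selection.
# score_fn rates each candidate chunk; chunks above the threshold are predicted.
def predict_slot(candidate_chunks, score_fn, threshold=0.5):
    return [c for c in candidate_chunks if score_fn(c) >= threshold]

# Toy scorer for demonstration only (a real model would produce these scores).
scores = {"Tom Hanks": 0.9, "his wife": 0.8, "the Coronavirus": 0.1}
print(predict_slot(list(scores), scores.get))
```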
Thanks, Shi
Hi,
I was trying to reproduce the baseline results and hit these errors:

```
FileNotFoundError: [Errno 2] No such file or directory: '/data/zong/scraper_covid-MERGE/annotation/positive-FINAL.jsonl'
FileNotFoundError: [Errno 2] No such file or directory: 'data/test_positive.pkl'
```

The paths come from this dictionary in the code:

```python
task_type_to_datapath_dict = {
    "tested_positive": ("/data/zong/scraper_covid-MERGE/annotation/positive-FINAL.jsonl", "data/test_positive.pkl"),
    "tested_negative": ("/data/zong/scraper_covid-MERGE/annotation/negative-FINAL.jsonl", "data/test_negative.pkl"),
    "can_not_test": ("/data/zong/scraper_covid-MERGE/annotation/can_not_test-FINAL.jsonl", "data/can_not_test.pkl"),
    "death": ("/data/zong/scraper_covid-MERGE/annotation/death-FINAL.jsonl", "data/death.pkl"),
    "cure": ("/data/zong/scraper_covid-MERGE/annotation/cure-FINAL.jsonl", "data/cure.pkl"),
}
```
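(Note from outside the thread: the `/data/zong/...` paths are hard-coded to the authors' machine. If you do want to run this code on your own data, a plausible fix, with placeholder paths that are assumptions, not real files, is to repoint each entry at a local annotated jsonl and an output pickle path.)

```python
# Hypothetical local override (placeholder paths, not real files):
# each entry maps a task to (input_annotated_jsonl, output_pickle).
task_type_to_datapath_dict = {
    "tested_positive": ("my_data/positive.jsonl", "data/test_positive.pkl"),
    "death": ("my_data/death.jsonl", "data/death.pkl"),
}
```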
Sorry, I might not understand what is going on or what those files are. Could you please give more instructions on how to get those files, or descriptions of them?
Thanks.