jefferyYu / UMT

Preprocessed Datasets for our Multimodal NER paper

Issues related to data processing #7

Open JinFish opened 4 years ago

JinFish commented 4 years ago

Hey, I downloaded the dataset through the link you provided, but its format is very different from the files under the data directory in your GitHub repository. I looked at the code in the run_mtmner_crf.py file, but I could not find the data-processing method. Could you tell me how to convert the downloaded data into the format used under your data directory? Thank you very much.

jefferyYu commented 4 years ago

Hi there,

Yes, the dataset provided through the link was constructed for another task, sentiment analysis (https://github.com/jefferyYu/TomBERT), which is quite different from the MNER task here. Note that the provided link is only used for downloading the images of the multimodal tweets in our two MNER datasets.

The data processing for these two MNER datasets is done in the function "mmreadfile(filename):" (lines 145-193) of the run_mtmner_crf.py file.

Hope this addresses your concern. Please let me know if you have any other questions.

Best, Jianfei
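For readers following along, here is a minimal sketch of the kind of parsing such a reader function might perform. It is not the repository's actual `mmreadfile` code. It assumes a CoNLL-style layout in which each sample starts with an `IMGID:<id>` line, followed by one whitespace-separated `token label` pair per line, with a blank line between samples. That layout, the function name `read_mner_lines`, and the demo data are all assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Tuple

# One sample: (image id, tokens, labels). The image id links the text
# to its image file; this pairing is an assumption for illustration.
Sample = Tuple[Optional[str], List[str], List[str]]


def read_mner_lines(lines: Iterable[str]) -> List[Sample]:
    """Parse an assumed CoNLL-style MNER layout into samples (hypothetical helper)."""
    samples: List[Sample] = []
    img_id: Optional[str] = None
    tokens: List[str] = []
    labels: List[str] = []

    def flush() -> None:
        # Store the sample collected so far (if any) and reset the buffers.
        nonlocal img_id, tokens, labels
        if tokens:
            samples.append((img_id, tokens, labels))
        img_id, tokens, labels = None, [], []

    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line ends the current sample.
            flush()
        elif line.startswith("IMGID:"):
            # A new image id also starts a new sample.
            flush()
            img_id = line[len("IMGID:"):].strip()
        else:
            # First field is the token, last field is the label.
            parts = line.split()
            tokens.append(parts[0])
            labels.append(parts[-1])

    flush()  # keep the final sample when the input lacks a trailing blank line
    return samples


# Made-up demo input in the assumed layout.
demo = [
    "IMGID:123\n", "RT\tO\n", "@user\tO\n", "Kevin\tB-PER\n", "\n",
    "IMGID:456\n", "Paris\tB-LOC\n",
]
parsed = read_mner_lines(demo)
```

Because the function accepts any iterable of lines, an open file handle can be passed in directly in place of the `demo` list.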

JinFish commented 4 years ago

So the dataset under the data directory in your GitHub repository is the actual text dataset, and it is complete, right?

jefferyYu commented 4 years ago

Yep.

JinFish commented 4 years ago

Thank you for your patience. If I have any other questions, I will contact you again.