wise-east opened this issue 5 years ago
That's because much of the procedure is just matching each image with its corresponding post. As for the common preprocessing of the text, I remove stopwords and use jieba as the word segmenter. The word embeddings are pretrained on the given dataset. Hope this helps.
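For anyone trying to reproduce this, here is a minimal sketch of the steps described above (stopword removal, jieba segmentation, and pretraining embeddings on the corpus itself). The stopword file name and the helper functions are hypothetical, not from this repository:

```python
# Minimal preprocessing sketch, assuming a plain-text stopword list
# (one word per line); the file name and helpers are hypothetical.
import jieba
from gensim.models import Word2Vec

def load_stopwords(path="stopwords.txt"):
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def tokenize(post, stopwords):
    # jieba.cut performs Chinese word segmentation; drop stopwords and blanks.
    return [w for w in jieba.cut(post) if w.strip() and w not in stopwords]

stopwords = load_stopwords()
corpus = [tokenize(p, stopwords) for p in ["这是一条微博示例。", "另一条带配图的微博。"]]

# Pretrain word embeddings on the dataset itself (gensim 4.x API).
w2v = Word2Vec(sentences=corpus, vector_size=32, window=5, min_count=1)
```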
Could you release the dataset, or at least a few example instances, the dataset format, pre-trained models, or checkpoints?
I have uploaded a small dataset to show the dataset format.
For convenience, process_data_weibo.py has also been added. The model can be trained and tested on the uploaded example dataset with two options: EANN (multimodal: text and image) and EANN_text (textual features only).
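To make the two options concrete, here is a minimal sketch (not the repository's actual code; the class name and feature dimensions are made up) of how a multimodal variant differs from a text-only one:

```python
# Sketch of the two variants: EANN-style consumes text + image features,
# EANN_text-style consumes text features only. Dimensions are hypothetical.
import torch
import torch.nn as nn

class FakeNewsClassifier(nn.Module):
    def __init__(self, text_dim=32, img_dim=512, use_image=True):
        super().__init__()
        self.use_image = use_image
        in_dim = text_dim + (img_dim if use_image else 0)
        self.head = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, text_feat, img_feat=None):
        # Concatenate modalities for the multimodal option; text alone otherwise.
        x = torch.cat([text_feat, img_feat], dim=1) if self.use_image else text_feat
        return self.head(x)

model = FakeNewsClassifier(use_image=False)   # EANN_text-style variant
logits = model(torch.randn(4, 32))
```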
I was trying to replicate some of the results in the paper, but I realised that metrics such as accuracy and precision fluctuate widely because the dataset is so small. After removing the posts without images, the training set has 43 examples, the validation set 11, and the test set 20. Could you kindly upload a larger dataset so that we can get a better sense of the metrics? Thank you.
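To see why the numbers swing so much: with only 20 test posts, a single prediction changes accuracy by 1/20 = 5 points. The labels below are made up purely for illustration:

```python
# On a 20-example test set, one flipped prediction moves accuracy by 5 points.
from sklearn.metrics import accuracy_score, precision_score

y_true = [1, 0] * 10            # 20 hypothetical test labels
y_pred = [1, 0] * 9 + [1, 1]    # identical except one false positive
print(accuracy_score(y_true, y_pred))    # 0.95 -- a single error costs 5 points
print(precision_score(y_true, y_pred))   # ~0.91
```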
Thanks for your interest in our work. I will upload a larger dataset. Hope this helps.
Thank you very much
I have read your KDD 2018 paper, and it's great. I'm now running your experiments; would you mind also sharing the larger dataset with me?
Can you give a link to the Twitter dataset?
I'd like to train this model and reproduce some of its results. I noticed that EANN_model.py imports a local module, process_data_weibo, which is not included in this repository. Would it be possible to upload process_data_weibo.py as well? Understanding the preprocessing would let me quickly format other data to fit this model.
```python
import process_data_weibo as process_data
```
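Until the actual file is uploaded, a stand-in loader along these lines can unblock experiments with other data. Everything here is hypothetical: the function name, return fields, and file layout are guesses, not the repository's real interface:

```python
# Hypothetical stand-in for process_data_weibo; the real module's interface
# may differ. Assumes a TSV with post text, image path, label, and event id.
import csv

def get_data(path="weibo_example.tsv"):
    posts = []
    with open(path, encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            posts.append({
                "post_text": row["text"],        # raw post text
                "image": row["image_path"],      # path to the attached image
                "label": int(row["label"]),      # e.g. 0 = real, 1 = fake
                "event": int(row["event_id"]),   # event id for the event discriminator
            })
    return posts
```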