yumoxu / stocknet-dataset

A comprehensive dataset for stock movement prediction from tweets and historical stock prices.
MIT License
563 stars, 169 forks

Dataset doesn't match the description in paper #7

Closed: WJMacro closed this issue 3 years ago

WJMacro commented 3 years ago

Hi, I cloned the stock-net repo and tried to reproduce the results reported in your paper. I found only 656 examples in the dev set and 1,008 examples in the test set, but according to your paper the numbers are 2,555 and 3,720. Is something missing from the dataset you uploaded, or is the DataPipe code wrong?

yumoxu commented 3 years ago

Thanks for your interest in our work!

Our dataset contains 2,555 and 3,720 movements in the dev and test sets, respectively. In our paper's experiment setting (see the last paragraph of Section 3), we further filter samples by requiring at least one tweet for each corpus in the lag to alleviate sparsity, which results in fewer eligible training/dev/test samples.

However, this experiment setting can be model-specific (when all the predictions in the lag need to be explicitly modeled), and you are free to use all the samples in the dataset if that is more appropriate in your own setting. To do so, you can remove this constraint by commenting out these two lines.
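As a rough illustration of the constraint described above, here is a minimal sketch in Python. It is not the DataPipe code from the repository: the `Sample` class, the `tweet_counts` field, and both function names are hypothetical, and the sketch assumes each sample carries one tweet count per trading day in its lag window.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """Hypothetical sample layout, not the repository's actual format."""
    stock: str
    tweet_counts: List[int]  # assumed: one tweet count per day in the lag window


def has_tweet_on_every_lag_day(tweet_counts: List[int]) -> bool:
    """Return True only if the lag is non-empty and every day has >= 1 tweet."""
    return len(tweet_counts) > 0 and all(count >= 1 for count in tweet_counts)


def select_samples(samples: List[Sample], require_tweets: bool = True) -> List[Sample]:
    """Apply the sparsity filter, or keep every sample when it is disabled."""
    if not require_tweets:
        return list(samples)
    return [s for s in samples if has_tweet_on_every_lag_day(s.tweet_counts)]


# Toy data with made-up tickers, for illustration only.
samples = [
    Sample("AAA", [2, 1, 3]),
    Sample("BBB", [0, 4, 1]),  # one day without tweets, so it is filtered out
    Sample("CCC", [1, 1, 1]),
]

filtered = select_samples(samples)                          # paper-style setting
unfiltered = select_samples(samples, require_tweets=False)  # constraint removed
```

Passing `require_tweets=False` plays the same role as commenting out the constraint: every movement in the dataset is kept, at the cost of some lag days having no tweets to model.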

WJMacro commented 3 years ago

Thank you for your kind explanation~