yflyl613 / Tiny-NewsRec

[EMNLP 2022] Official Pytorch implementation for "Tiny-NewsRec: Efficient and Effective PLM-based News Recommendation"

No file named `docs_filter.tsv` #2

Closed jordane95 closed 1 year ago

jordane95 commented 1 year ago

Hi,

I want to run the pre-training part of your Jupyter notebook, but I find that the pre-training data is not provided.

May I know how you constructed the `docs_filter.tsv` file? Thanks.

yflyl613 commented 1 year ago

I'm sorry, but I cannot provide the `docs_filter.tsv` used in our experiments. It was collected from the Microsoft News corpus during my internship at Microsoft. However, I think you can use another public news corpus instead. For example, you could try using the news articles provided with the MIND dataset for pre-training.
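
For anyone trying the MIND route, here is a minimal sketch of turning MIND's `news.tsv` into a TSV of pre-training documents. The layout of the original `docs_filter.tsv` is not documented in this thread, so the output columns (ID, category, title, abstract), the function name `build_pretraining_docs`, and the `min_title_words` filter are all assumptions for illustration; the notebook's loading code may need to be adapted to match. The only thing taken as given is the column order of MIND's `news.tsv` (news ID, category, subcategory, title, abstract, URL, title entities, abstract entities).

```python
def build_pretraining_docs(news_tsv, out_tsv, min_title_words=3):
    """Write a TSV of (news_id, category, title, abstract) rows from MIND's news.tsv.

    The output layout is a hypothetical stand-in for docs_filter.tsv,
    whose real format is not documented in this thread.
    Returns the number of rows written.
    """
    seen_ids = set()
    kept = 0
    with open(news_tsv, encoding="utf-8") as src, \
            open(out_tsv, "w", encoding="utf-8") as dst:
        for line in src:
            # MIND fields are tab-separated and unquoted, so a plain split is enough.
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip malformed rows
            news_id, category = fields[0], fields[1]
            title, abstract = fields[3].strip(), fields[4].strip()
            # Drop duplicate IDs and very short titles (assumed filter, not the authors').
            if news_id in seen_ids or len(title.split()) < min_title_words:
                continue
            seen_ids.add(news_id)
            dst.write("\t".join([news_id, category, title, abstract]) + "\n")
            kept += 1
    return kept
```

A call such as `build_pretraining_docs("MINDlarge_train/news.tsv", "docs_filter.tsv")` would write the file next to the notebook; both paths here are placeholders. Note that `news.tsv` contains only titles and abstracts, so if the pre-training step needs full article bodies, those would have to be obtained separately.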