yusanshi / news-recommendation

Implementations of some methods in news recommendation.
MIT License
241 stars 50 forks

news abstract #4

Closed by TJ2364 3 years ago

TJ2364 commented 3 years ago

In news.tsv, I noticed that the abstract is not a paragraph of text but a URL. Is it therefore wrong to directly use the tokenizer method in the code to process the abstract? What should I do to get a complete abstract? What's more, if you have a complete news.tsv, is it convenient to share it?

yusanshi commented 3 years ago

Sorry, but I checked the code and I don't think it is wrong.

https://github.com/yusanshi/NewsRecommendation/blob/master/src/data_preprocess.py#L99-L106

From https://github.com/msnews/msnews.github.io/blob/master/assets/doc/introduction.md:

news.tsv

The docs.tsv file contains the detailed information of news articles involved in the behaviors.tsv file. It has 7 columns, which are divided by the tab symbol:

> What's more, if you have a complete news.tsv, is it convenient to share it?

I use the news.tsv downloaded from https://msnews.github.io/#getting-start and it seems to be sufficient.
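
To check which field of a news.tsv row is the abstract and which is the URL, a minimal sketch like the following may help. It assumes the tab-separated field order News ID, Category, SubCategory, Title, Abstract, URL, Title Entities, Abstract Entities (eight fields, although the text quoted above says 7 columns). The snake_case names, the `read_news` helper and the sample row are made up for illustration and are not taken from the repository.

```python
import csv
import io

# Assumed field order of news.tsv; the snake_case names are this sketch's
# own choice, not identifiers from the repository.
NEWS_COLUMNS = [
    "id", "category", "subcategory", "title",
    "abstract", "url", "title_entities", "abstract_entities",
]


def read_news(handle):
    """Yield one dict per tab-separated row, keyed by NEWS_COLUMNS."""
    # QUOTE_NONE: split on tabs only and keep quote characters literally.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield dict(zip(NEWS_COLUMNS, row))


# Made-up row for illustration (not taken from the dataset).
sample = (
    "N1\tnews\tnewsworld\tAn example title\t"
    "An example abstract sentence.\thttps://example.com/N1.html\t[]\t[]\n"
)
rows = list(read_news(io.StringIO(sample)))
print(rows[0]["abstract"])  # the fifth field
print(rows[0]["url"])       # the sixth field
```

For a real file, the same helper could be called on `open("news.tsv", encoding="utf-8", newline="")`. Under the assumed layout, the text passed to the tokenizer would be the fifth field and the URL the sixth.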

TJ2364 commented 3 years ago

@yusanshi

Thank you very much for your reply. I think I mistook the abstract for the title. Sorry for causing you trouble.