Closed twoflypig closed 5 years ago
Thanks for mentioning the missing import. Yes, you need the newspaper library for that class. You can also download the HTML news articles from the original CNN/DM dataset, and this code will handle those files too. However, if you only have .story files, simply set the mode option to anything but "article" and it will process everything from the .story files.
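To illustrate the mode behavior described above, here is a rough sketch of how input files could be picked by extension. It is not the repository's actual code: the function name `select_input_files`, the `data_dir` argument, and the `.html` suffix are assumptions made for this example only.

```python
from pathlib import Path


def select_input_files(data_dir, mode):
    """Return the files to process for a given mode (hypothetical helper).

    Assumption: mode "article" means raw HTML articles (suffix ".html" here),
    and any other mode means pre-extracted ".story" files.
    """
    suffix = ".html" if mode == "article" else ".story"
    return sorted(
        path
        for path in Path(data_dir).iterdir()
        if path.is_file() and path.suffix == suffix
    )
```

With a directory that contains only .story files, calling `select_input_files(data_dir, "story")` would return all of them, while `select_input_files(data_dir, "article")` would return an empty list.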
Hi, thanks for your awesome work! I ran into two problems in the data pre-processing stage.
It seems that the code is missing an import statement in src/helper/cnn_dm_downloader.py. At https://github.com/yaserkl/RLSeq2Seq/blob/0095a768b4c2ab65babf87806e7c372d22cde3f0/src/helper/cnn_dm_downloader.py#L42, you use the Article class. However, it should be imported from the newspaper module.
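For anyone hitting the same NameError, the fix is probably a single line, `from newspaper import Article`, near the top of the file. The sketch below wraps that import in a hypothetical helper (`load_article_class` is my own name, not something from the repository) so that a missing package produces a clearer message.

```python
def load_article_class():
    """Import and return newspaper's Article class (hypothetical helper).

    Raises ImportError with a hint if the newspaper package is not installed.
    """
    try:
        # The import the downloader script appears to be missing.
        from newspaper import Article
    except ImportError as exc:
        raise ImportError(
            "The newspaper package is required for the Article class"
        ) from exc
    return Article
```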
Error in input data? At https://github.com/yaserkl/RLSeq2Seq/blob/0095a768b4c2ab65babf87806e7c372d22cde3f0/src/helper/cnn_dm_downloader.py#L83, the code expects the input files to end with htmls. However, in the Download Raw Data section of your src/helper/README.rst, you said
After downloading the data from the link, I found that the files end with .story, so the code will not process them.
Looking forward to your reply. Thank you!