luyahan closed this issue 7 years ago
@luyahan In the future please look through open and closed tickets or perform a search first before posting a question as it helps keep the same questions from being opened multiple times. That said, refer to this ticket: https://github.com/tensorflow/models/issues/464
You will find a vocab file there that you can point to, which will get you some results other than all `<UNK>` output.
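For reference, the vocab file in that ticket is just one token and its count per line, with a few special tokens the decoder expects. Here is a minimal sketch of generating one from an in-memory corpus (the helper name, the token list, and the 200k size cap are my assumptions, not from the linked ticket):

```python
from collections import Counter

def build_vocab(lines, max_size=200000):
    """Count whitespace-separated tokens and return vocab lines
    in the 'token count' format that textsum-style models read."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    # Special tokens the decoder expects; the counts here are placeholders.
    vocab = ["<s> 0", "</s> 0", "<UNK> 0", "<PAD> 0"]
    for tok, n in counts.most_common(max_size):
        vocab.append(f"{tok} {n}")
    return vocab

# Example: build a tiny vocab from two "articles".
corpus = ["police arrest suspect", "police release statement"]
print("\n".join(build_vocab(corpus)))
```

In practice you would stream your whole training corpus through this and write the result to disk, since a vocab built from a handful of articles will still map most decode-time words to `<UNK>`.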
Something really important to note here: because this is an abstractive model (not extractive), you will need a lot of data to train against before you get usable results. The model is genuinely trying to generate a headline from the input rather than producing a reduced result by deleting words, so it requires a lot of "clean" data. That means you will have to either procure your own dataset via scraping or pay for something like the Gigaword dataset. I reached out to LDC and they advised that they do provide some cheaper datasets from, for example, the NY Times, so if my attempts at scraping my own data don't work out I will be looking at that option.
One final note: should you still be wondering why you are getting `<UNK>` tokens in your output, the vocab file you are pointing at is the first thing to check.
Hope that helps some. Please close this when you have the chance.
@xtr33me I am also doing research on automatic text summarization.
Did your scraped data end up working well?
Would it be possible for you to open-source your scraped data and code?
The code released with the paper Teaching Machines to Read and Comprehend includes open-sourced scraping code that can collect around 200,000 news articles from CNN and the Daily Mail.
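For anyone rolling their own scraper instead, the core step is just pulling paragraph text out of article HTML. A minimal stdlib-only sketch is below; the "body text lives in `<p>` tags" heuristic is my assumption, and real news sites will each need their own extraction rules:

```python
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> tags -- a rough proxy for article body text."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data

def extract_article_text(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    return " ".join(p.strip() for p in parser.paragraphs if p.strip())

sample = "<html><body><h1>Headline</h1><p>First paragraph.</p><p>Second one.</p></body></html>"
print(extract_article_text(sample))  # First paragraph. Second one.
```

The fetching side (HTTP requests, rate limiting, deduplication) is where most of the real work is, and the DeepMind scripts linked above already handle that for CNN/Daily Mail.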
I trained the model using the toy data, then ran decoding. The decode results are very strange, for example: output= for
The latest running_avg_loss is about 0.002.
What causes this? Is the dataset (the toy data) too small?
I'm very grateful for your help.