.pt files generation of own dataset

nlpyang / hiersumm

Code for paper Hierarchical Transformers for Multi-Document Summarization in ACL2019

Apache License 2.0


.pt files generation of own dataset #13

Open: Zirsha opened this issue 4 years ago

Zirsha commented 4 years ago

Hi! Thank you for the code. I successfully evaluated your model. Now I am interested in generating summaries for my own dataset. Can you please guide me on how to generate the .pt files? I have also tried OpenNMT (onmt) pre-processing and produced 3 .pt files (demo.train.0.pt, demo.valid.0.pt, demo.vocab.pt), but when I test your model with demo.valid.0.pt, it raises an error. Kindly guide me on how to use these files, or on how you generated your .pt files (the ranked version of the data). [Screenshot from 2019-10-30 09-32-27]
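If you end up with a list of processed examples, one way to write them out in shards named like the `demo.train.0.pt` files above is shown here. This is a minimal sketch that assumes each shard is simply a Python list saved with `torch.save`. The helper `plan_shards` and the shard size are hypothetical, and the thread does not confirm whether hiersumm's loader expects this exact naming.

```python
from typing import Any, List, Sequence, Tuple


def plan_shards(
    examples: Sequence[Any], prefix: str, split: str, shard_size: int
) -> List[Tuple[str, List[Any]]]:
    """Split examples into chunks named '<prefix>.<split>.<i>.pt'.

    Hypothetical helper: the naming only mirrors demo.train.0.pt from the
    OpenNMT output; it is an assumption, not hiersumm's documented format.
    """
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    shards = []
    # Walk the examples in steps of shard_size; the last shard may be shorter.
    for i, start in enumerate(range(0, len(examples), shard_size)):
        chunk = list(examples[start:start + shard_size])
        shards.append((f"{prefix}.{split}.{i}.pt", chunk))
    return shards
```

For example, `for path, chunk in plan_shards(examples, "demo", "train", 2000): torch.save(chunk, path)`. Keeping the planning separate from `torch.save` lets you check the file names and shard sizes without writing anything to disk.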

vishaljoshi-066 commented 4 years ago

I am new to this. Can you please guide me on how to generate summaries after successfully loading the pretrained model?

apekshapriya commented 4 years ago

@Zirsha Hey, did you solve the problem? I also tried OpenNMT (onmt) pre-processing and produced 3 .pt files (demo.train.0.pt, demo.valid.0.pt, demo.vocab.pt), but I get the same error as you. If you found a solution, could you share it?

yuezhao-zy commented 4 years ago

Could you explain the data format of the .pt files?
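Until the format is documented, a quick way to see what a .pt file contains is to `torch.load` it and walk the result. The helper `describe` (a hypothetical name) is a minimal sketch. It prints dict keys, list lengths, the first element of each list, and the shapes of array-like objects, and it assumes no particular hiersumm layout.

```python
from typing import Any, List


def describe(obj: Any, name: str = "data", depth: int = 0, max_depth: int = 4) -> List[str]:
    """Return indented lines describing the nesting of a loaded .pt object.

    Hypothetical inspection helper: it visits every key of a dict but only the
    first element of a list/tuple, so the output stays short for big datasets.
    """
    pad = "  " * depth
    if depth > max_depth:
        return [f"{pad}{name}: ..."]
    if isinstance(obj, dict):
        lines = [f"{pad}{name}: dict with keys {sorted(map(str, obj))}"]
        for key, value in obj.items():
            lines += describe(value, str(key), depth + 1, max_depth)
        return lines
    if isinstance(obj, (list, tuple)):
        lines = [f"{pad}{name}: {type(obj).__name__} of length {len(obj)}"]
        if obj:
            lines += describe(obj[0], f"{name}[0]", depth + 1, max_depth)
        return lines
    # torch.Tensor and numpy arrays expose .shape; report it instead of values.
    shape = getattr(obj, "shape", None)
    if shape is not None:
        return [f"{pad}{name}: {type(obj).__name__} with shape {tuple(shape)}"]
    # Plain values: show the type and a truncated repr.
    return [f"{pad}{name}: {type(obj).__name__} = {obj!r}"[:120]]
```

Usage: `print("\n".join(describe(torch.load("demo.valid.0.pt"))))`. On recent PyTorch versions, files that contain arbitrary Python objects may need `torch.load(path, weights_only=False)`. Only do that for files you trust.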

IreneZihuiLi commented 4 years ago

I think the .pt files produced by the OpenNMT pre-processing code are not in the same format as the ones used here. Could you please release your pre-processing code?
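To see how an OpenNMT-produced file differs from the released data, one option is to compare the top-level fields of one example from each. This is a minimal sketch that assumes both files can be loaded and indexed to get a single example. `field_types` and `diff_fields` are hypothetical helpers. Objects that store their fields as attributes rather than dict keys are read through `vars()`.

```python
from typing import Any, Dict, List


def field_types(example: Any) -> Dict[str, str]:
    """Map each top-level field of one example to its Python type name (hypothetical helper)."""
    if isinstance(example, dict):
        return {str(k): type(v).__name__ for k, v in example.items()}
    # Example-like objects keep their fields as instance attributes.
    return {k: type(v).__name__ for k, v in vars(example).items()}


def diff_fields(mine: Any, reference: Any) -> Dict[str, List[str]]:
    """Report fields missing from, extra in, or typed differently in `mine` vs `reference`."""
    a, b = field_types(mine), field_types(reference)
    return {
        "missing": sorted(set(b) - set(a)),
        "extra": sorted(set(a) - set(b)),
        "type_mismatch": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
    }
```

Calling `diff_fields(mine[0], reference[0])` lists the fields missing from your file, the extra fields, and the fields whose Python types differ. Some OpenNMT-py versions may save a dataset object rather than a plain list, so you might need to reach its examples before indexing.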

melvintzw commented 4 years ago

Hi, I would like to bump this issue. Could the authors kindly release the pre-processing model/code or the ranking model/code that transforms a set of input text paragraphs into the input vectors required by the wikisum_model_step_500000.pt model? For my research, I would love to apply your system to a custom dataset. I would also love to see the text summaries your system outputs.

I checked the datapoints being fed into the PyTorch model you provided. I assume that the 'src' attribute refers to the 40 vectors corresponding to the top 40 paragraphs found by your ranker, as described in your paper. A sample datapoint is illustrated in the attached image.
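Going by that description, a single example might look roughly like this: paragraphs ranked by some score, the top 40 kept, and each one encoded as a list of token ids under 'src'. This is only a sketch built on assumptions. The paper's learned ranker is replaced by caller-supplied scores. The `tgt` key, the `encode` callable (for example a subword tokenizer), the `build_example` name, and the per-paragraph truncation length are all hypothetical, not the authors' preprocessing.

```python
from typing import Callable, Dict, List, Sequence


def build_example(
    paragraphs: Sequence[str],
    scores: Sequence[float],
    summary: str,
    encode: Callable[[str], List[int]],
    n_paragraphs: int = 40,
    max_tokens_per_paragraph: int = 100,  # hypothetical cap, not from the paper
) -> Dict[str, List]:
    """Keep the top-n paragraphs by score and encode each as token ids.

    'src' follows the attribute described in the thread; 'tgt' and the
    external scoring are assumptions made for illustration.
    """
    if len(paragraphs) != len(scores):
        raise ValueError("need one score per paragraph")
    # Rank paragraph indices by descending score (stable sort keeps ties in input order).
    order = sorted(range(len(paragraphs)), key=lambda i: -scores[i])
    top = order[:n_paragraphs]
    # One truncated token-id list per selected paragraph, best-ranked first.
    src = [encode(paragraphs[i])[:max_tokens_per_paragraph] for i in top]
    return {"src": src, "tgt": encode(summary)}
```

You could then save a list of such dicts with `torch.save`. Whether the released model accepts it depends on details such as the vocabulary, special tokens, and any extra fields, which only the authors' preprocessing code could confirm.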

Tinarights commented 3 years ago

Hi,

Have any of you solved this issue?

I could not get my own data into the required format.

Tinarights commented 3 years ago

@IreneZihuiLi @apekshapriya @melvintzw Excuse me, have any of you solved this issue? I could not get my own data into the required format, and I really need it.

Best

melvintzw commented 3 years ago

@Tinarights Sorry, I wasn't able to solve this issue on my end. I think you will need the code authors to provide the preprocessing model.