This is the official code for Microsoft's HMNet paper, published at EMNLP 2020. It is implemented in the PyTorch framework. Please cite the paper as follows:
@Article{zhu2020a,
author = {Zhu, Chenguang and Xu, Ruochen and Zeng, Michael and Huang, Xuedong},
title = {A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining},
year = {2020},
month = {November},
url = {https://www.microsoft.com/en-us/research/publication/end-to-end-abstractive-summarization-for-meetings/},
journal = {Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing},
}
We recommend running the model inside a Docker container.
Build the Docker image:
cd Docker
sudo docker build . -t hmnet
Run a container from the image:
sudo nvidia-docker run -it hmnet /bin/bash
Get the pretrained HMNet model ready at ExampleInitModel/HMNet-pretrained. Please see the accompanying document for details.
Fine-tune on the AMI dataset:
CUDA_VISIBLE_DEVICES="0,1,2,3" mpirun -np 4 --allow-run-as-root python PyLearn.py train ExampleConf/conf_hmnet_AMI
The training log, model, and settings can be found at ExampleConf/conf_hmnet_AMI_conf~/run_1
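If the same configuration is trained more than once, its output folder may end up holding several run directories. The Python sketch below picks the highest-numbered one, for example to locate a checkpoint for evaluation. It assumes, based only on the run_1 example above, that runs are stored as run_<n> under "<config path>_conf~"; the helper name latest_run_dir is hypothetical and is not part of PyLearn.py.

```python
import os
import re


def latest_run_dir(conf_path):
    """Return the highest-numbered run_<n> folder under '<conf_path>_conf~'.

    Hypothetical helper. Assumes each training run is written to
    '<conf_path>_conf~/run_<n>', as the run_1 example suggests.
    Returns None if no such folder exists.
    """
    out_root = conf_path + "_conf~"
    if not os.path.isdir(out_root):
        return None
    runs = []
    for name in os.listdir(out_root):
        match = re.fullmatch(r"run_(\d+)", name)
        # Only count directories whose names look like run_<n>.
        if match and os.path.isdir(os.path.join(out_root, name)):
            runs.append((int(match.group(1)), name))
    if not runs:
        return None
    # Compare run numbers as integers so run_10 ranks above run_9.
    return os.path.join(out_root, max(runs)[1])
```

Sorting by the parsed integer rather than by the folder name avoids the lexicographic trap where "run_10" would sort before "run_9".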
ExampleRawData/meeting_summarization/AMI_proprec: The preprocessed AMI dataset. The *.json files point to the path of each split. Each folder (train, dev or test) contains the compressed chunks of data in the infinibatch format.
ExampleRawData/meeting_summarization/ICSI_proprec: Same as above, for the ICSI dataset.
ExampleInitModel/transfo-xl-wt103: Here we only used the vocabulary from Transformer-XL, provided by Huggingface.
In ExampleConf/conf_eval_hmnet_AMI, find the line
PYLEARN_MODEL ###
and replace ### with the real checkpoint path. Use a path relative to the location of this configuration file.
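The placeholder substitution can also be scripted. This is a minimal Python sketch, not part of the repository: the helper set_checkpoint is hypothetical, and it assumes the configuration file stores the placeholder as a whitespace-separated "PYLEARN_MODEL ###" line. It uses os.path.relpath to express the checkpoint path relative to the configuration file's directory before writing it back.

```python
import os


def set_checkpoint(conf_path, ckpt_path):
    """Replace the 'PYLEARN_MODEL ###' placeholder in a config file in place.

    Hypothetical helper. The checkpoint path is rewritten relative to the
    directory containing the configuration file. Returns that relative path.
    """
    conf_dir = os.path.dirname(os.path.abspath(conf_path))
    rel = os.path.relpath(ckpt_path, start=conf_dir)
    with open(conf_path) as f:
        lines = f.readlines()
    replaced = False
    for i, line in enumerate(lines):
        # Only touch the untouched placeholder line, not an already-set path.
        if line.split()[:2] == ["PYLEARN_MODEL", "###"]:
            lines[i] = "PYLEARN_MODEL " + rel + "\n"
            replaced = True
    if not replaced:
        raise ValueError("no 'PYLEARN_MODEL ###' line found in " + conf_path)
    with open(conf_path, "w") as f:
        f.writelines(lines)
    return rel
```

For example, set_checkpoint("ExampleConf/conf_eval_hmnet_AMI", "ExampleConf/conf_hmnet_AMI_conf~/run_1/<checkpoint>") would write conf_hmnet_AMI_conf~/run_1/<checkpoint> into the config, where <checkpoint> stands for whichever checkpoint file the training run produced.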
CUDA_VISIBLE_DEVICES="0,1,2,3" mpirun -np 4 --allow-run-as-root python PyLearn.py evaluate ExampleConf/conf_eval_hmnet_AMI
The decoding results can be found at ExampleConf/conf_eval_hmnet_AMI_conf~/run_1
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
Microsoft takes the security of our software products and services seriously. This includes all source code repositories managed through our GitHub organizations, such as Microsoft, Azure, DotNet, AspNet, and Xamarin.
If you believe you have found a security vulnerability in any Microsoft-owned repository that meets Microsoft's definition of a security vulnerability, please report it to us as described below.
Please do not report security vulnerabilities through public GitHub issues.
Instead, please report them to the Microsoft Security Response Center (MSRC) at https://msrc.microsoft.com/create-report.
If you prefer to submit without logging in, send an email to secure@microsoft.com. If possible, encrypt your message with our PGP key, which you can download from the Microsoft Security Response Center PGP Key page.
You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at microsoft.com/msrc.
Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue:
This information will help us triage your report more quickly.
If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our Microsoft Bug Bounty Program page for more details about our active programs.
We prefer all communications to be in English.
Microsoft follows the principle of Coordinated Vulnerability Disclosure.