This repository contains code, datasets, and links related to the Knowledge Computing (KC) group at Microsoft Research Asia (MSRA).

Our group is hiring both research interns and full-time employees! If you are interested, please take a look at:

Internship opportunities in KC (PDF);
Researcher or RSDE positions (select "China" in the left-side "Country/Region" menu).

News:

2023-Sep: The Recognizers-Text project reached over 9 million package downloads (across NuGet/npm/PyPI)!
2023-May: Three papers accepted at ACL'23: MLKD OOD, CoLaDa, and TACR.
2022-Aug: The Recognizers-Text project reached over 5 million package downloads (across NuGet/npm/PyPI)!
2022-May: TIARA (ReTraCk v2), KC's new knowledge base question answering (KBQA) system, has reached #1 in all Generalizable Question Answering (GrailQA) evaluation categories, including Overall, Compositional Generalization, and Zero-Shot.
2022-Apr: We have open-sourced the latest version of the LinkingPark system for automatic semantic table interpretation. This new version improves performance, stability, flexibility, and overall results. Contributions and collaboration are very welcome!
2022-Mar: The Recognizers-Text project reached over 4 million package downloads (across NuGet/npm/PyPI)!
2021-Jul: The Recognizers-Text project reached over 3 million package downloads (across NuGet/npm/PyPI)!
2021-May: ReTraCk has reached #1 on the Generalizable Question Answering (GrailQA) leaderboard for knowledge base QA (KBQA).
2020-Dec: The Recognizers-Text project reached over 2 million package downloads (across NuGet/npm/PyPI)!
2020-Nov: The LinkingPark system, developed jointly by the Knowledge Computing group at MSRA and our collaborators at MSR Cambridge, placed 2nd in the SemTab 2020 challenge (Semantic Web Challenge on Tabular Data to Knowledge Graph Matching)!

Recent Papers:

Multi-Level Knowledge Distillation for Out-of-Distribution Detection in Text, Qianhui Wu, Huiqiang Jiang, Haonan Yin, Börje F. Karlsson, Chin-Yew Lin, ACL 2023.
Repository: https://github.com/microsoft/KC/tree/main/papers/MLKD_OOD
CoLaDa: A Collaborative Label Denoising Framework for Cross-lingual Named Entity Recognition, Tingting Ma, Qianhui Wu, Huiqiang Jiang, Börje F. Karlsson, Tiejun Zhao, Chin-Yew Lin, ACL 2023.
Repository: https://github.com/microsoft/vert-papers/tree/master/papers/CoLaDa
TACR: A Table-alignment-based Cell-selection and Reasoning Model for Hybrid Question-Answering, Jian Wu, Yicheng Xu, Yan Gao, Jian-Guang Lou, Börje F. Karlsson, Manabu Okumura, Findings of the Association for Computational Linguistics: ACL 2023.
TIARA: Multi-grained Retrieval for Robust Question Answering over Large Knowledge Bases, Yiheng Shu, Zhiwei Yu, Yuhan Li, Börje F. Karlsson, Tingting Ma, Yuzhong Qu, Chin-Yew Lin, EMNLP 2022.
Repository: https://github.com/microsoft/KC/tree/master/papers/TIARA
LinkingPark: An Automatic Semantic Table Interpretation System, Shuang Chen, Alperen Karaoglu, Carina Negreanu, Tingting Ma, Jin-Ge Yao, Jack Williams, Feng Jiang, Andy Gordon, Chin-Yew Lin, Journal of Web Semantics, 2022.
Repository: https://github.com/microsoft/vert-papers/tree/master/papers/LinkingPark
Rows from Many Sources: Enriching row completions from Wikidata with a pre-trained Language Model, Carina Negreanu, Alperen Karaoglu, Jack Williams, Shuang Chen, Daniel Fabian, Andrew Gordon, Chin-Yew Lin, Wiki Workshop 2022.
On the Effectiveness of Sentence Encoding for Intent Detection Meta-Learning, Tingting Ma, Qianhui Wu, Zhiwei Yu, Tiejun Zhao, Chin-Yew Lin, NAACL 2022.
Repository: https://github.com/microsoft/KC/tree/master/papers/IDML
Decomposed Meta-Learning for Few-Shot Named Entity Recognition, Tingting Ma, Huiqiang Jiang, Qianhui Wu, Tiejun Zhao, Chin-Yew Lin, Findings of the ACL 2022.
Repository: https://github.com/microsoft/vert-papers/tree/master/papers/DecomposedMetaNER
AdvPicker: Effectively Leveraging Unlabeled Data via Adversarial Discriminator for Cross-Lingual NER, Weile Chen, Huiqiang Jiang, Qianhui Wu, Börje F. Karlsson, Yi Guan, ACL 2021.
Repository: https://github.com/microsoft/vert-papers/tree/master/papers/AdvPicker
ReTraCk: A Flexible and Efficient Framework for Knowledge Base Question Answering, Shuang Chen, Qian Liu, Zhiwei Yu, Chin-Yew Lin, Jian-Guang Lou, Feng Jiang, ACL 2021. (demo paper)
Repository: https://github.com/microsoft/KC/tree/master/papers/ReTraCk
Issues with Entailment-based Zero-shot Text Classification, Tingting Ma, Jin-Ge Yao, Chin-Yew Lin, Tiejun Zhao, ACL 2021. (short paper)
Repository: https://github.com/microsoft/KC/tree/master/papers/Entailment-Issues
BoningKnife: Joint Entity Mention Detection and Typing for Nested NER via prior Boundary Knowledge, Huiqiang Jiang, Guoxin Wang, Weile Chen, Chengxi Zhang, Börje F. Karlsson, arXiv:2107.09429 - 2020/2021.
LinkingPark: An integrated approach for Semantic Table Interpretation, Shuang Chen, Alperen Karaoglu, Carina Negreanu, Tingting Ma, Jin-Ge Yao, Jack Williams, Andy Gordon, Chin-Yew Lin, Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab 2020) at ISWC 2020.
Repository: https://github.com/microsoft/vert-papers/tree/master/papers/LinkingPark
UniTrans: Unifying Model Transfer and Data Transfer for Cross-Lingual Named Entity Recognition with Unlabeled Data, Qianhui Wu, Zijia Lin, Börje F. Karlsson, Biqing Huang, Jian-Guang Lou, IJCAI 2020.
Repository: https://github.com/microsoft/vert-papers/tree/master/papers/UniTrans
Single-/Multi-Source Cross-Lingual NER via Teacher-Student Learning on Unlabeled Data in Target Language, Qianhui Wu, Zijia Lin, Börje F. Karlsson, Jian-Guang Lou, Biqing Huang, ACL 2020.
Repository: https://github.com/microsoft/vert-papers/tree/master/papers/SingleMulti-TS
Enhanced Meta-Learning for Cross-lingual Named Entity Recognition with Minimal Resources, Qianhui Wu, Zijia Lin, Guoxin Wang, Hui Chen, Börje F. Karlsson, Biqing Huang, Chin-Yew Lin, AAAI 2020.
Repository: https://github.com/microsoft/vert-papers/tree/master/papers/Meta-Cross
Improving Entity Linking by Modeling Latent Entity Type Information, Shuang Chen, Jinpeng Wang, Feng Jiang, Chin-Yew Lin, AAAI 2020.
Exploring Word Representations on Time Expression Recognition, Sanxing Chen, Guoxin Wang, Börje Karlsson, Technical Report - Microsoft Research Asia, 2019.
Towards Improving Neural Named Entity Recognition with Gazetteers, Tianyu Liu, Jin-Ge Yao, Chin-Yew Lin, ACL 2019.
Repository: https://github.com/microsoft/vert-papers/tree/master/papers/SubTagger
CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition, Yuying Zhu, Guoxin Wang, Börje F. Karlsson, NAACL-HLT 2019.
Repository: https://github.com/microsoft/vert-papers/tree/master/papers/CAN-NER
GRN: Gated Relation Network to Enhance Convolutional Neural Network for Named Entity Recognition, Hui Chen, Zijia Lin, Guiguang Ding, Jian-Guang Lou, Yusen Zhang, Börje F. Karlsson, AAAI 2019.
Repository: https://github.com/microsoft/vert-papers/tree/master/papers/GRN-NER

Related Projects:

VERT (Versatile Entity Recognition & Disambiguation Toolkit) - Open-source repository including code and datasets for the KC papers related to entity extraction/disambiguation/understanding;
microsoft/Recognizers-Text - Open-source library that provides recognition and normalization/resolution of numbers, units, date/time, and sequences (e.g., phone numbers, URLs) expressed in multiple languages.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
