Awesome-Papers
:question: Objective of the jinmang2/Awesome-Papers Repo
:bulb: To be an AI Researcher, an Artist, and a Good Person...!!
2021 Papers to Read
- Learning to Learn without Gradient Descent by Gradient Descent
- Massively Multitask Networks for Drug Discovery
- One-Shot Imitation Learning
- Few-Shot Autoregressive Density Estimation: Towards Learning to Learn Distributions
- Meta-Learning for Low-Resource Neural Machine Translation
- Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
- SYNTHESIZER: Rethinking Self-Attention in Transformer Models
- Fine-tune BERT for Extractive Summarization
- ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations
- Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation
2020 Reading Papers
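- Papers I only skimmed are not listed
- Only papers that I read in full and researched further on my own are listed
- For example, I know the concept of word2vec but have not worked through the paper itself, so it would not be listed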
Reinforcement Learning
- Asynchronous Methods for Deep Reinforcement Learning
- Continuous Control With Deep Reinforcement Learning
  - DDPG, DQN+DPG, Replay Buffer, Soft-Update via Polyak Averaging, Ornstein-Uhlenbeck process, White Gaussian Random process, DeepMind
- Deterministic Policy Gradient Algorithms
  - DeepMind, Policy Gradient, Actor-Critic, Deterministic Policy
- Policy Gradient Methods for Reinforcement Learning with Function Approximation
  - Compatible Function Approximation, Policy Gradient, Sutton
- Approximately Optimal Approximate Reinforcement Learning
  - Kakade & Langford, Mixture Policy, Policy Improvement
- Trust Region Policy Optimization
  - Trust Region, Natural Policy, Kakade & Langford Thm, Policy Improvement, OpenAI
- Proximal Policy Optimization Algorithms
  - OpenAI, Practical TRPO, Clip Gradient
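The "Soft-Update via Polyak Averaging" keyword noted under the DDPG paper refers to moving target-network parameters a small step toward the online network on every update. A minimal sketch of that rule, assuming parameters are stored as flat lists of floats; the name `soft_update` and the default `tau` are illustrative choices, not taken from the paper or any library:

```python
def soft_update(target_params, online_params, tau=0.005):
    """Blend target parameters toward online parameters.

    Computes tau * online + (1 - tau) * target element-wise.
    `tau` is an assumed, illustrative default.
    """
    if len(target_params) != len(online_params):
        raise ValueError("parameter lists must have the same length")
    return [
        tau * w_online + (1.0 - tau) * w_target
        for w_target, w_online in zip(target_params, online_params)
    ]
```

With `tau=1.0` the target becomes a copy of the online parameters, and with a small `tau` it changes slowly, which is the stabilizing effect the keyword points at.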
Meta-Learning
- Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
  - MAML, Optimization-Based Meta-Learning
NLP
- Efficient Estimation of Word Representations in Vector Space
  - Word2Vec, CBOW, Skip-Gram
- Distributed Representations of Words and Phrases and their Compositionality
  - Improved vector representation quality, SubSampling, Negative Sampling, Hierarchical Softmax
- Deep contextualized word representations
  - ELMo, Feature-Based, Pre-ELMo + Linear Combination, SubWord Information by ConvNet
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  - Transformer's Encoder, MLM, NSP
- Neural Machine Translation By Jointly Learning to Align and Translate
  - GRU, Seq2Seq with Attention, Bahdanau Attention
- Attention Is All You Need
  - Transformers, Scaled Dot-Product Self-Attention, Seq2Seq
- Advances in Pre-Training Distributed Word Representations
- Enriching Word Vectors with Subword Information
- Minimum Risk Training for Neural Machine Translation
- Bag of Tricks for Efficient Text Classification
  - FastText for Text Classification, Fast!
- A Fast and Accurate Dependency Parser using Neural Networks
- MaltParser: A Data-Driven Parser-Generator for Dependency Parsing
- Incrementality in Deterministic Dependency Parsing
- A Neural Probabilistic Language Model
- Universal Language Model Fine-tuning for Text Classification
- The Natural Language Decathlon: Multitask Learning as Question Answering
  - MultiTask Learning, anti-curriculum learning
- Phrase-Based & Neural Unsupervised Machine Translation
  - Initialization, Back-Translation
- A Structured Self-Attentive Sentence Embedding
Graph
- Graph Attention Networks
- MAGNET: Multi-Label Text Classification using Attention-based Graph Neural Network
Conversational AI
- Memory Networks
- End-To-End Memory Networks
- Learning Through Dialogue Interactions By Asking Questions
- Hierarchical Attention Networks for Document Classification
- Conversational Decision-Making Model for Predicting the King's Decision in the Annals of the Joseon Dynasty
Fundamental
- Decoupled Neural Interfaces using Synthetic Gradients
- Decoupled Weight Decay Regularization
- Neural Network Ensembles, Cross Validation, and Active Learning
- Sharp Minima Can Generalize For Deep Nets
- Long short-term memory
- Highway Networks
- Recurrent Highway Networks
ETC
- LSTM-SAE; Unsupervised Pre-training of a Deep LSTM-based Stacked Autoencoder for Multivariate Time Series Forecasting Problems
- C3D; Learning Spatiotemporal Features with 3D Convolutional Networks
:office: NLP
Tokenization
- [x] BPE(Byte-Pair-Encoding); A New Algorithm for Data Compression (C Users Journal 1994) paper
- [x] BPE applied to NMT; Neural Machine Translation of Rare Words with Subword Units (ACL 2016) paper
  - Comparison between n-gram and byte-pair encoding
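The BPE entries above rest on one idea: repeatedly merge the most frequent adjacent symbol pair into a new symbol. A minimal sketch of learning merges from word frequencies, in the spirit of the subword-units paper; the function name `learn_bpe`, the `</w>` end-of-word marker, and the first-seen tie-breaking are assumptions made for illustration, not the authors' reference implementation:

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` BPE merges from a {word: frequency} dict.

    Returns (merges, vocab), where vocab maps symbol tuples to frequencies.
    """
    # Start from characters, with an assumed end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Ties are broken by first occurrence (an arbitrary choice here).
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite every word, replacing the best pair with one merged symbol.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab
```

A call such as `learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3)` returns the learned merge list together with the re-segmented vocabulary.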
Wordpiece
SentencePiece
Morphological
Word Vector Representation
- [x] NPLM; A Neural Probabilistic Language Model (JMLR 2003) paper
  - NPLM's Reference -> learning the role of words in a sentence
    - [ ] Modeling High-Dimensional Discrete Data with Multi-Layer Neural Networks (NIPS 2000) paper
      - Proposes the idea of using a neural network for high-dimensional binary distributed representations
    - [ ] Extracting distributed representations of concepts and relations from positive and negative propositions (IEEE 2000) link
      - A case where Prof. Hinton's research was applied successfully
    - [ ] Natural Language Processing With Modular PDP Networks and Distributed Lexicon (Cognitive Science 1991 July) link
      - An attempt to apply neural networks to language modeling
  - NPLM's Reference -> learning a statistical model of the word sequence distribution
    - [ ] Sequential neural text compression (IEEE 1996) link
      - I Love Schmidhuber a lot :)
- [x] Word2Vec 2013a; Efficient Estimation of Word Representations in Vector Space (ICLR 2013) paper
  - Introduces Skip-Gram & CBOW
  - Google Team
- [x] Word2Vec 2013b; Distributed Representations of Words and Phrases and their Compositionality (NIPS 2013) paper
  - Proposes training optimization methods such as negative sampling
- [ ] GloVe(Global Word Vectors); GloVe: Global Vectors for Word Representation (EMNLP 2014) paper
  - Stanford Univ.
  - Overcomes the limitations of Word2Vec and LSA
- [ ] Swivel(Submatrix-Wise Vector Embedding Learner); Swivel: Improving Embeddings by Noticing What’s Missing () paper
- [x] FastText; Enriching Word Vectors with Subword Information (17.06.16, arxiv) paper
NLP Tasks
A large annotated corpus for learning natural language inference, Bowman et al., 2015 (EMNLP)
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, Williams et al., 2018
SQuAD: 100,000+ Questions for Machine Comprehension of Text, Rajpurkar et al., 2016
Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition, Tjong Kim Sang and De Meulder, 2003
Dependency Parsing
- [ ] Incrementality in Deterministic Dependency Parsing (ACL Workshop, 2004) paper
- [ ] MaltParser: A Data-Driven Parser-Generator for Dependency Parsing (LREC, 2006) paper
- [ ] A Fast and Accurate Dependency Parser using Neural Networks (EMNLP, 2014) paper
Neural Machine Translation
- [ ] MRT(Minimum Risk Training); Minimum Risk Training for Neural Machine Translation (ACL 2016) paper
Text Classification
- [x] FastText for classification; Bag of Tricks for Efficient Text Classification (EACL 2017) link
- [ ] ULMFiT; Universal Language Model Fine-tuning for Text Classification (18.05.23, arxiv) paper
Question Answering
Stochastic Answer Networks for Machine Reading Comprehension https://arxiv.org/abs/1712.03556
Textual Entailment
Enhanced LSTM for Natural Language Inference https://arxiv.org/abs/1609.06038
Semantic Role Labeling
Deep Semantic Role Labeling: What Works and What’s Next https://www.aclweb.org/anthology/P17-1044/
Summarization
Extractive
- [ ] BertSum; Fine-tune BERT for Extractive Summarization (19.03.25, arxiv) paper
- [ ] BertSum-Full Paper; Text Summarization with Pretrained Encoders (19.08.22, arxiv) paper
Pre-trained NLP Architecture
- [ ] Semi-supervised sequence learning (NIPS 2015) paper
Word Representations: A Simple and General Method for Semi-Supervised Learning
| institute | model | title | venue | published | link |
|---|---|---|---|---|---|
| AllenAI | ELMo | Deep contextualized word representations | NAACL | 2018 | paper |
| AllenAI | LongFormer | Longformer: The Long-Document Transformer | arxiv | 20.04.10 | paper |
| GoogleAI | BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | NAACL | 2019 | paper |
| GoogleAI | ALBERT | ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations | ICLR | 19.09.26 | paper |
| GoogleAI | T5 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | JMLR | 19.10.23 | paper |
| GoogleAI | PEGASUS | PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization | ICML | 2020 | paper |
| GoogleAI | ELECTRA | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | ICLR | 2020 | paper |
| DeepMind | Compressive Transformers | Compressive Transformers for Long-Range Sequence Modelling | arxiv | 19.11.13 | paper |
| UNC Chapel Hill | LXMERT | LXMERT: Learning Cross-Modality Encoder Representations from Transformers | arxiv | 19.08.20 | paper |
| OpenAI | GPT-1 | Improving language understanding with unsupervised learning | OpenAI | 2018 | paper |
| OpenAI | GPT-2 | Language Models are Unsupervised Multitask Learners | OpenAI | 2019 | paper |
| OpenAI | GPT-3 | Language Models are Few-Shot Learners | OpenAI | 2020 | paper |
| FAIR | FastText | Advances in Pre-Training Distributed Word Representations | arxiv | 17.12.26 | paper |
| FAIR | XLM | Cross-lingual Language Model Pretraining | arxiv | 19.01.22 | paper |
| FAIR | FSMT | Facebook FAIR's WMT19 News Translation Task Submission | arxiv | 19.07.15 | paper |
| FAIR | RoBERTa | RoBERTa: A Robustly Optimized BERT Pretraining Approach | arxiv | 19.07.26 | paper |
| FAIR | MMBT | Supervised Multimodal Bitransformers for Classifying Images and Text | arxiv | 19.09.06 | paper |
| FAIR | BART | BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | arxiv | 19.10.29 | paper |
| FAIR | CamemBERT | CamemBERT: a Tasty French Language Model | arxiv | 19.11.10 | paper |
| FAIR | mBART | Multilingual Denoising Pre-training for Neural Machine Translation | arxiv | 20.01.22 | paper |
| FAIR | RAG | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | arxiv | 20.05.22 | paper |
| Hugging Face | DistilBERT | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | arxiv | 19.10.02 | paper |
| Microsoft | Marian | Marian: Cost-effective High-Quality Neural Machine Translation in C++ | ACL | 2018 | paper |
| Microsoft | MT-DNN | Multi-Task Deep Neural Networks for Natural Language Understanding | arxiv | 19.05.30 | paper |
| Microsoft | LayoutLM | LayoutLM: Pre-training of Text and Layout for Document Image Understanding | arxiv | 19.12.31 | paper |
| NVIDIA | MegatronLM | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | arxiv | 19.09.17 | paper |
| Univ. of Washington | Grover-Mega | Defending Against Neural Fake News | arxiv | 19.10.29 | paper |
| Carnegie Mellon, GoogleBrain | Transformer-XL | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | arxiv | 19.06.02 | paper |
| Carnegie Mellon, GoogleBrain | XLNet | XLNet: Generalized Autoregressive Pretraining for Language Understanding | arxiv | 19.06.19 | paper |
| Carnegie Mellon, GoogleBrain | Funnel | Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing | arxiv | 20.06.05 | paper |
| Salesforce | CTRL | CTRL: A Conditional Transformer Language Model for Controllable Generation | arxiv | 19.09.11 | paper |
| Anonymous authors | MobileBERT | MobileBERT: Task-Agnostic Compression of BERT by Progressive Knowledge Transfer | ICLR | 2020 | paper |
:sparkles: Attention Mechanism
- [x] Bahdanau Attention; Neural Machine Translation by Jointly Learning to Align and Translate (ICLR 2015) paper
- [x] Multi-Head Attention; Attention Is All You Need (NIPS 2017) paper
- [ ] Google Research-Synthesizer; SYNTHESIZER: Rethinking Self-Attention in Transformer Models (20.05.02, arxiv) paper
:massage: Conversational AI
Memory-Based Research
Tracking the research of Sumit Chopra and Jason Weston
- [x] Memory Networks (14.10.15, arxiv; ICLR 2015) paper
- [x] End-To-End Memory Networks (NIPS 2015) paper
- [ ] Learning Through Dialogue Interactions By Asking Questions (16.12.15, ICLR 2017) paper
Open-Domain
- Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index, ACL
- Kelvin Guu's REALM, ICML
- [ ] DPR; Dense Passage Retrieval for Open-Domain Question Answering (20.04.10) paper
:art: Generative Model
GAN
- [ ] Original GAN; Generative Adversarial Nets (NIPS 2014) paper
:monkey_face: Meta Learning
- [ ] MAML; Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (ICML 2017) paper
Curiosity Algorithms
- https://ai.googleblog.com/2018/10/curiosity-and-procrastination-in.html
- [ ] Meta-learning curiosity algorithms
- [ ] Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
- [ ] Novelty search (Lehman & Stanley, 2008)
- [ ] Buffers and Nearest Neighbors (Fu et al., 2017)
- [ ] Generating goals (Srivastava et al., 2013; Kulkarni et al., 2016)
- [ ] Learning progress (Oudeyer et al., 2007; Schmidhuber, 2008)
- [ ] Generating diverse skills (Eysenbach et al., 2018)
- [ ] Stochastic neural networks (Florensa et al., 2017; Fortunato et al., 2017)
- [ ] Count-based exploration (Tang et al., 2017)
- [ ] Object-based curiosity measures (Forestier & Oudeyer, 2016)
- [ ] Bonus-based (Taiga et al., 2019)
Road to General Intelligence
- AutoML Style Approach
- Neural Architecture Search (NAS)
- Hyperparameter optimization for deep networks
- Auto-sklearn, Learning loss functions to replace cross-entropy for training a fixed architecture on MNIST and CIFAR
- Meta-learning with genetic programming, evolutionary computing
- Programming Automation
- Searching over mathematical operations within neural networks
- Neural networks that learn programs
- Modular Meta-Learning / Hierarchical Meta-Learning, Reinforcement Learning
- Inspired from Cognitive/Brain Science (Attention, Curiosity, Common Sense, etc)
- Agent57 (DeepMind)
:brain: Reinforcement Learning
- [x] Policy Gradient Theorem; Policy Gradient Methods for Reinforcement Learning with Function Approximation (NIPS 2000) paper
- [ ] Deterministic Policy Gradient Algorithms
- [ ] Continuous Control with Deep Reinforcement Learning
- [ ] Approximately Optimal Approximate Reinforcement Learning
- [ ] Trust Region Policy Optimization
- [ ] Proximal Policy Optimization Algorithms
RL.start() Paper of the Day series
- [ ] Accelerated Methods for Deep Reinforcement Learning () paper
- [ ] Implementation Matters In Deep RL () paper
- [ ] CURL: Contrastive Unsupervised Representations for Reinforcement Learning () paper
- [ ] Dream to Control: Learning Behaviors by Latent Imagination () paper
:chart_with_upwards_trend: Financial Mathematics & Engineering
:art: Neuromorphic
:cat2: Theoretical Deep Learning
- [x] Neural Network Ensembles, Cross Validation, and Active Learning (NIPS 1995) paper
Batch Normalization
Lipschitz gradient
Global Batch Normalization
Input Covariate Shift
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
How Does Batch Normalization Help Optimization?
Layer Normalization https://arxiv.org/abs/1607.06450
LeCun Initialization Efficient BackProp
Xavier initialization Understanding the difficulty of training deep feedforward neural networks
He Initialization Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Nesterov Optimizer (optimization-type papers)
weight_standardization
:heart_eyes: Schmidhuber
Juergen Schmidhuber's Google Scholar
- [x] Long short-term memory (Neural Computation 1997) paper
- [ ] LSTM: A Search Space Odyssey (IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017) paper
- [x] Highway Networks (15.05.03, arxiv) paper
- Full Paper: Training Very Deep Networks link
- [x] Recurrent Highway Networks (ICML 2017) paper
- [ ] Gradient flow in recurrent nets: the difficulty of learning long-term dependencies (IEEE 2001) paper paper
- [ ] Bidirectional LSTM networks for improved phoneme classification and recognition (International Conference on Artificial Neural Networks 05.09.11)
- [ ] Sequential neural text compression (IEEE 1996) paper
- [ ] Neural Expectation Maximization (NIPS 2017) paper
- [ ] Accelerated Neural Evolution through Cooperatively Coevolved Synapses (JMLR 2008) paper
- [ ] World Models (18.05.09, arxiv) paper
ETC
LSTM-SAE; Unsupervised Pre-training of a Deep LSTM-based Stacked Autoencoder for Multivariate Time Series Forecasting Problems
C3D; Learning Spatiotemporal Features with 3D Convolutional Networks
n-gram related papers
- Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer
- Interpolated estimation of Markov source parameters from sparse data
Pointing the Unknown Words (Université de Montréal)
Seq2Seq; Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation
Real-World Anomaly Detection in Surveillance Videos
self-attention on classification - A Structured Self-Attentive Sentence Embedding