Awesome-Papers

:question: Objective of `jinmang2/Awesome-Papers` Repo.

:bulb: To be an AI researcher, an artist, and a good person!

2021 Papers to Read

Learning to Learn without Gradient Descent by Gradient Descent
Massively Multitask Networks for Drug Discovery
One-Shot Imitation Learning
Few-Shot Autoregressive Density Estimation: Towards Learning to Learn Distributions
Meta-Learning for Low-Resource Neural Machine Translation
Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics
SYNTHESIZER: Rethinking Self-Attention in Transformer Models
Fine-tune BERT for Extractive Summarization
ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations
Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation

2020 Reading Papers

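Papers I only skimmed are not listed.
Only papers I read in full and followed up with my own additional research are listed.
For example, I know the concepts behind word2vec but have not worked through the paper itself, so it is not listed.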

Reinforcement Learning

Asynchronous Methods for Deep Reinforcement Learning
- A3C, DeepMind & Montreal
Continuous Control With Deep Reinforcement Learning
- DDPG, DQN+DPG, Replay Buffer, Soft-Update via Polyak Averaging, Ornstein Uhlenbeck process, White Gaussian Random process, DeepMind
Deterministic Policy Gradient Algorithms
- DeepMind, Policy Gradient, Actor-Critic, Deterministic Policy
Policy Gradient Methods for Reinforcement Learning with Function Approximation
- Compatible Function Approximation, Policy Gradient, Sutton
Approximately Optimal Approximate Reinforcement Learning
- Kakade & Langford, Mixture Policy, Policy Improvement
Trust Region Policy Optimization
- Trust Region, Natural Policy, Kakade & Langford Thm, Policy Improvement, OpenAI
Proximal Policy Optimization Algorithms
- OpenAI, Practical TRPO, Clipped Surrogate Objective
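
The clipping noted for PPO can be illustrated with a minimal sketch of the per-sample clipped surrogate objective. This is not code from the paper or from any library: the function names and the default `epsilon=0.2` are assumptions chosen for illustration, and the clipping is applied to the probability ratio between the new and old policy, not to the gradient itself.

```python
def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Per-sample clipped objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio     -- pi_new(a|s) / pi_old(a|s), assumed to be precomputed
    advantage -- advantage estimate for the same (s, a) pair
    epsilon   -- clip range (illustrative default, an assumption here)
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


def mean_clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Average the clipped objective over a batch of samples."""
    terms = [clipped_surrogate(r, a, epsilon) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)
```

Taking the minimum means the objective stops rewarding a ratio that moves further outside `[1 - epsilon, 1 + epsilon]` in the direction that would improve it, while a change that makes the objective worse is still fully counted.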

Meta-Learning

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
- MAML, Optimization-Based Meta-Learning

NLP

Efficient Estimation of Word Representations in Vector Space
- Word2Vec, CBOW, Skip-Gram
Distributed Representations of Words and Phrases and their Compositionality
- Enhanced vec repr quality, SubSampling, Negative Sampling, Hierarchical Softmax
Deep contextualized word representations
- ELMo, Feature-Based, Pre-ELMo + Linear Combination, SubWord Information by ConvNet
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Transformer's Encoder, MLM, NSP
Neural Machine Translation By Jointly Learning to Align and Translate
- GRU, Seq2Seq with Attention, Bahdanau Attention
Attention Is All You Need
- Transformers, Self-Dot Product Attention, Seq2Seq
Advances in Pre-Training Distributed Word Representations
- FastText
Enriching Word Vectors with Subword Information
- FastText
Minimum Risk Training for Neural Machine Translation
- MRT, NMT
Bag of Tricks for Efficient Text Classification
- FastText for Text Classification, Fast!
A Fast and Accurate Dependency Parser using Neural Networks
- Parsing
MaltParser: A Data-Driven Parser-Generator for Dependency Parsing
- Parsing
Incrementality in Deterministic Dependency Parsing
- Parsing
A Neural Probabilistic Language Model
- NPLM
Universal Language Model Fine-tuning for Text Classification
- ULMFit, Fine-Tuning
The Natural Language Decathlon: Multitask Learning as Question Answering
- MultiTask Learning, anti-curriculum learning
Phrase-Based & Neural Unsupervised Machine Translation
- Initialization, Back-Translation
A Structured Self-Attentive Sentence Embedding
- Self-Attentive

Graph

Graph Attention Networks
- GNN, Attention
MAGNET: Multi-Label Text Classification using Attention-based Graph Neural Network
- GAT, MLTC

Conversational AI

Memory Networks
End-To-End Memory Networks
Learning Through Dialogue Interactions By Asking Questions
Hierarchical Attention Networks for Document Classification
Conversational Decision-Making Model for Predicting the King's Decision in the Annals of the Joseon Dynasty

Fundamental

Decoupled Neural Interfaces using Synthetic Gradients
Decoupled Weight Decay Regularization
Neural Network Ensembles, Cross Validation, and Active Learning
Sharp Minima Can Generalize For Deep Nets
Long short-term memory
Highway Networks
Recurrent Highway Networks

ETC

LSTM-SAE Unsupervised Pre-training of a Deep LSTM-based Stacked Autoencoder for Multivariate Time Series Forecasting Problems
C3D Learning Spatiotemporal Features with 3D Convolutional Networks

:office: NLP

Tokenization

[x] BPE (Byte-Pair Encoding); A New Algorithm for Data Compression (C Users Journal, 1994) paper
- In Wikipedia
[x] BPE applied to NMT; Neural Machine Translation of Rare Words with Subword Units (ACL 2016) paper
- Compares n-gram and byte-pair encoding
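
As a rough illustration of how BPE merges are learned for subword segmentation, here is a minimal sketch in plain Python. It is not the reference implementation from either paper: the function names, the `</w>` end-of-word marker, and the first-seen tie-breaking rule are assumptions made for this example.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merges from a {word: frequency} dict."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        # Ties are broken by first-seen order (an assumption of this sketch).
        best = max(pairs, key=pairs.get)
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab
```

Calling `learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 3)` returns the learned merge list together with the re-segmented vocabulary. Frequent character sequences such as a shared suffix end up as single symbols, while rare words stay split into smaller units.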

Wordpiece

SentencePiece

Morphological

Word Vector Representation

[x] NPLM; A Neural Probabilistic Language Model (JMLR 2003) paper
- NPLM's Reference -> learning the role of words in a sentence
- [ ] Modeling High-Dimensional Discrete Data with Multi-Layer Neural Networks (NIPS 2000) paper
  - Proposes the idea of using a NN for high-dimensional binary distributed representations
- [ ] Extracting distributed representations of concepts and relations from positive and negative propositions (IEEE 2000) link
  - A case where Prof. Hinton's research was applied successfully
- [ ] Natural Language Processing With Modular Pdp Networks and Distributed Lexicon (Cognitive Science 1991 July) link
  - An attempt to apply neural networks to language modeling
- NPLM's Reference -> learning a statistical model of the word sequence distribution
- [ ] Sequential neural text compression (IEEE 1996) link
  - I Love Schmidhuber a lot :)
[x] Word2Vec 2013a; Efficient Estimation of Word Representations in Vector Space (ICLR 2013) paper
- Introduce Skip-Gram & CBOW
- Google Team
[x] Word2Vec 2013b; Distributed Representations of Words and Phrases and their Compositionality (NIPS 2013) paper
- Propose train optimization method such as negative sampling
[ ] GloVe(Global Word Vectors); GloVe: Global Vectors for Word Representation (EMNLP 2014) paper
- Stanford Univ.
- Overcome Word2Vec and LSA
[ ] Swivel(Submatrix-Wise Vector Embedding Learner); Swivel: Improving Embeddings by Noticing What’s Missing () paper
- Google, source code
[x] FastText; Enriching Word Vectors with Subword Information (17.06.16, arxiv) paper

NLP Tasks

A large annotated corpus for learning natural language inference, Bowman et al., 2015 (EMNLP)

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, Williams et al., 2018

SQuAD: 100,000+ Questions for Machine Comprehension of Text, Rajpurkar et al., 2016

Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition, Tjong Kim Sang and De Meulder, 2003

Dependency Parsing

[ ] Incrementality in Deterministic Dependency Parsing (ACL, 2003) paper
[ ] MaltParser: A Data-Driven Parser-Generator for Dependency Parsing (LREC, 2005) paper
[ ] A Fast and Accurate Dependency Parser using Neural Networks (EMNLP, 2014) paper

Neural Machine Translation

[ ] MRT(Minimum Risk Training); Minimum Risk Training for Neural Machine Translation (ACL 2016) paper

Text Classification

[x] FastText for classification; Bag of Tricks for Efficient Text Classification (EACL 2017) link
[ ] ULMFiT; Universal Language Model Fine-tuning for Text Classification (18.05.23, arxiv) paper

Question Answering

Stochastic Answer Networks for Machine Reading Comprehension https://arxiv.org/abs/1712.03556

Textual Entailment

Enhanced LSTM for Natural Language Inference https://arxiv.org/abs/1609.06038

Semantic Role Labeling

Deep Semantic Role Labeling: What Works and What’s Next https://www.aclweb.org/anthology/P17-1044/

Summarization

Extractive

[ ] BertSum; Fine-tune BERT for Extractive Summarization (19.03.25, arxiv) paper

[ ] BertSum-Full Paper; Text Summarization with Pretrained Encoders (19.08.22, arxiv) paper

Pre-trained NLP Architecture

[ ] Semi-supervised sequence learning (NIPS 2015) paper

Word Representations: A Simple and General Method for Semi-Supervised Learning

| Institute | Subtitle | Title | Journal | Published | Etc |
|---|---|---|---|---|---|
| AllenAI | ELMo | Deep contextualized word representations | NAACL | 2018 | paper |
| AllenAI | LongFormer | Longformer: The Long-Document Transformer | arxiv | 20.04.10 | paper |
| GoogleAI | BERT | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | NAACL | 2019 | paper |
| GoogleAI | ALBERT | ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations | ICLR | 19.09.26 | paper |
| GoogleAI | T5 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | JMLR | 19.10.23 | paper |
| GoogleAI | PEGASUS | PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization | ICML | 2020 | paper |
| GoogleAI | ELECTRA | ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | ICLR | 2020 | paper |
| DeepMind | Compressive Transformers | Compressive Transformers for Long-Range Sequence Modelling | arxiv | 19.11.13 | paper |
| UNC Chapel Hill | LXMERT | LXMERT: Learning Cross-Modality Encoder Representations from Transformers | arxiv | 19.08.20 | paper |
| OpenAI | GPT-1 | Improving language understanding with unsupervised learning | OpenAI | 2018 | paper |
| OpenAI | GPT-2 | Language Models are Unsupervised Multitask Learners | OpenAI | 2019 | paper |
| OpenAI | GPT-3 | Language Models are Few-Shot Learners | OpenAI | 2020 | paper |
| FAIR | FastText | Advances in Pre-Training Distributed Word Representations | arxiv | 17.12.26 | paper |
| FAIR | XLM | Cross-lingual Language Model Pretraining | arxiv | 19.01.22 | paper |
| FAIR | FSMT | Facebook FAIR's WMT19 News Translation Task Submission | arxiv | 19.07.15 | paper |
| FAIR | RoBERTa | RoBERTa: A Robustly Optimized BERT Pretraining Approach | arxiv | 19.07.26 | paper |
| FAIR | MMBT | Supervised Multimodal Bitransformers for Classifying Images and Text | arxiv | 19.09.06 | paper |
| FAIR | BART | BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension | arxiv | 19.10.29 | paper |
| FAIR | CamemBERT | CamemBERT: a Tasty French Language Model | arxiv | 19.11.10 | paper |
| FAIR | mBART | Multilingual Denoising Pre-training for Neural Machine Translation | arxiv | 20.01.22 | paper |
| FAIR | RAG | Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | arxiv | 20.05.22 | paper |
| Hugging Face | DistilBERT | DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | arxiv | 19.10.02 | paper |
| Microsoft | Marian | Marian: Cost-effective High-Quality Neural Machine Translation in C++ | ACL | 2018 | paper |
| Microsoft | MT-DNN | Multi-Task Deep Neural Networks for Natural Language Understanding | arxiv | 19.05.30 | paper |
| Microsoft | LayoutLM | LayoutLM: Pre-training of Text and Layout for Document Image Understanding | arxiv | 19.12.31 | paper |
| NVIDIA | MegatronLM | Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | arxiv | 19.09.17 | paper |
| Univ. of Washington | Grover-Mega | Defending Against Neural Fake News | arxiv | 19.10.29 | paper |
| Carnegie Mellon GoogleBrain | Transformer-XL | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | arxiv | 19.06.02 | paper |
| Carnegie Mellon GoogleBrain | XLNet | XLNet: Generalized Autoregressive Pretraining for Language Understanding | arxiv | 19.06.19 | paper |
| Carnegie Mellon GoogleBrain | Funnel | Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing | arxiv | 20.06.05 | paper |
| Salesforce | CTRL | CTRL: A Conditional Transformer Language Model for Controllable Generation | arxiv | 19.09.11 | paper |
| Anonymous authors | MobileBERT | MobileBERT: Task-Agnostic Compression of BERT by Progressive Knowledge Transfer | ICLR | 2020 | paper |

:sparkles: Attention Mechanism

[x] Bahdanau Attention; NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE (ICLR 2015) paper
[x] Multi-Head Attention; Attention Is All You Need (NIPS 2017) paper
[ ] Google Research-Synthesizer; SYNTHESIZER: Rethinking Self-Attention in Transformer Models (20.05.02, arxiv) paper
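
For orientation, the scaled dot-product attention that underlies the Multi-Head Attention entry above can be sketched in plain Python. This is a single-head, unbatched illustration with no masking and no learned projections. The function names and the list-of-lists representation are assumptions made for this example, not any library's API.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors.

    queries: n_q vectors of length d_k
    keys:    n_k vectors of length d_k
    values:  n_k vectors of length d_v
    Returns n_q output vectors of length d_v.
    """
    d_k = len(keys[0])
    d_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(d_v)]
        )
    return outputs
```

A multi-head layer would run several such attention computations in parallel on learned linear projections of the inputs and concatenate the results. That part is omitted here.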

:massage: Conversational AI

Memory-Based Research

Tracking the research of Sumit Chopra and Jason Weston
[x] Memory Networks (14.10.15, arxiv; ICLR 2015) paper
[x] End-To-End Memory Networks (NIPS 2015) paper
[ ] Learning Through Dialogue Interactions By Asking Questions (16.12.15, ICLR 2017) paper

Open-Domain

Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index, ACL
Kelvin Guu's REALM, ACL
[ ] DPR; Dense Passage Retrieval for Open-Domain Question Answering (20.04.10) paper
- Introductory material by Huffon

:art: Generative Model

GAN

[ ] Original GAN; Generative Adversarial Nets (NIPS 2014) paper

:monkey_face: Meta Learning

[ ] MAML; Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (ICML 2017) paper

Curiosity Algorithms

https://ai.googleblog.com/2018/10/curiosity-and-procrastination-in.html
[ ] Meta-learning curiosity algorithms
[ ] Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML
[ ] Novelty search (Lehman & Stanley, 2008)
[ ] Buffers and Nearest Neighbors (Fu et al., 2017)
[ ] Generating goals (Srivastava et al., 2013; Kulkarni et al., 2016)
[ ] Learning progress (Oudeyer et al., 2007; Schmidhuber, 2008)
[ ] Generating diverse skills (Eysenbach et al., 2018)
[ ] Stochastic neural networks (Florensa et al., 2017; Fortunato et al., 2017)
[ ] Count-based exploration (Tang et al., 2017)
[ ] Object-based curiosity measures (Forestier & Oudeyer, 2016)
[ ] Bonus-based (Taiga et al., 2019)

Road to General Intelligence

AutoML Style Approach
- Neural Architecture Search (NAS)
- Hyperparameter optimization for deep networks
- Auto-sklearn, Learning loss functions to replace cross-entropy for training a fixed architecture on MNIST and CIFAR
Meta-learning with genetic programming, evolutionary computing
Programming Automation
- Searching over mathematical operations within neural networks
- Neural networks that learn programs
Modular Meta-Learning / Hierarchical Meta-Learning, Reinforcement Learning
Inspired from Cognitive/Brain Science (Attention, Curiosity, Common Sense, etc)
Agent57 (DeepMind)

:brain: Reinforcement Learning

[x] Policy Gradient Theorem; Policy Gradient Methods for Reinforcement Learning with Function Approximation (NIPS 2000) paper
[ ] Deterministic Policy Gradient Algorithms
[ ] Continuous Control with Deep Reinforcement Learning
[ ] Approximately Optimal Approximate Reinforcement Learning
[ ] Trust Region Policy Optimization
[ ] Proximal Policy Optimization Algorithms

RL.start() Paper of the Day series

[ ] ACCELERATED METHODS FOR DEEP REINFORCEMENT LEARNING () paper
[ ] Implementation Matters In Deep RL () paper
[ ] CURL: Contrastive Unsupervised Representations for Reinforcement Learning () paper
[ ] Dream to Control: Learning Behaviors by Latent Imagination () paper

:chart_with_upwards_trend: Financial Mathematics & Engineering

:art: Neuromorphic

:cat2: Theoretical Deep Learning

[x] Neural Network Ensembles, Cross Validation, and Active Learning (NIPS 1995) paper

Batch Normalization

Lipschitz gradient

Global Batch Normalization

Input Covariate Shift

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

How Does Batch Normalization Help Optimization?

Layer Normalization https://arxiv.org/abs/1607.06450

LeCun Initialization Efficient BackProp

Xavier initialization Understanding the difficulty of training deep feedforward neural networks

He Initialization Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification

Nesterov Optimizer (optimization-related papers)

weight_standardization

:heart_eyes: Schmidhuber

Juergen Schmidhuber's Google Scholar

[x] Long short-term memory (Neural Computation 1997) paper

[ ] LSTM: A Search Space Odyssey (IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017) paper

[x] Highway Networks (15.05.03, arxiv) paper

Full Paper: Training Very Deep Networks link

[x] Recurrent Highway Networks (ICML 2017) paper

[ ] Gradient flow in recurrent nets: the difficulty of learning long-term dependencies (IEEE 2001) paper

[ ] Bidirectional LSTM networks for improved phoneme classification and recognition (International Conference on Artificial Neural Networks 05.09.11)

[ ] Sequential neural text compression (IEEE 1996) paper

[ ] Neural Expectation Maximization (NIPS 2017) paper

[ ] Accelerated Neural Evolution through Cooperatively Coevolved Synapses (JMLR 2008) paper

[ ] World Models (18.05.09, arxiv) paper

ETC

LSTM-SAE Unsupervised Pre-training of a Deep LSTM-based Stacked Autoencoder for Multivariate Time Series Forecasting Problems

C3D Learning Spatiotemporal Features with 3D Convolutional Networks

n-gram related papers

Estimation of Probabilities from Sparse Data for the Language Model Component of a Speech Recognizer
Interpolated estimation of Markov source parameters from sparse data

Pointing the Unknown Words (University of Montreal)

Seq2Seq Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Real-World Anomaly Detection in Surveillance Videos

self-attention on classification - A Structured Self-Attentive Sentence Embedding
