Recurrent Neural Network (RNN)
Long Short-Term Memory (LSTM)
import torch
import numpy as np

# sequential example
# shape : (3, 5, 4)
h = [1, 0, 0, 0]
e = [0, 1, 0, 0]
l = [0, 0, 1, 0]
o = [0, 0, 0, 1]
input_data_np = np.array([[h, e, l, l, o], [e, o, l, l, l], [l, l, e, e, l]], dtype=np.float32)
# transform as torch tensor
input_data = torch.Tensor(input_data_np)
# declare dimension
input_size = 4
hidden_size = 2
# declare RNN (batch_first=True so the input is read as (batch, seq, feature), matching the (3, 5, 4) data above)
rnn = torch.nn.RNN(input_size, hidden_size, batch_first=True)
# check output
outputs, _status = rnn(input_data)
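A quick sanity check on the shapes (a minimal sketch; outputs holds the hidden state at every time step, _status the final hidden state):

print(outputs.shape)   # torch.Size([3, 5, 2]) = (batch, seq_len, hidden_size)
print(_status.shape)   # torch.Size([1, 3, 2]) = (num_layers, batch, hidden_size)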
Hidden State
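Internally, nn.RNN updates the hidden state at each time step as h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh). A minimal sketch of one step, reusing the rnn module above (weight_ih_l0 etc. are PyTorch's parameter names for layer 0):

h_prev = torch.zeros(1, hidden_size)   # initial hidden state h_0
x_t = input_data[0, 0].unsqueeze(0)    # first time step of the first sequence
h_t = torch.tanh(x_t @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                 + h_prev @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)
print(h_t)  # should match outputs[0, 0]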
Constructing the input and output data
import torch
import torch.optim as optim
import numpy as np
torch.manual_seed(0)
char_set = ['h', 'i', 'e', 'l', 'o']
input_size = len(char_set)
hidden_size = len(char_set)
learning_rate = 0.01

# "hihell" as indices into char_set
x_data = [[0, 1, 0, 2, 3, 3]]
x_one_hot = [[[1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 1, 0],
              [0, 0, 0, 1, 0]]]
# "ihello" as indices into char_set
y_data = [[1, 0, 2, 3, 3, 4]]

X = torch.FloatTensor(x_one_hot)
Y = torch.LongTensor(y_data)

rnn = torch.nn.RNN(input_size, hidden_size, batch_first=True)  # batch_first guarantees the order of output = (B, S, F)

criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.Adam(rnn.parameters(), learning_rate)

for i in range(1000):
    optimizer.zero_grad()
    outputs, _status = rnn(X)
    loss = criterion(outputs.view(-1, input_size), Y.view(-1))
    loss.backward()
    optimizer.step()

    # decode the prediction at every step to watch training progress
    result = outputs.data.numpy().argmax(axis=2)
    result_str = ''.join([char_set[c] for c in np.squeeze(result)])
    print(i, "loss: ", loss.item(), "prediction: ", result, "true Y: ", y_data, "prediction str: ", result_str)
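For reference, decoding x_data and y_data through char_set shows the task the model is learning (a sanity check, not part of the lecture code):

print(''.join(char_set[c] for c in x_data[0]))  # hihell
print(''.join(char_set[c] for c in y_data[0]))  # ihello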
Vanilla RNN
Long Short-Term Memory (LSTM)
Usage:
nn.LSTM(input_dim, hidden_size, batch_first=True)
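Unlike nn.RNN, nn.LSTM also maintains a cell state, so its second return value is a (hidden state, cell state) tuple. A minimal sketch reusing input_data from the "hello" example above:

lstm = torch.nn.LSTM(4, 2, batch_first=True)  # input_size=4, hidden_size=2 as in the first example
outputs, (h_n, c_n) = lstm(input_data)
print(outputs.shape)         # (batch, seq_len, hidden_size) = torch.Size([3, 5, 2])
print(h_n.shape, c_n.shape)  # each: (num_layers, batch, hidden_size) = torch.Size([1, 3, 2])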