2023/11/27 ~ 2023/12/07

danbi5228 commented 9 months ago

Thursday 12/7, 9:30 pm. 1: 16.1.4, 16.1.5 2: 16.1.6 3: 16.1.7

danbi5228 commented 9 months ago

assign roles -s 1127 -c 1 2 3

njs03332 commented 9 months ago

	0	1	2
member	한단비	김유리	주선미
chapter	1	2	3

njs03332 commented 9 months ago

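16.1.6 Generating Fake Shakespearean Text

Feeding in seed text and having the Char-RNN model always pick the most likely next character -> often repeats the same words over and over
Using tf.random.categorical() to sample the next character at random according to the model's estimated probabilities -> produces more varied text
- categorical() samples random class indices given the class log probabilities (logits)
- Set a number called the temperature -> dividing the logits by it controls the diversity of the generated text
- The higher the temperature, the closer all characters get to equal probability; a temperature near 0 favors the high-probability characters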
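
To make the temperature effect concrete, here is a minimal NumPy sketch (not the TensorFlow code from the book). The probabilities are made up, and `softmax` and `rescale` are hypothetical helper names introduced only for this illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def rescale(probas, temperature):
    # Same recipe as in the notes: log-probabilities divided by the temperature,
    # then renormalized so the result is a probability distribution again.
    logits = np.log(probas) / temperature
    return softmax(logits)

probas = np.array([0.7, 0.2, 0.1])  # made-up model output for 3 characters

cold = rescale(probas, 0.2)  # low temperature: sharper, favors the top character
same = rescale(probas, 1.0)  # temperature 1: distribution unchanged
hot = rescale(probas, 5.0)   # high temperature: flatter, closer to uniform
```

Sampling from `cold` almost always returns the top character, while sampling from `hot` picks among all characters far more evenly.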

# Pick the next character to append to the input text
def next_char(text, temperature=1):
  X_new = preprocess([text])
  y_proba = model(X_new)[0, -1:, :]
  rescaled_logits = tf.math.log(y_proba) / temperature
  char_id = tf.random.categorical(rescaled_logits, num_samples=1) + 1
  return tokenizer.sequences_to_texts(char_id.numpy())[0]

# Call next_char() repeatedly, appending each new character to the text
def complete_text(text, n_chars=50, temperature=1):
  for _ in range(n_chars):
    text += next_char(text, temperature)
  return text

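Testing with different temperatures, the model works best at a temperature close to 1
Generating better text
- Add more GRU layers and more neurons per layer, and train for longer
- Add regularization
The model currently cannot learn patterns longer than n_steps, which is 100 characters
- To handle very long sequences, use a stateful RNN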

danbi5228 commented 9 months ago

16.1.4 Building and Training the Char-RNN Model

model = keras.models.Sequential([
    keras.layers.GRU(128, return_sequences=True, input_shape=[None, max_id],
                                  dropout=0.2, recurrent_dropout=0.2),
    keras.layers.GRU(128, return_sequences=True,
                                  dropout=0.2, recurrent_dropout=0.2),
    keras.layers.TimeDistributed(keras.layers.Dense(max_id, activation="softmax"))
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
history = model.fit(dataset, epochs=20)

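To predict the next character from the previous 100 characters, use 2 GRU layers with 128 units each, with 20% dropout on both the inputs (dropout) and the hidden states (recurrent_dropout)
- Tune the hyperparameters as needed
The text has 39 distinct characters (max_id), so the output Dense layer also has 39 units
The output probabilities must sum to 1 at each time step, so softmax is applied to the Dense layer's output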

16.1.5 Using the Char-RNN Model


# Preprocess text before feeding it to the model
def preprocess(texts):
    X = np.array(tokenizer.texts_to_sequences(texts)) - 1
    return tf.one_hot(X, max_id)

# Predict the next character of a text with the model
X_new = preprocess(["How are yo"])
Y_pred = np.argmax(model(X_new), axis=-1)
tokenizer.sequences_to_texts(Y_pred+1)[0][-1] # IDs start at 1, so add 1
### output: 'u'  ==>  last character of the first sentence
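
The -1 / +1 shifts above can be sketched without Keras. This is a minimal illustration under assumptions: `vocab`, `char_to_id`, `encode`, and `decode` are hypothetical stand-ins for the real tokenizer, and the tiny vocabulary is made up. Text is lowercased here for simplicity.

```python
import numpy as np

# Hypothetical character vocabulary; the real one comes from the Keras tokenizer.
vocab = [" ", "a", "e", "h", "o", "r", "u", "w", "y"]
char_to_id = {ch: i + 1 for i, ch in enumerate(vocab)}  # IDs start at 1
id_to_char = {i: ch for ch, i in char_to_id.items()}
max_id = len(vocab)

def encode(text):
    # Shift the 1-based IDs to 0-based class indices, then one-hot encode.
    ids = np.array([char_to_id[ch] for ch in text.lower()]) - 1
    return np.eye(max_id)[ids]  # shape: (len(text), max_id)

def decode(class_indices):
    # Shift back to 1-based IDs before looking the characters up.
    return "".join(id_to_char[int(i) + 1] for i in class_indices)

X = encode("How are yo")
roundtrip = decode(np.argmax(X, axis=-1))
```

Forgetting either shift would map every character to its neighbor in the vocabulary, which is why the notes add 1 again after argmax.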

givitallugot commented 9 months ago

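16.1.7 Stateful RNN

So far only stateless RNNs have been used (the model's hidden state is reset to 0 at every training iteration)
If the RNN uses its final state after one training batch as the initial state for the next training batch, the model can learn long-term patterns => stateful RNN
A stateful RNN requires each input sequence in a batch to start exactly where the corresponding sequence in the previous batch left off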
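
The idea of carrying the state across batches can be illustrated with a toy recurrent cell in NumPy. This is only a sketch under assumptions: a plain tanh cell with random weights rather than a GRU, and `run_rnn` is a hypothetical helper, not a Keras API.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_units = 3, 4
Wx = rng.normal(size=(n_inputs, n_units))  # input-to-hidden weights
Wh = rng.normal(size=(n_units, n_units))   # hidden-to-hidden weights

def run_rnn(inputs, state):
    # Apply a simple tanh recurrent cell step by step and return the final state.
    for x in inputs:
        state = np.tanh(x @ Wx + state @ Wh)
    return state

sequence = rng.normal(size=(10, n_inputs))
zero_state = np.zeros(n_units)

# Processing the whole sequence in one go.
full = run_rnn(sequence, zero_state)

# Stateful: the second chunk starts from the state left by the first chunk.
carried = run_rnn(sequence[5:], run_rnn(sequence[:5], zero_state))

# Stateless: the state is reset to zero before the second chunk.
reset = run_rnn(sequence[5:], zero_state)
```

`carried` matches `full`, so splitting a long sequence across batches loses nothing as long as the state is passed along, whereas `reset` has forgotten the first half.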

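(Data) To build a stateful RNN,

it is important to create sequential, non-overlapping input sequences
To do this, use shift=n_steps (instead of shift=1) in the window() method when creating the Dataset, and do not call shuffle()

With batch(32), the first batch would contain windows 1-32 and the second batch windows 33-64, so the first window of each batch (windows 1 and 33) would not be consecutive; the simplest fix is to use batches containing a single window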
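
The windowing scheme can be checked on toy data without tf.data. A minimal sketch, assuming integer-encoded text; `make_windows` is a hypothetical helper that mimics window(window_length, shift=n_steps, drop_remainder=True) followed by the input/target split.

```python
import numpy as np

def make_windows(encoded, n_steps):
    # Windows of length n_steps + 1, shifted by n_steps, dropping any
    # incomplete window at the end.
    window_length = n_steps + 1
    starts = range(0, len(encoded) - window_length + 1, n_steps)
    windows = np.array([encoded[s:s + window_length] for s in starts])
    # Inputs are all but the last item; targets are shifted one step ahead.
    return windows[:, :-1], windows[:, 1:]

encoded = np.arange(13)  # stand-in for the encoded text
X, Y = make_windows(encoded, n_steps=4)
```

Each input row starts right after the previous input row ends, which is the property the stateful RNN relies on when every batch holds a single window.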

dataset = tf.data.Dataset.from_tensor_slices(encoded[:train_size])
dataset = dataset.window(window_length, shift=n_steps, drop_remainder=True)
dataset = dataset.flat_map(lambda window: window.batch(window_length))
dataset = dataset.batch(1)
dataset = dataset.map(lambda windows: (windows[:, :-1], windows[:, 1:]))
dataset = dataset.map(lambda X_batch, Y_batch: (tf.one_hot(X_batch, depth=max_id), Y_batch))
dataset = dataset.prefetch(1)

(Model) And when creating each recurrent layer,

set stateful=True

The batch size must be known, and it must be specified through the batch_input_shape argument of the first layer

model = keras.models.Sequential([
    keras.layers.GRU(128, return_sequences=True, stateful=True,
                     #dropout=0.2, recurrent_dropout=0.2,
                     dropout=0.2,
                     batch_input_shape=[batch_size, None, max_id]),
    keras.layers.GRU(128, return_sequences=True, stateful=True,
                     #dropout=0.2, recurrent_dropout=0.2),
                     dropout=0.2),
    keras.layers.TimeDistributed(keras.layers.Dense(max_id,
                                                    activation="softmax"))
])

njs03332 / ml_study

2023/11/27 ~ 2023/12/07 #77
