naver-airush / NAVER-AI-RUSH


Loading a previously trained model with nsml.load() raises an error #35

Closed IllgamhoDuck closed 4 years ago

IllgamhoDuck commented 4 years ago

Information

CLI

WEB

What is your login ID? IllgamhoDuck

What is the name of the session with the problem? (bug message or screenshot) IllgamhoDuck/spam-1/4

Steps to reproduce the problem: Following the procedure described at https://n-clair.github.io/ai-docs/_build/html/ko_KR/contents/load_model.html, I added three lines to fit() in the BasicModel below so that it loads the model from an existing session.

    def fit(self, epochs_finetune, epochs_full, batch_size, debug=False):
        self.debug = debug
        self.data.prepare()
        self.network.compile(
            loss=self.loss(),
            optimizer=self.optimizer('finetune'),
            metrics=self.fit_metrics()
        )
        nsml.load(checkpoint='best', session='IllgamhoDuck/spam-1/4')
        nsml.save('saved')
        exit()

When I do this, it reports that the file cannot be found, as shown below. What should I do?


 nsml.load(checkpoint='best', session='IllgamhoDuck/spam-1/4')
  File "/app/nsml/client.py", line 478, in load
    load_fn(temp_dir)
  File "/app/spam/spam_classifier/models/BasicModel.py", line 161, in load
    model.network.load_weights(f'{dirname}/model')
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py", line 492, in load_wrapper
    return load_function(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/network.py", line 1221, in load_weights
    with h5py.File(filepath, mode='r') as f:
  File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 269, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/usr/local/lib/python3.6/dist-packages/h5py/_hl/files.py", line 99, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name = '/tmp/tmpxxubfrl5/model', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)
User session exited

nsml-admin commented 4 years ago

The previously trained model was trained with tf2.0, while the new session is trying to load it with Keras, which appears to be what caused the problem.
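
One way to check for this kind of mismatch is to look at what the checkpoint directory actually contains before calling load_weights. The sketch below is only an illustration and is not part of nsml or Keras: guess_weights_format is a hypothetical helper name. It assumes the earlier session saved weights through tf.keras to a path without an .h5 suffix, in which case TensorFlow 2 writes checkpoint files such as model.index and model.data-00000-of-00001 rather than a single HDF5 file named model, which is what standalone Keras tries to open with h5py.

```python
import os
import tempfile

# Every HDF5 file starts with this 8-byte signature.
HDF5_MAGIC = b"\x89HDF\r\n\x1a\n"


def guess_weights_format(prefix):
    """Guess how the weights at `prefix` were written (hypothetical helper).

    Returns one of: "hdf5", "tf-checkpoint", "unknown", "missing".
    """
    if os.path.isfile(prefix):
        # A single file at the exact path: check for the HDF5 signature.
        with open(prefix, "rb") as f:
            if f.read(len(HDF5_MAGIC)) == HDF5_MAGIC:
                return "hdf5"
        return "unknown"
    if os.path.isfile(prefix + ".index"):
        # TensorFlow checkpoint format stores <prefix>.index plus data shards.
        return "tf-checkpoint"
    return "missing"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as dirname:
        prefix = os.path.join(dirname, "model")
        # Simulate the files a TensorFlow-format save leaves behind (assumption).
        for suffix in (".index", ".data-00000-of-00001"):
            open(prefix + suffix, "wb").close()
        print(guess_weights_format(prefix))
```

If a check like this reported "tf-checkpoint" for f'{dirname}/model' inside the load function, that would be consistent with the explanation above: the weights exist, but not as the HDF5 file that the Keras loader expects. Saving and loading with the same framework should avoid the mismatch.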

IllgamhoDuck commented 4 years ago

@nsml-admin Thank you! I had overlooked that part!