n-waves / multifit

Code to reproduce the results from the paper "MultiFiT: Efficient Multi-lingual Language Model Fine-tuning" https://arxiv.org/abs/1909.04761
MIT License

Kernel restarted #69

Open cahya-wirawan opened 4 years ago

cahya-wirawan commented 4 years ago

I have another problem running the notebook CLS-DE.ipynb. If I use conda and install the default pytorch (1.3.1), then after the command

exp.finetune_lm.train_(cls_dataset, num_epochs=20)

I get the following error message:

ImportError: /tmp/torch_extensions/forget_mult_cuda/forget_mult_cuda.so: undefined symbol: _ZN3c106Symbol14fromQualStringERKSs
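
A note on reading this symbol, offered as a sketch rather than a confirmed diagnosis: in the Itanium C++ name mangling used by GCC, the trailing `Ss` is the abbreviation for `std::string` in its pre-C++11 layout, while the newer layout carries `__cxx11` in the mangled name. The missing symbol therefore looks like `c10::Symbol::fromQualString(std::string const&)` with the old string ABI, which may mean the compiled extension and the installed libtorch were built with different ABI settings. The helper below is hypothetical (`string_abi_of` is not part of PyTorch) and is only a crude substring heuristic:

```python
def string_abi_of(symbol: str) -> str:
    """Guess which std::string ABI a GCC-mangled symbol refers to.

    Crude heuristic (an assumption, not a real demangler):
    - "__cxx11" appears in names that use the newer std::string layout
    - "Ss" is the Itanium abbreviation for the older std::string layout
    """
    if "__cxx11" in symbol:
        return "cxx11"
    if "Ss" in symbol:
        return "pre-cxx11"
    return "unknown"


# The symbol from the ImportError above.
missing = "_ZN3c106Symbol14fromQualStringERKSs"
print(missing, "->", string_abi_of(missing))
```

If the guess is "pre-cxx11" while the installed PyTorch build uses the newer ABI (or the other way round), a mismatch between the two builds is one plausible cause of the undefined symbol.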

Then I installed PyTorch from the pytorch channel as follows:

conda install pytorch=1.3.1 torchvision cudatoolkit=10.0 -c pytorch

The "undefined symbol" error is gone, but the kernel restarted during the first epoch of exp.finetune_lm.train_(cls_dataset, num_epochs=20).

Is this a known problem? These are probably the relevant Python packages:

$ conda list| egrep 'torch|^fastai|cuda|nvid'
_pytorch_select           0.2                       gpu_0  
cudatoolkit               10.0.130                      0  
cudnn                     7.6.5                cuda10.0_0  
fastai                    1.0.61                        1    fastai
nvidia-ml-py3             7.352.0                    py_0    fastai
pytorch                   1.3.1           cuda100py37h53c1284_0  
torchvision               0.4.2           cuda100py37hecfc37a_0  
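
For anyone comparing their own environment against this listing, here is a minimal sketch that parses `conda list` output and checks whether the CUDA tag in the pytorch build string agrees with the installed cudatoolkit version. The helper names (`parse_conda_list`, `cuda_tag`, `toolkit_tag`) are made up for illustration, and the build-string pattern is an assumption based on the rows above:

```python
import re

# Rows copied from the listing above.
CONDA_LIST = """\
_pytorch_select           0.2                       gpu_0
cudatoolkit               10.0.130                      0
cudnn                     7.6.5                cuda10.0_0
fastai                    1.0.61                        1    fastai
nvidia-ml-py3             7.352.0                    py_0    fastai
pytorch                   1.3.1           cuda100py37h53c1284_0
torchvision               0.4.2           cuda100py37hecfc37a_0
"""


def parse_conda_list(text):
    """Map package name -> version, build string and channel (may be empty)."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 or line.startswith("#"):
            continue  # skip blank lines and conda's comment header
        name, version, build = parts[:3]
        channel = parts[3] if len(parts) > 3 else ""
        packages[name] = {"version": version, "build": build, "channel": channel}
    return packages


def cuda_tag(build):
    """Extract e.g. '100' from 'cuda100py37...' or 'cuda10.0_0'; None if absent."""
    match = re.search(r"cuda(\d+)(?:\.(\d+))?", build)
    if match is None:
        return None
    return "".join(group for group in match.groups() if group)


def toolkit_tag(version):
    """Turn a cudatoolkit version such as '10.0.130' into '100'."""
    major, minor = version.split(".")[:2]
    return major + minor


pkgs = parse_conda_list(CONDA_LIST)
built_for = cuda_tag(pkgs["pytorch"]["build"])
installed = toolkit_tag(pkgs["cudatoolkit"]["version"])
print("pytorch built for CUDA tag:", built_for)
print("cudatoolkit tag:", installed)
print("match:", built_for == installed)
```

A matching tag only shows that the packages are consistent with each other. It does not rule out problems with the GPU driver or with extensions compiled against a different build.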

Thanks.

cahya-wirawan commented 4 years ago

I fixed the kernel restarts by switching from CUDA 10.0 to CUDA 9.2. It seems the model does not work with the newer CUDA version in my setup. Now the notebook runs properly to the end.