jgmorenof closed this issue 4 years ago
Dear Prof. @jgmorenof, before evaluating XLNetNER, I am still running into the memory issue:
Traceback (most recent call last):
File "train.py", line 164, in <module>
train(model, train_iter, optimizer, criterion)
...
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 936, in dropout
else _VF.dropout(input, p, training))
RuntimeError: CUDA out of memory. Tried to allocate 68.00 MiB (GPU 0; 11.17 GiB total capacity; 10.01 GiB already allocated; 58.81 MiB free; 10.74 GiB reserved in total by PyTorch)
It seems that increasing the RAM on Colab does not help when I try to save the XLNet checkpoints (last time I focused on performance and results, so I have not saved them yet). My PC can only train and save models with a batch size of 8. May I ask whether I am eligible to access the laboratory server now? Thank you very much!
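The figures in the out-of-memory message above can be read off programmatically. The following is a minimal sketch, not part of train.py: the helper name `parse_oom` and the regular expressions are assumptions based only on the message format quoted in this thread. It shows that the request (68.00 MiB) exceeds the free memory (58.81 MiB), and how much memory PyTorch has reserved but not allocated.

```python
import re

# Multipliers to convert the units used in the message into MiB.
_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Extract memory figures (in MiB) from a CUDA OOM message.

    Hypothetical helper: the patterns follow the message quoted in this
    thread and may not match other PyTorch versions.
    """
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    figures = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match:
            figures[name] = float(match.group(1)) * _TO_MIB[match.group(2)]
    return figures


msg = (
    "RuntimeError: CUDA out of memory. Tried to allocate 68.00 MiB "
    "(GPU 0; 11.17 GiB total capacity; 10.01 GiB already allocated; "
    "58.81 MiB free; 10.74 GiB reserved in total by PyTorch)"
)
figures = parse_oom(msg)

# How far the request exceeds the memory that is still free.
shortfall_mib = figures["requested"] - figures["free"]
# Memory held by PyTorch's caching allocator but not used by tensors.
reserved_unallocated_mib = figures["reserved"] - figures["allocated"]
```

Here the GPU is essentially full, so a smaller batch size or a GPU with more memory is needed to get past this allocation.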
Access to servers is now granted.
@nsidere Dear Professors, I am using the newest version of PyTorch, and it seems that it does not support the NVIDIA driver version currently installed on the server:
The NVIDIA driver on your system is too old (found version 9010).
Please update your GPU driver by downloading and installing a new
version from the URL: http://www.nvidia.com/Download/index.aspx
Alternatively, go to: https://pytorch.org to install
a PyTorch version that has been compiled with your version
of the CUDA driver.
So I have thought of two possible solutions.
I wonder whether it is OK to upgrade CUDA, since that may affect other users as well. Thank you very much!
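The number in the error message above ("found version 9010") can be decoded into a CUDA version. This is a hedged sketch that assumes the number is the CUDA driver API version, which CUDA encodes as 1000 * major + 10 * minor. The function names are hypothetical, and the minimum of 9.2 is the figure cited later in this thread, not something the error message states.

```python
def decode_cuda_version(encoded):
    """Decode a CUDA version integer (1000 * major + 10 * minor)."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return major, minor


def meets_minimum(encoded, minimum):
    """Return True if the decoded version is at least `minimum`."""
    return decode_cuda_version(encoded) >= minimum


# Value reported in the error message quoted in this thread.
found = decode_cuda_version(9010)

# Assumed requirement: CUDA 9.2, as cited later in this thread.
driver_ok = meets_minimum(9010, (9, 2))
```

Under this assumption the server's driver supports CUDA up to 9.1, so the choice is between a newer driver and a PyTorch build compiled for an older CUDA version.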
Dear Hanh, unfortunately, I am afraid I cannot answer that. For technical issues, you should send an email to the mailing list (the address is in the email sent by Muzzamil, something like L3i-calcul@....). The superusers of the servers are on this list, and other users may also have a solution.
Thank you
@nsidere @jgmorenof It seems that updating the NVIDIA driver is not an option (I have asked Prof. Muzzamil). Would it be OK for me to revert to an older version of PyTorch, professors? I ask because you mentioned in a previous meeting that I should use the latest one.
Dear Hanh, do you know which version of CUDA you need? I understood that you have access to one server. Maybe the right version is installed on another server, and we could ask for access to it.
Dear Prof. @nsidere, currently I can only access the server with CUDA 9.0:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
However, according to the official PyTorch website, the newest version seems to require at least CUDA 9.2 (via this link).
I wonder whether it would be possible for me to access another server with a suitable CUDA version.
Thank you very much for your support.
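The comparison described above (toolkit release 9.0 against a required minimum of 9.2) can be scripted, which helps when checking several servers. This is a minimal sketch: the function names are hypothetical, the pattern assumes the `release X.Y` wording shown in the `nvcc` output above, and the 9.2 minimum is taken from this comment rather than verified against the PyTorch website.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Return (major, minor) from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def read_nvcc_output():
    """Run `nvcc --version` if nvcc is on PATH; return None otherwise."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], stdout=subprocess.PIPE, universal_newlines=True
    )
    return result.stdout


# Output pasted from the server in this thread.
sample = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176"""

release = parse_nvcc_release(sample)
# Assumed minimum, as stated in this comment.
toolkit_ok = release is not None and release >= (9, 2)
```

On another server, `parse_nvcc_release(read_nvcc_output() or "")` gives the same check against the live installation.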
Please evaluate the performance of XLNetNER using the corrected version of get_score.py.
There are some differences between the integrated evaluator in XLNetNER and get_score.py. The latter is preferred as it uses the official evaluator script.