Open mocarsha opened 2 years ago
Can you please provide the full error?
I am having the same issue while running train.py. Here's the full detailed error:
Load data...[DONE] 2.39ms
Tokenize...[DONE] 29.36ms
Build vocabulary...[DONE] 0.62ms
Load BERT tokenizer...[DONE] 340.26ms
Load BERT model...[DONE] 882.21ms
Load index...[DONE] 69.50ms
2024-06-25 11:59:59.603955: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-06-25 11:59:59.604002: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-06-25 11:59:59.605299: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-06-25 11:59:59.611551: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-06-25 12:00:00.750669: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
No labml server url specified. Please start a labml server and specify the URL. Docs: https://github.com/labmlai/labml/tree/master/app
retro_small: 706e157632ea11ef989a0242ac1c000c
[clean]: "cleanup notebooks"
116: Train: 5% 88,760ms loss.train: 3.71168 88,760ms 0:00m/ 0:47m
Traceback (most recent call last):
File "/content/annotated_deep_learning_paper_implementations/labml_nn/transformers/retro/train.py", line 225, in
Also, it says "No labml server url specified. Please start a labml server and specify the URL." Do I need to create the server? Is it required? Could you explain, please?
Hi,
Running the exact code from GitHub for DeepMind's retrieval transformer (RETRO), I am getting the following error:
RuntimeError: stack expects each tensor to be equal size, but got [2, 32] at entry 0 and [1, 32] at entry 29
Could you please help me with this? I used the same dataset as in the code.
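For context, this error comes from stacking a batch whose entries don't all have the same shape: entry 29 has shape [1, 32] while the others have [2, 32], which typically happens when the last sample ends up with fewer chunk/neighbor rows than the rest. Below is a minimal stdlib-only sketch of the failure mode and a padding workaround; the `stack` and `pad_to` helpers are hypothetical stand-ins for illustration, not functions from the RETRO codebase.

```python
# Sketch of "stack expects each tensor to be equal size":
# stacking requires every element to have the same leading dimension.
# Helper names here are hypothetical, not from the RETRO code.

def stack(rows):
    """Mimic torch.stack's shape check on nested lists of shape [k, 32]."""
    first = len(rows[0])
    for i, r in enumerate(rows):
        if len(r) != first:
            raise RuntimeError(
                f"stack expects each tensor to be equal size, "
                f"but got [{first}, 32] at entry 0 and [{len(r)}, 32] at entry {i}"
            )
    return rows  # in torch this would return a [batch, k, 32] tensor

def pad_to(row, k, width=32, pad_value=0):
    """Pad an [n, width] row list with zero rows up to [k, width]."""
    return row + [[pad_value] * width for _ in range(k - len(row))]

# Entries 0..28 have shape [2, 32]; entry 29 has shape [1, 32],
# reproducing the reported mismatch.
batch = [[[0] * 32, [0] * 32] for _ in range(29)]
batch.append([[0] * 32])

try:
    stack(batch)
except RuntimeError as e:
    print(e)

# Workaround: pad every entry to the maximum leading dimension first.
k = max(len(r) for r in batch)
padded = stack([pad_to(r, k) for r in batch])
print(len(padded), len(padded[-1]))  # 30 2
```

In the actual training code the equivalent fix would be to pad the short tensor (or drop the incomplete final sample) before the batch is stacked, e.g. in the dataset or collate step; exactly where depends on how the repository builds its batches.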