Open · 0x7o opened this issue 2 years ago
For anyone else running into this (as I have), there's a (fairly obvious) workaround: hardcode use_gpu to False in index_embeddings within retrieval.py. I'll update if and when I come up with a proper fix, but this at least allowed me to progress (after burning a lot of CPU cycles).
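For reference, the flag in question is the use_gpu argument that index_embeddings in retro_pytorch/retrieval.py passes to autofaiss's build_index. Below is a minimal standalone sketch of what the CPU-only index build amounts to; the embeddings/index paths are assumptions based on this repo's defaults, so adjust them to wherever your embeddings were actually written:

```python
# Standalone sketch: build the knn index on CPU with autofaiss, which is roughly
# what hardcoding use_gpu = False inside retrieval.py's index_embeddings does.
from autofaiss import build_index

build_index(
    embeddings = './.tmp/embeddings',                     # folder of .npy embedding shards (assumed default location)
    index_path = './.tmp/.index/knn.index',               # where the faiss index is written (assumed default)
    index_infos_path = './.tmp/.index/index_infos.json',  # index metadata (assumed default)
    metric_type = 'l2',
    max_index_memory_usage = '100m',
    current_memory_available = '10G',
    use_gpu = False                                       # the workaround: never convert the index to GPU
)
```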
How did you solve this problem? Thanks, @0x7o
This error occurs when using TrainingWrapper. With about 1 MB of training data in total there is no error; on larger data the error appears.

Apparently the script tries to process all of the data at once rather than in batches, which exhausts system resources.
RAM: 12 GB, VRAM: 12 GB
```python
import torch
from retro_pytorch import RETRO, TrainingWrapper

retro = RETRO(
    chunk_size = 64,                        # the chunk size that is indexed and retrieved (needed for proper relative positions as well as causal chunked cross attention)
    max_seq_len = 2048,                     # max sequence length
    enc_dim = 896,                          # encoder model dim
    enc_depth = 2,                          # encoder depth
    dec_dim = 796,                          # decoder model dim
    dec_depth = 12,                         # decoder depth
    dec_cross_attn_layers = (3, 6, 9, 12),  # decoder cross attention layers (with causal chunk cross attention)
    heads = 8,                              # attention heads
    dim_head = 64,                          # dimension per head
    dec_attn_dropout = 0.25,                # decoder attention dropout
    dec_ff_dropout = 0.25,                  # decoder feedforward dropout
    use_deepnet = True                      # turn on post-normalization with DeepNet residual scaling and initialization, for scaling to 1000 layers
).cuda()

wrapper = TrainingWrapper(
    retro = retro,                                  # path to retro instance
    knn = 2,                                        # knn (2 in paper was sufficient)
    chunk_size = 64,                                # chunk size (64 in paper)
    documents_path = '/content/text/',              # path to folder of text
    glob = '**/*.txt',                              # text glob
    chunks_memmap_path = './train.chunks.dat',      # path to chunks
    seqs_memmap_path = './train.seq.dat',           # path to sequence data
    doc_ids_memmap_path = './train.doc_ids.dat',    # path to document ids per chunk (used for filtering neighbors belonging to same document)
    max_chunks = 500_000,                           # maximum cap to chunks
    max_seqs = 100_000,                             # maximum seqs
    knn_extra_neighbors = 100,                      # num extra neighbors to fetch
    max_index_memory_usage = '100m',
    current_memory_available = '10G'
)
```
Out:
```
processing /content/text/kxaa.txt
Downloading: "https://github.com/huggingface/pytorch-transformers/archive/main.zip" to /root/.cache/torch/hub/main.zip
Downloading: 100% 29.0/29.0 [00:00<00:00, 662B/s]
Downloading: 100% 570/570 [00:00<00:00, 14.6kB/s]
Downloading: 100% 208k/208k [00:00<00:00, 2.26MB/s]
Downloading: 100% 426k/426k [00:00<00:00, 4.60MB/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (3449121 > 512). Running this sequence through the model will result in indexing errors
Using cache found in /root/.cache/torch/hub/huggingface_pytorch-transformers_main
Downloading: 100% 416M/416M [00:09<00:00, 50.3MB/s]
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
embedded XXXXX / 53893
saved .tmp/embeddings/XXXXX.npy
2022-05-17 02:34:09,316 [INFO]: Using 2 omp threads (processes), consider increasing --nb_cores if you have more
2022-05-17 02:34:09,317 [INFO]: Launching the whole pipeline 05/17/2022, 02:34:09
2022-05-17 02:34:09,321 [INFO]: Reading total number of vectors and dimension 05/17/2022, 02:34:09
100%|██████████| 108/108 [00:00<00:00, 5336.89it/s]
2022-05-17 02:34:09,465 [INFO]: There are 53893 embeddings of dim 768
2022-05-17 02:34:09,466 [INFO]: >>> Finished "Reading total number of vectors and dimension" in 0.1405 secs
2022-05-17 02:34:09,471 [INFO]: Compute estimated construction time of the index 05/17/2022, 02:34:09
2022-05-17 02:34:09,474 [INFO]: -> Train: 16.7 minutes
2022-05-17 02:34:09,478 [INFO]: -> Add: 0.5 seconds
2022-05-17 02:34:09,480 [INFO]: Total: 16.7 minutes
2022-05-17 02:34:09,481 [INFO]: >>> Finished "Compute estimated construction time of the index" in 0.0070 secs
2022-05-17 02:34:09,484 [INFO]: Checking that your have enough memory available to create the index 05/17/2022, 02:34:09
2022-05-17 02:34:09,487 [INFO]: 541.5MB of memory will be needed to build the index (more might be used if you have more)
2022-05-17 02:34:09,488 [INFO]: >>> Finished "Checking that your have enough memory available to create the index" in 0.0025 secs
2022-05-17 02:34:09,489 [INFO]: Selecting most promising index types given data characteristics 05/17/2022, 02:34:09
2022-05-17 02:34:09,490 [INFO]: >>> Finished "Selecting most promising index types given data characteristics" in 0.0002 secs
2022-05-17 02:34:09,499 [INFO]: Creating the index 05/17/2022, 02:34:09
2022-05-17 02:34:09,500 [INFO]: -> Instanciate the index OPQ256_1024,IVF1024_HNSW32,PQ256x8 05/17/2022, 02:34:09
2022-05-17 02:34:09,509 [INFO]: >>> Finished "-> Instanciate the index OPQ256_1024,IVF1024_HNSW32,PQ256x8" in 0.0089 secs
2022-05-17 02:34:09,510 [INFO]: The index size will be approximately 18.2MB
2022-05-17 02:34:09,512 [INFO]: -> Extract training vectors 05/17/2022, 02:34:09
2022-05-17 02:34:09,513 [INFO]: Will use 53893 vectors to train the index, that will use 903.8MB of memory
99%|█████████▉| 107/108 [00:00<00:00, 521.43it/s]
2022-05-17 02:34:09,732 [INFO]: >>> Finished "-> Extract training vectors" in 0.2194 secs
2022-05-17 02:34:10,226 [INFO]: >>> Finished "Creating the index" in 0.7267 secs
2022-05-17 02:34:10,228 [INFO]: >>> Finished "Launching the whole pipeline" in 0.9070 secs
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-6-d42557af9f46> in <module>()
     13     knn_extra_neighbors = 100,       # num extra neighbors to fetch
     14     max_index_memory_usage = '100m',
---> 15     current_memory_available = '10G'
     16 )

6 frames
/usr/local/lib/python3.7/dist-packages/faiss/swigfaiss.py in index_cpu_to_gpu(provider, device, index, options)
  10273 def index_cpu_to_gpu(provider, device, index, options=None):
  10274     r""" converts any CPU index that can be converted to GPU"""
> 10275     return _swigfaiss.index_cpu_to_gpu(provider, device, index, options)
  10276
  10277 def index_cpu_to_gpu_multiple(provider, devices, index, options=None):

RuntimeError: Error in void faiss::gpu::GpuIndexIVFPQ::verifySettings_() const at /project/faiss/faiss/gpu/GpuIndexIVFPQ.cu:428: Error: 'ivfpqConfig_.interleavedLayout || IVFPQ::isSupportedPQCodeLength(subQuantizers_)' failed: Number of bytes per encoded vector / sub-quantizers (256) is not supported
```
I don't know the logic behind the solution, but I am sharing what worked for me. I increased these two parameters:

```python
max_index_memory_usage = '100m',
current_memory_available = '10G'
```

to:

```python
max_index_memory_usage = '2G',
current_memory_available = '50G'
```
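In terms of the snippet from the original post, that corresponds to changing only the last two arguments of the TrainingWrapper call; the other arguments are left exactly as posted above, and the values are just what happened to work rather than anything tuned:

```python
wrapper = TrainingWrapper(
    retro = retro,                                  # the RETRO instance from the snippet above
    knn = 2,
    chunk_size = 64,
    documents_path = '/content/text/',
    glob = '**/*.txt',
    chunks_memmap_path = './train.chunks.dat',
    seqs_memmap_path = './train.seq.dat',
    doc_ids_memmap_path = './train.doc_ids.dat',
    max_chunks = 500_000,
    max_seqs = 100_000,
    knn_extra_neighbors = 100,
    max_index_memory_usage = '2G',                  # was '100m'
    current_memory_available = '50G'                # was '10G'
)
```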