lucidrains / RETRO-pytorch

Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch
Apache License 2.0

RuntimeError: Error in void faiss::gpu::GpuIndexIVFPQ::verifySettings_() #24

Open 0x7o opened 2 years ago

0x7o commented 2 years ago

This error occurs when trying to use TrainingWrapper. If the training data is about 1 MB in total, no error occurs; the error only appears with larger data.

Apparently the script tries to process all the data at once rather than in batches, and the system runs out of resources.

RAM: 12 GB, VRAM: 12 GB

import torch
from retro_pytorch import RETRO, TrainingWrapper

retro = RETRO(
    chunk_size = 64,                         # the chunk size that is indexed and retrieved (needed for proper relative positions as well as causal chunked cross attention)
    max_seq_len = 2048,                      # max sequence length
    enc_dim = 896,                           # encoder model dim
    enc_depth = 2,                           # encoder depth
    dec_dim = 796,                           # decoder model dim
    dec_depth = 12,                          # decoder depth
    dec_cross_attn_layers = (3, 6, 9, 12),   # decoder cross attention layers (with causal chunk cross attention)
    heads = 8,                               # attention heads
    dim_head = 64,                           # dimension per head
    dec_attn_dropout = 0.25,                 # decoder attention dropout
    dec_ff_dropout = 0.25,                   # decoder feedforward dropout
    use_deepnet = True                       # turn on post-normalization with DeepNet residual scaling and initialization, for scaling to 1000 layers
).cuda()

wrapper = TrainingWrapper(
    retro = retro,                                 # path to retro instance
    knn = 2,                                       # knn (2 in paper was sufficient)
    chunk_size = 64,                               # chunk size (64 in paper)
    documents_path = '/content/text/',              # path to folder of text
    glob = '**/*.txt',                             # text glob
    chunks_memmap_path = './train.chunks.dat',     # path to chunks
    seqs_memmap_path = './train.seq.dat',          # path to sequence data
    doc_ids_memmap_path = './train.doc_ids.dat',   # path to document ids per chunk (used for filtering neighbors belonging to same document)
    max_chunks = 500_000,                        # maximum cap to chunks
    max_seqs = 100_000,                            # maximum seqs
    knn_extra_neighbors = 100,                     # num extra neighbors to fetch
    max_index_memory_usage = '100m',
    current_memory_available = '10G'
)

Out:

processing /content/text/kxaa.txt
Downloading: "https://github.com/huggingface/pytorch-transformers/archive/main.zip" to /root/.cache/torch/hub/main.zip
Downloading: 100%
29.0/29.0 [00:00<00:00, 662B/s]
Downloading: 100%
570/570 [00:00<00:00, 14.6kB/s]
Downloading: 100%
208k/208k [00:00<00:00, 2.26MB/s]
Downloading: 100%
426k/426k [00:00<00:00, 4.60MB/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (3449121 > 512). Running this sequence through the model will result in indexing errors
Using cache found in /root/.cache/torch/hub/huggingface_pytorch-transformers_main
Downloading: 100%
416M/416M [00:09<00:00, 50.3MB/s]
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

embedded XXXXX / 53893
saved .tmp/embeddings/XXXXX.npy
2022-05-17 02:34:09,316 [INFO]: Using 2 omp threads (processes), consider increasing --nb_cores if you have more
2022-05-17 02:34:09,317 [INFO]: Launching the whole pipeline 05/17/2022, 02:34:09
2022-05-17 02:34:09,321 [INFO]: Reading total number of vectors and dimension 05/17/2022, 02:34:09
100%|██████████| 108/108 [00:00<00:00, 5336.89it/s]
2022-05-17 02:34:09,465 [INFO]: There are 53893 embeddings of dim 768
2022-05-17 02:34:09,466 [INFO]: >>> Finished "Reading total number of vectors and dimension" in 0.1405 secs
2022-05-17 02:34:09,471 [INFO]:     Compute estimated construction time of the index 05/17/2022, 02:34:09
2022-05-17 02:34:09,474 [INFO]:         -> Train: 16.7 minutes
2022-05-17 02:34:09,478 [INFO]:         -> Add: 0.5 seconds
2022-05-17 02:34:09,480 [INFO]:         Total: 16.7 minutes
2022-05-17 02:34:09,481 [INFO]:     >>> Finished "Compute estimated construction time of the index" in 0.0070 secs
2022-05-17 02:34:09,484 [INFO]:     Checking that your have enough memory available to create the index 05/17/2022, 02:34:09
2022-05-17 02:34:09,487 [INFO]: 541.5MB of memory will be needed to build the index (more might be used if you have more)
2022-05-17 02:34:09,488 [INFO]:     >>> Finished "Checking that your have enough memory available to create the index" in 0.0025 secs
2022-05-17 02:34:09,489 [INFO]:     Selecting most promising index types given data characteristics 05/17/2022, 02:34:09
2022-05-17 02:34:09,490 [INFO]:     >>> Finished "Selecting most promising index types given data characteristics" in 0.0002 secs
2022-05-17 02:34:09,499 [INFO]:     Creating the index 05/17/2022, 02:34:09
2022-05-17 02:34:09,500 [INFO]:         -> Instanciate the index OPQ256_1024,IVF1024_HNSW32,PQ256x8 05/17/2022, 02:34:09
2022-05-17 02:34:09,509 [INFO]:         >>> Finished "-> Instanciate the index OPQ256_1024,IVF1024_HNSW32,PQ256x8" in 0.0089 secs
2022-05-17 02:34:09,510 [INFO]: The index size will be approximately 18.2MB
2022-05-17 02:34:09,512 [INFO]:         -> Extract training vectors 05/17/2022, 02:34:09
2022-05-17 02:34:09,513 [INFO]: Will use 53893 vectors to train the index, that will use 903.8MB of memory
 99%|█████████▉| 107/108 [00:00<00:00, 521.43it/s]
2022-05-17 02:34:09,732 [INFO]:         >>> Finished "-> Extract training vectors" in 0.2194 secs
2022-05-17 02:34:10,226 [INFO]:     >>> Finished "Creating the index" in 0.7267 secs
2022-05-17 02:34:10,228 [INFO]: >>> Finished "Launching the whole pipeline" in 0.9070 secs
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
[<ipython-input-6-d42557af9f46>](https://localhost:8080/#) in <module>()
     13     knn_extra_neighbors = 100,                     # num extra neighbors to fetch
     14     max_index_memory_usage = '100m',
---> 15     current_memory_available = '10G'
     16 )

6 frames
/usr/local/lib/python3.7/dist-packages/faiss/swigfaiss.py in index_cpu_to_gpu(provider, device, index, options)
  10273 def index_cpu_to_gpu(provider, device, index, options=None):
  10274     r""" converts any CPU index that can be converted to GPU"""
> 10275     return _swigfaiss.index_cpu_to_gpu(provider, device, index, options)
  10276 
  10277 def index_cpu_to_gpu_multiple(provider, devices, index, options=None):

RuntimeError: Error in void faiss::gpu::GpuIndexIVFPQ::verifySettings_() const at /project/faiss/faiss/gpu/GpuIndexIVFPQ.cu:428: Error: 'ivfpqConfig_.interleavedLayout || IVFPQ::isSupportedPQCodeLength(subQuantizers_)' failed: Number of bytes per encoded vector / sub-quantizers (256) is not supported
richjames0 commented 2 years ago

For anyone else running into this (as I have), there's a (fairly obvious) workaround: hardcode use_gpu to False in index_embeddings within retrieval.py. I'll update if and when I come up with a proper fix, but this at least allowed me to progress (after burning a lot of CPU cycles).
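
For context: faiss's GPU IVFPQ path only supports a limited set of bytes-per-vector values, and here autofaiss selected an OPQ256_1024,IVF1024_HNSW32,PQ256x8 index (256 sub-quantizers), which index_cpu_to_gpu rejects with the verifySettings_ error above. A minimal sketch of the workaround, assuming index_embeddings in retrieval.py forwards a use_gpu flag to autofaiss's build_index (the signature below is paraphrased for illustration, not the repository's exact code):

from autofaiss import build_index

def index_embeddings(
    embeddings_folder,                               # folder of .npy embedding shards
    index_path = './.index/knn.index',
    index_infos_path = './.index/index_infos.json',
    max_index_memory_usage = '100m',
    current_memory_available = '1G'
):
    # Workaround: force the CPU index. faiss's GpuIndexIVFPQ rejects the
    # PQ256 layout autofaiss chose, so skip index_cpu_to_gpu entirely.
    build_index(
        embeddings = str(embeddings_folder),
        index_path = str(index_path),
        index_infos_path = str(index_infos_path),
        metric_type = 'l2',
        max_index_memory_usage = max_index_memory_usage,
        current_memory_available = current_memory_available,
        use_gpu = False    # was conditioned on torch.cuda.is_available()
    )

Search against the resulting index then runs on CPU, which is slower but sidesteps the unsupported-PQ check.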

szh-max commented 1 year ago

How did you solve this problem? Thanks @0x7o

debajyotimaz commented 9 months ago

(quotes the original report from @0x7o above in full)

I don't know the logic behind the solution, but I'm sharing what worked for me: I increased the two memory parameters from max_index_memory_usage = '100m', current_memory_available = '10G' to max_index_memory_usage = '2G', current_memory_available = '50G'.
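
For reference, a sketch of the adjusted wrapper call (identical to the original report except for the two memory settings; whether this avoids the GPU IVFPQ check depends on the index type autofaiss then selects, which can't be confirmed from the log alone):

wrapper = TrainingWrapper(
    retro = retro,
    knn = 2,
    chunk_size = 64,
    documents_path = '/content/text/',
    glob = '**/*.txt',
    chunks_memmap_path = './train.chunks.dat',
    seqs_memmap_path = './train.seq.dat',
    doc_ids_memmap_path = './train.doc_ids.dat',
    max_chunks = 500_000,
    max_seqs = 100_000,
    knn_extra_neighbors = 100,
    max_index_memory_usage = '2G',     # was '100m'
    current_memory_available = '50G'   # was '10G'
)

With more index memory allowed, autofaiss may pick a different index type (e.g. HNSW rather than IVF-PQ) for this relatively small embedding set, which would avoid the GPU-unsupported PQ256 configuration.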