Closed: yaolu-zjut closed this issue 2 months ago
Hello @yaolu-zjut,

This OOM error most likely occurred because `args.lora` is set to `None` by default (https://github.com/logix-project/logix/blob/main/examples/cifar/compute_influences.py#L14). If you don't turn this on, you are essentially computing per-sample gradients for all modules in your model (which amounts to 2.2M parameters based on your screenshot). You can avoid OOM by further reducing the batch size, but it will still incur a significant storage cost. Overall, I recommend setting `lora` to `random`. Let me know if you encounter any other issues.

Sang
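To get a rough sense of why projecting gradients helps here, a back-of-the-envelope storage estimate is useful. The 2.2M parameter count comes from the thread above and the rank of 64 from the config below; the dataset size, layer count, and helper names are hypothetical illustrations.

```python
# Back-of-the-envelope storage estimate (illustrative numbers only):
# raw per-sample gradients vs. rank-64 low-rank projections per layer.

def grad_storage_bytes(n_samples, n_params, bytes_per_float=4):
    """Storage for raw per-sample gradients: one full gradient per sample."""
    return n_samples * n_params * bytes_per_float

def projected_storage_bytes(n_samples, n_layers, rank, bytes_per_float=4):
    """With a rank-r projection on both sides of each layer's gradient,
    each layer contributes only an r x r block per sample."""
    return n_samples * n_layers * rank * rank * bytes_per_float

n_samples = 50_000  # e.g. the CIFAR-10 training set
raw = grad_storage_bytes(n_samples, n_params=2_200_000)
proj = projected_storage_bytes(n_samples, n_layers=20, rank=64)  # layer count is made up

print(f"raw:       {raw / 1e9:.1f} GB")   # 440.0 GB
print(f"projected: {proj / 1e9:.1f} GB")  # 16.4 GB
```

The exact savings depend on the model architecture, but the gap of roughly an order of magnitude or more is why storing raw per-sample gradients is rarely practical.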
Hi Sang, it really works if I set `lora` to `random`. Thank you very much!
Hi Sang, I ran into another problem when reproducing the BERT example with `python examples/bert/extract_log.py`: it complains that config.yaml is missing. I cannot find it in the logix folder.
Hello @yaolu-zjut,

Sorry for the missing `config.yaml`. Here is the config I used for the BERT example; I also use this config for most of my experiments.
```yaml
root_dir: ./logix
logging:
  flush_threshold: 1000000000
  cpu_offload: false
lora:
  init: random  # or pca
  rank: 64
```
Let me know if you have any other questions!
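To make the structure of that config explicit, here is a small sanity check that mirrors the same fields as a plain Python dict. The key names follow the YAML above; the validation rules themselves are my own assumptions, not something LogIX enforces.

```python
# The BERT-example config from above, mirrored as a plain dict.
config = {
    "root_dir": "./logix",
    "logging": {
        "flush_threshold": 1_000_000_000,
        "cpu_offload": False,
    },
    "lora": {
        "init": "random",  # or "pca"
        "rank": 64,
    },
}

def validate(cfg):
    """Hypothetical sanity checks; LogIX does its own config parsing."""
    assert cfg["lora"]["init"] in ("random", "pca"), "unsupported lora init"
    assert isinstance(cfg["lora"]["rank"], int) and cfg["lora"]["rank"] > 0
    assert cfg["logging"]["flush_threshold"] > 0
    return True

print(validate(config))  # True
```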
Thanks! That really helps.
Hi Sang, I ran into another strange thing when adapting LogIX to my own dataset. When I run examples/bert/extract_log.py, it generates a large file and saves it under 'infl' (from `run = logix.init('infl', config=args.config_path)`, where 'infl' is the project name), and the file keeps growing. If I want to apply LogIX to a large dataset, what should I do?
Hello @yaolu-zjut,
The main philosophy behind LogIX and LoGra is that we want to convert an influence function problem into a vector similarity search problem. Therefore, we propose to save all training gradients to disk and simply read them at test time for influence score computations. That being said, as you increase the dataset size, the storage required for saving training gradients also increases linearly. The best solution I suggest here is simply upgrading your storage.
If you don't want to save your gradients to disk, then you need to recompute training gradients every time you have a new query. You can also do this with LogIX (a collaborator of mine did it at some point), but the code will get a bit longer and uglier.
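The "vector similarity search" framing above can be sketched in a few lines: store the (projected) training gradients once, then score any query gradient against them with dot products. This is a toy illustration with made-up numbers and pure Python lists, not LogIX's actual on-disk format or API.

```python
# Toy version of influence-as-similarity-search: the training gradient
# "store" is written once; each query is just a batch of dot products.

def influence_scores(train_grads, query_grad):
    """Dot product of the query gradient with each stored training gradient."""
    return [sum(t * q for t, q in zip(g, query_grad)) for g in train_grads]

# Hypothetical store: three training examples, 4-dim projected gradients.
store = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [2.0, 0.0, 4.0, 0.0],
]
query = [1.0, 0.0, 1.0, 0.0]

scores = influence_scores(store, query)
print(scores)  # [3.0, 0.0, 6.0]
top = max(range(len(scores)), key=scores.__getitem__)
print(top)     # 2: the most influential training example for this query
```

The linear storage growth follows directly from this design: each new training example appends one more (projected) gradient row to the store.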
I encountered a `torch.cuda.OutOfMemoryError` when running `CUDA_VISIBLE_DEVICES=3 python examples/cifar/compute_influences.py`. How can I fix it? I already reduced the batch size from 512 to 128, but it did not help.