Hello, I just learned about SetFit and now I want to use it for my ABSA use case. I have a dataset of 50,000 rows, with a maximum of 511 tokens per row. When I use ABSATrainer on this dataset, I encounter this error:
File "/home/azhar/miniforge3/envs/preskripsi/lib/python3.10/site-packages/setfit/trainer.py", line 502, in get_dataloader
data_sampler = ContrastiveDataset(
File "/home/azhar/miniforge3/envs/preskripsi/lib/python3.10/site-packages/setfit/sampler.py", line 68, in __init__
self.generate_pairs()
File "/home/azhar/miniforge3/envs/preskripsi/lib/python3.10/site-packages/setfit/sampler.py", line 90, in generate_pairs
for (_text, _label), (text, label) in shuffle_combinations(self.sentence_labels):
File "/home/azhar/miniforge3/envs/preskripsi/lib/python3.10/site-packages/setfit/sampler.py", line 29, in shuffle_combinations
idxs = np.stack(np.triu_indices(n, k), axis=-1)
File "/home/azhar/miniforge3/envs/preskripsi/lib/python3.10/site-packages/numpy/lib/twodim_base.py", line 1113, in triu_indices
tri_ = ~tri(n, m, k=k - 1, dtype=bool)
File "/home/azhar/miniforge3/envs/preskripsi/lib/python3.10/site-packages/numpy/lib/twodim_base.py", line 414, in tri
m = greater_equal.outer(arange(N, dtype=_min_int(0, N)),
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 46.6 GiB for an array with shape (223709, 223709) and data type bool
How can I solve this error? Is it because I have too many rows? I saw another example in a GitHub issue that uses 200 rows. I tried 200 rows too but got the exact same error.
I don't really understand how SetFit works, so I don't know what to change to fix the error. Could you also explain a bit about how it works? For example, I saw "Contrastive" in the training code, and the ~tri looks like a triangular matrix used for masking. Why does masking require such a huge matrix?
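For reference, the size in the error message can be reproduced with simple arithmetic. Below is a minimal sketch, assuming the dense mask is an n x n bool array at 1 byte per element (as the message suggests) and that n = 223709 is the number of items the sampler is pairing up. The helper names `triu_mask_bytes` and `num_pairs` are hypothetical, made up for this illustration, and are not part of SetFit or NumPy. The value of k that SetFit passes is not visible in the traceback, so it is left as a parameter.

```python
import numpy as np


def triu_mask_bytes(n: int) -> int:
    """Bytes needed for a dense (n, n) bool mask, at 1 byte per element."""
    return n * n * np.dtype(bool).itemsize


def num_pairs(n: int, k: int) -> int:
    """Number of index pairs (i, j) with j >= i + k, for 0 <= k <= n.

    This is how many rows np.triu_indices(n, k) returns.
    """
    m = n - k
    return m * (m + 1) // 2


n = 223709  # taken from the shape in the error message

# Memory for the intermediate mask, in GiB (about 46.6, as in the error).
print(f"{triu_mask_bytes(n) / 2**30:.1f} GiB")

# Number of candidate pairs, with and without pairing an item with itself.
print(num_pairs(n, 0), num_pairs(n, 1))

# Sanity check of the pair count on a tiny input.
rows, cols = np.triu_indices(5, 1)
assert len(rows) == num_pairs(5, 1)
```

If this reading is right, the cost grows with the square of n, and n here (223709) is already much larger than my 50,000 rows, so something seems to be expanding the dataset before the pairs are generated.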