pliang279 / MultiBench

[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning
MIT License
462 stars 68 forks

It will be great to have some tests to make sure this benchmark is runnable. #15

Closed mrbeann closed 2 years ago

mrbeann commented 2 years ago

After pulling the recent updates, the code no longer runs. For example, a recent change moved sys.path.append(os.getcwd()) to after all the imports, which makes the imports raise errors.

I think the repo needs tests to make sure changes do not break it.
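
Even a minimal import smoke test would catch this class of breakage. A sketch (hypothetical test file; module names taken from the example scripts later in this thread):

# test_imports.py - minimal smoke test (hypothetical), run with pytest from the repo root
import os
import sys

sys.path.insert(1, os.getcwd())

def test_core_imports():
    # these imports fail if sys.path is configured after the imports that depend on it
    from training_structures.Supervised_Learning import train, test
    from unimodals.common_models import GRU, MLP
    from datasets.affect.get_data import get_dataloader
    from fusions.common_fusions import Concat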

pliang279 commented 2 years ago

Apologies - we did some refactoring of the code and are in the middle of fixing these issues. I will ping you as soon as we fix them and add tests. They should be fully fixed by the weekend. Sorry again!

mrbeann commented 2 years ago

Thank you for your quick reply and the awesome work! Looking forward to seeing a better MultiBench.

BTW, working on a dev branch and merging it into master once it's complete may be a good idea.

mrbeann commented 2 years ago

Hi @pliang279, I'm not sure if the refactoring is done, but the repo hasn't been updated for a while. After pulling it, it seems only the import error is fixed; other errors, like inconsistent parameters, remain.

I'm not sure if you need bug reports; if so, I'm willing to help. (Actually, just running the examples surfaces most of them.)

arav-agarwal2 commented 2 years ago

Hello @mrbeann; I'm the one making the current updates to the repo. I believe I've ironed out the kinks (and am testing as we speak to make sure that's the case). Apologies for the delay - this is all in part to make the tests work on more systems, which is taking more time than I projected.

mrbeann commented 2 years ago

Hi @arav-agarwal2, thank you for your timely contributions; it's indeed very time-consuming to test all the functions. I have tested some of the code, and it's much better now. I can also help by pointing out the errors I encountered.

BTW, sorry, I'm still not familiar enough with this project to send a PR. I hope I can help with one soon.

Vanvan2017 commented 2 years ago

Thanks for pointing out the refactoring errors; they were likely caused by some options that were not kept up to date with the recent changes during the refactoring~ I have fixed that, and everything runs after a quick train-and-test check! The code still has some duplicated parts in the training structures and some hard-coded paths, which we will fix gradually~ Please stay tuned~ And you can make your changes in your own fork; we can merge them locally if you are not familiar with PRs~

mrbeann commented 2 years ago

Thank you for your reply. It seems MULT is not working; did you fix this? Thanks! @Vanvan2017

Vanvan2017 commented 2 years ago

> It seems MULT is not working; did you fix this?

Yeah~ but MULT will be slow to run. Did you get an error message? Maybe I can help with that~

mrbeann commented 2 years ago

> Yeah~ but MULT will be slow to run. Did you get an error message? Maybe I can help with that~

Yeah, the error message is:

File "MultiBench/fusions/mult/modules/position_embedding.py", line 84, in forward
    return self.weights[device].index_select(0, positions.view(-1)).view(bsz, seq_len, -1).detach()
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
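
For what it's worth, the message itself points at the fix: swapping .view(...) for .reshape(...) at that line. A minimal sketch of the change in position_embedding.py, assuming the rest of the method is unchanged:

# fusions/mult/modules/position_embedding.py, line 84 (sketch):
# .view() requires contiguous memory; .reshape() falls back to a copy when needed
return self.weights[device].index_select(0, positions.reshape(-1)).reshape(bsz, seq_len, -1).detach()
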
Vanvan2017 commented 2 years ago

Ok, thanks! It seems the problem is caused by the dimensions. By default this example uses mosi_data.pkl; I tested with that and it should be fine~

mrbeann commented 2 years ago

Yeah, I use mosi_data.pkl too. It's quite strange. Thanks anyway!

Vanvan2017 commented 2 years ago

Yeah, it seems strange. This error is generally triggered by a non-contiguous tensor in memory; I think it may be caused by the system or by multi-GPU training.
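
A quick standalone repro of that error class (not MultiBench code, just to illustrate why .reshape works where .view fails):

import torch

x = torch.randn(4, 6).t()   # transposing makes the tensor non-contiguous
print(x.is_contiguous())    # False
try:
    x.view(-1)              # raises: "view size is not compatible ..."
except RuntimeError as e:
    print(e)
print(x.reshape(-1).shape)  # torch.Size([24]); reshape copies when needed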

mrbeann commented 2 years ago

> This error is generally triggered by a non-contiguous tensor in memory; I think it may be caused by the system or by multi-GPU training.

Yeah, that may be it. BTW, I found that get_data doesn't work for the sarcasm and humor datasets. Can you take a look? Thanks!

if sample[-1].shape[1] > 1:
IndexError: tuple index out of range
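
Guessing at the cause: the labels for these datasets seem to come back 1-D, so shape[1] does not exist. A defensive version of that check (a sketch, assuming sample[-1] is the label array in get_data.py) might be:

# guard against 1-D labels, for which shape[1] does not exist
if len(sample[-1].shape) > 1 and sample[-1].shape[1] > 1:
    ...
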
Vanvan2017 commented 2 years ago

Hey~, thanks for the error message, but I didn't get any errors with sarcasm and humor using the code below:

get_dataloader('/home/pliang/multibench/affect/sarcasm.pkl', robust_test=False, max_pad=True, task='classification', data_type='sarcasm', max_seq_len=20)
torch.Size([32, 20, 371])
torch.Size([32, 20, 81])
torch.Size([32, 20, 300])
torch.Size([32, 1])
tensor([[1],
        [1],
        [0],
        [0],
        [0],
        [0],
        [1],
        [1],
        [1],
        [0],
        [0],
        [1],
        [0],
        [0],
        [1],
        [0],
        [0],
        [1],
        [0],
        [0],
        [1],
        [0],
        [0],
        [0],
        [0],
        [0],
        [0],
        [0],
        [1],
        [1],
        [0],
        [1]])
mrbeann commented 2 years ago

Yeah, this doesn't seem to be the default code used in affect_late_fusion, and when I change it to this, it still raises an error:

AttributeError: 'list' object has no attribute 'size'

Could you give it a try? Thanks!

Vanvan2017 commented 2 years ago

Yeah, by default it is set up for regression~ I tried that, but it seems OK:

Epoch 0 train loss: tensor(1.2164, device='cuda:0', grad_fn=<DivBackward0>)
Epoch 0 valid loss: 1.1879308223724365
Saving Best
Training Time: 7.292659044265747
Training Peak Mem: 2643.79296875
Training Params: 2570311
Testing:
acc: 0.6951219512195121, 0.6982507288629738
Inference Time: 0.632659912109375
Inference Params: 2570311
Vanvan2017 commented 2 years ago

Oh, maybe you can also refer to the other examples that use max_pad; I tried to cover every usage of the code base across the examples.

mrbeann commented 2 years ago

That's quite strange. Can you paste the full script you used?

Vanvan2017 commented 2 years ago
import sys
import os
sys.path.insert(1, os.getcwd())
sys.path.append(os.path.dirname(os.path.dirname(os.getcwd())))
from training_structures.Supervised_Learning import train, test
from unimodals.common_models import GRU, MLP
from datasets.affect.get_data import get_dataloader
from fusions.common_fusions import Concat
import torch

# mosi_data.pkl, mosei_senti_data.pkl
# mosi_raw.pkl, mosei_raw.pkl, sarcasm.pkl, humor.pkl
# raw_path: mosi.hdf5, mosei.hdf5, sarcasm_raw_text.pkl, humor_raw_text.pkl
traindata, validdata, test_robust = get_dataloader(
    '/home/pliang/multibench/affect/pack/mosi/mosi_raw.pkl', robust_test=False, data_type='mosi')

# traindata, validdata, test_robust = \
#     get_dataloader('/home/pliang/multibench/affect/sarcasm.pkl', robust_test=False)

# mosi/mosei
encoders = [GRU(35, 70, dropout=True, has_padding=True, batch_first=True).cuda(),
            GRU(74, 200, dropout=True, has_padding=True, batch_first=True).cuda(),
            GRU(300, 600, dropout=True, has_padding=True, batch_first=True).cuda()]
head = MLP(870, 870, 1).cuda()

# encoders=[GRU(20,40,dropout=True,has_padding=True).cuda(), \
#     GRU(5, 20,dropout=True,has_padding=True).cuda(),\
#     GRU(300, 600,dropout=True,has_padding=True).cuda()]
# head=MLP(660,512,1, dropoutp=True).cuda()

# humor/sarcasm
# encoders=[GRU(371,512,dropout=True,has_padding=False, batch_first=True).cuda(), \
#     GRU(81,256,dropout=True,has_padding=False, batch_first=True).cuda(),\
#     GRU(300,600,dropout=True,has_padding=False, batch_first=True).cuda()]
# head=MLP(1368,512,1).cuda()

fusion = Concat().cuda()

train(encoders, fusion, head, traindata, validdata, 1, task="regression", optimtype=torch.optim.AdamW,
      early_stop=False, is_packed=True, lr=1e-3, save='mosi_lf_best.pt', weight_decay=0.01, objective=torch.nn.L1Loss())

print("Testing:")
model = torch.load('mosi_lf_best.pt').cuda()

test(model=model, test_dataloaders_all=test_robust, dataset='mosi', is_packed=True,
     criterion=torch.nn.L1Loss(), task='posneg-classification', no_robust=True)
mrbeann commented 2 years ago

Sorry, it seems I didn't make myself clear: I meant that running sarcasm gives errors. This script still uses mosi.

Vanvan2017 commented 2 years ago
import sys
import os
sys.path.insert(1, os.getcwd())
sys.path.append(os.path.dirname(os.path.dirname(os.getcwd())))
from training_structures.Supervised_Learning import train, test
from unimodals.common_models import GRU, MLP
from datasets.affect.get_data import get_dataloader
from fusions.common_fusions import Concat
import torch

# mosi_data.pkl, mosei_senti_data.pkl
# mosi_raw.pkl, mosei_raw.pkl, sarcasm.pkl, humor.pkl
# raw_path: mosi.hdf5, mosei.hdf5, sarcasm_raw_text.pkl, humor_raw_text.pkl
# traindata, validdata, test_robust = get_dataloader(
#     '/home/pliang/multibench/affect/pack/mosi/mosi_raw.pkl', robust_test=False, data_type='mosi')

traindata, validdata, test_robust = get_dataloader(
    '/home/pliang/multibench/affect/sarcasm.pkl', robust_test=False,
    task='classification', data_type='sarcasm')

# mosi/mosei
# encoders = [GRU(35, 70, dropout=True, has_padding=True, batch_first=True).cuda(),
#             GRU(74, 200, dropout=True, has_padding=True, batch_first=True).cuda(),
#             GRU(300, 600, dropout=True, has_padding=True, batch_first=True).cuda()]
# head = MLP(870, 870, 1).cuda()

# encoders=[GRU(20,40,dropout=True,has_padding=True).cuda(), \
#     GRU(5, 20,dropout=True,has_padding=True).cuda(),\
#     GRU(300, 600,dropout=True,has_padding=True).cuda()]
# head=MLP(660,512,1, dropoutp=True).cuda()

# humor/sarcasm
encoders = [GRU(371, 512, dropout=True, has_padding=True, batch_first=True).cuda(),
            GRU(81, 256, dropout=True, has_padding=True, batch_first=True).cuda(),
            GRU(300, 600, dropout=True, has_padding=True, batch_first=True).cuda()]
head = MLP(1368, 512, 2).cuda()

fusion = Concat().cuda()

train(encoders, fusion, head, traindata, validdata, 2, task="classification", optimtype=torch.optim.AdamW,
      early_stop=False, is_packed=True, lr=1e-3, save='mosi_lf_best.pt', weight_decay=0.0001, objective=torch.nn.CrossEntropyLoss())

print("Testing:")
model = torch.load('mosi_lf_best.pt').cuda()

test(model=model, test_dataloaders_all=test_robust, dataset='mosi', is_packed=True,
     criterion=torch.nn.CrossEntropyLoss(), task='classification', no_robust=True)
mrbeann commented 2 years ago

I can't run this. I guess it's because of the dataset we use? I use the latest dataset from the website (modified on 1 Nov). Is that correct? https://drive.google.com/drive/folders/1JFcX-NF97zu9ZOZGALGU9kp8dwkP7aJ7

Vanvan2017 commented 2 years ago

Yeah! I use the same data as you~ Though the original authors supply a new version of the data for sarcasm and humor, it should work fine with the current data from the link above. I will also upload the authors' new version, plus BERT versions, later.

mrbeann commented 2 years ago

Hi @Vanvan2017, sorry to bother you again; I ran into trouble while using the MIM fusion on the affect dataset. It also doesn't seem to be covered in the examples. Any suggestions or examples for this? Thanks!

Vanvan2017 commented 2 years ago

Hi @mrbeann, I think there may be some inconsistencies in that fusion; you can use

return torch.einsum('bm, bmp -> bp', modalities[2], self.a(modalities[0:2])) + self.b(modalities[0:2])
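
For reference, that einsum contracts a (batch, m) tensor against a (batch, m, p) tensor into (batch, p). A toy shape check (sizes made up, just to illustrate the contraction):

import torch

b, m, p = 32, 10, 20                       # toy sizes
v = torch.randn(b, m)                      # plays the role of modalities[2]
W = torch.randn(b, m, p)                   # plays the role of self.a(modalities[0:2])
out = torch.einsum('bm, bmp -> bp', v, W)
print(out.shape)                           # torch.Size([32, 20])
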
Vanvan2017 commented 2 years ago

@mrbeann then,

import torch
from training_structures.Supervised_Learning import train, test
from unimodals.common_models import GRU, MLP
from datasets.affect.get_data import get_dataloader
# assuming MultiplicativeInteractions3Modal lives alongside Concat:
from fusions.common_fusions import MultiplicativeInteractions3Modal

traindata, validdata, test_robust = get_dataloader(
    '/home/pliang/multibench/affect/pack/mosi/mosi_data.pkl', batch_size=128,
    max_pad=True, robust_test=False, data_type='mosi')
encoders = [GRU(20, 10, dropout=True, has_padding=False, batch_first=True, last_only=True).cuda(),
            GRU(5, 5, dropout=True, has_padding=False, batch_first=True, last_only=True).cuda(),
            GRU(300, 100, dropout=True, has_padding=False, batch_first=True, last_only=True).cuda()]
head = MLP(20, 20, 1).cuda()
fusion = MultiplicativeInteractions3Modal([10, 5, 100], 20, task='affect').cuda()

I added a quick arg, task, for this.
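
To complete the snippet, a sketch of the rest of the run, assuming the same train/test calls as the earlier late-fusion script (with max_pad=True I set is_packed=False; the checkpoint name mosi_mim_best.pt is arbitrary):

train(encoders, fusion, head, traindata, validdata, 1, task="regression",
      optimtype=torch.optim.AdamW, early_stop=False, is_packed=False,
      lr=1e-3, save='mosi_mim_best.pt', weight_decay=0.01,
      objective=torch.nn.L1Loss())

print("Testing:")
model = torch.load('mosi_mim_best.pt').cuda()
test(model=model, test_dataloaders_all=test_robust, dataset='mosi',
     is_packed=False, criterion=torch.nn.L1Loss(),
     task='posneg-classification', no_robust=True)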

mrbeann commented 2 years ago

Yeah, this seems good. Thanks! @Vanvan2017