njallskarp / finetune-qa-powerset

Finetuning BERT models on a powerset of different linguistic domains
https://lvl.ru.is/

Training logic on powerset #46

Open njallskarp opened 1 year ago

njallskarp commented 1 year ago

Training logic for powerset

We have already added a command-line argument (main.py) where users can specify the domains or datasets. What we need to do is have each domain load a different Dataset class. Then, on each iteration over a set of sources in the powerset, we use torch's ConcatDataset to concatenate the selected domains into a single Dataset. If we have N domains, we will end up creating 2^N - 1 concatenated datasets, one per non-empty subset.
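
A minimal sketch of this idea, assuming we keep a mapping from domain name to its Dataset object (the domain names and the domain_to_dataset mapping are placeholders, not existing code in this repo):

```python
from itertools import chain, combinations

from torch.utils.data import ConcatDataset, Dataset


def powerset(domains):
    """All non-empty subsets of the given domains (2^N - 1 of them)."""
    return chain.from_iterable(
        combinations(domains, r) for r in range(1, len(domains) + 1)
    )


def build_subset_dataset(domain_to_dataset: dict, subset) -> ConcatDataset:
    """Concatenate the per-domain datasets for one subset of the powerset."""
    return ConcatDataset([domain_to_dataset[name] for name in subset])
```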

Where this could happen

It seems to me that this might happen inside the run training function. That is, around the for ... in range(epochs) loop there will be an outer loop along the lines of for domain_subset in powerset:

This means that we will need to pass the Dataset objects into this function rather than the DataLoaders, and build the DataLoaders inside the loop (see the sketch below).
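
A rough sketch of what that could look like, reusing the powerset and build_subset_dataset helpers from above; the function name run_training, its arguments, and model_factory are assumptions here, not the actual signatures in main.py:

```python
from torch.utils.data import DataLoader


def run_training(domain_to_dataset, model_factory, epochs, batch_size=16):
    # Outer loop: one training run per non-empty subset of domains.
    for domain_subset in powerset(domain_to_dataset.keys()):
        train_data = build_subset_dataset(domain_to_dataset, domain_subset)
        loader = DataLoader(train_data, batch_size=batch_size, shuffle=True)
        model = model_factory()  # fresh BERT model per subset
        # Inner loop: the existing epoch/batch training logic.
        for epoch in range(epochs):
            for batch in loader:
                ...  # forward/backward pass as in the current training code
```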

We can schedule a meeting to discuss this in detail.