kathrinse / be_great

A novel approach for synthesizing tabular data using pretrained large language models
MIT License

failing to use BART models - Breaking the generation loop! #42

Open kontramind opened 9 months ago

kontramind commented 9 months ago

Hi,

I'm trying to use, for example, 'sshleifer/distilbart-cnn-6-6' and it fails with the following message:

An error has occurred: Breaking the generation loop! To address this issue, consider fine-tuning the GReaT model for an longer period. This can be achieved by increasing the number of epochs. Alternatively, you might consider increasing the max_length parameter within the sample function. For example: model.sample(n_samples=10, max_length=2000) If the problem persists despite these adjustments, feel free to raise an issue on our GitHub page at: https://github.com/kathrinse/be_great/issues
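For context, the failure happens during sampling; below is a minimal sketch of the kind of call involved, with the larger max_length the message suggests. Everything except the model name is an illustrative assumption (the full training script is further down in this thread):

```python
from sklearn.datasets import fetch_california_housing
from be_great import GReaT

# Any tabular dataframe reproduces the setup; California housing matches our data.
data = fetch_california_housing(as_frame=True).frame

# The BART checkpoint that fails; batch_size/epochs here are placeholders.
great = GReaT("sshleifer/distilbart-cnn-6-6", batch_size=32, epochs=50)
great.fit(data)

# The mitigation the error message suggests: a larger max_length for generation.
samples = great.sample(n_samples=10, max_length=2000)
```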

Aleksandar

unnir commented 9 months ago

Hi,

Could you please provide your training hyperparameters or your whole Python code?

kontramind commented 9 months ago

> Hi,
>
> Could you please provide your training hyperparameters or your whole Python code?

Hi @unnir ,

Sure, here is the code. We train on the California housing dataset. Keep in mind that it also includes a workaround for BelenGarciaPascual's question; Belen and I are collaborating on the same task and plan to work on a proper PR.

In the code below the total number of epochs is 8*9 (8 entries in the epochs list times the 9 columns of the dataset).

```python
from pathlib import Path
from shutil import rmtree

from be_great import GReaT

# `data` (the California housing dataframe), `base` (the HuggingFace model name)
# and `llm` (a short name used in the output directories) are defined earlier.

batch_size = 32
steps = len(data) // batch_size  # optimizer steps per epoch

epochs = [0, 1, 2, 3, 4, 5, 6, 7]
columns = data.columns

for epoch in epochs:
    for idx, column in enumerate(columns):
        print(f'{epoch=} -> {column=}')
        great = GReaT(base,                                 # Name of the large language model used (see HuggingFace for more options)
              batch_size=batch_size,
              epochs=epoch*len(data.columns) + idx + 1,     # Cumulative number of epochs trained so far (grows by one every inner iteration)
              save_steps=steps,                             # Save model weights every x steps
              logging_steps=steps,                          # Log the loss and learning rate every x steps
              experiment_dir=f"aleks_{llm}_trainer",        # Name of the directory where all intermediate steps are saved
        )

        if epoch == 0 and idx == 0:
            trainer = great.fit(data, conditional_col=column)
        else:
            trainer = great.fit(data, conditional_col=column, resume_from_checkpoint=True)
            # Delete the checkpoint from the previous iteration to limit disk usage
            rmtree(Path(f"aleks_{llm}_trainer")/f"checkpoint-{epoch*len(data.columns)*steps + idx*steps}")

        great.save(f"aleks_california_{llm}")

        for path in Path(f"aleks_{llm}_trainer").iterdir():
            if path.is_dir():
                print(f'{path=}')
```
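The error itself appears when sampling from the trained model. For completeness, a sketch of that step, assuming the directory written by great.save above, the load_from_dir helper from the README, and the illustrative max_length value from the error message:

```python
# Sketch of the sampling step that triggers "Breaking the generation loop!".
# Assumes the directory saved by great.save(...) above; max_length is illustrative.
great = GReaT.load_from_dir(f"aleks_california_{llm}")
synthetic = great.sample(n_samples=10, max_length=2000)
print(synthetic.head())
```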
unnir commented 9 months ago

My suggestion, again, is to train the model longer, but I will try to reproduce the error and debug it.