Closed mscherrmann closed 1 year ago
Can you please provide the trace for the original run with our finetuning script as well?
The problem with running your code on a Windows machine seems to be that the triton package is not available on Windows. However, do you think it is possible to finetune using just torch and transformers?
It should be possible. Triton is used for faster flash-attn, but it can be replaced with torch ops. I recommend trying it out!
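For example, a triton flash-attention call could be swapped for PyTorch's built-in fused attention. A minimal sketch, assuming PyTorch >= 2.0 (where `torch.nn.functional.scaled_dot_product_attention` is available); the function and tensor names here are illustrative, not the actual mosaic-bert internals:

```python
import torch
import torch.nn.functional as F

def attention_torch(q, k, v, attn_mask=None, dropout_p=0.0):
    # Plain-torch replacement for a triton flash-attention call.
    # q, k, v: (batch, n_heads, seq_len, head_dim).
    # Uses torch's fused attention kernel where available, with a
    # pure-torch fallback, so no triton dependency is needed.
    return F.scaled_dot_product_attention(
        q, k, v, attn_mask=attn_mask, dropout_p=dropout_p
    )

# Quick shape check
q = k = v = torch.randn(2, 12, 128, 64)
print(attention_torch(q, k, v).shape)  # torch.Size([2, 12, 128, 64])
```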
Unfortunately, we would not be able to help you debug your script in this case, as it's outside the scope of things we support.
Ok, I see.
Hi,
I pretrained a mosaic-bert model on a Linux server. As a next step, I would like to finetune the model locally on a Windows 11 machine. I tried to run your finetuning script, but unfortunately it does not seem to work on a Windows machine.
Another attempt was to convert the model to the Hugging Face format. I tried this approach with the model "mosaicml/mosaic-bert-base" from Hugging Face first, after allowing for "BertForSequenceClassification" in the config file. This works, but the training does not seem to converge, in contrast to the original Hugging Face model "bert-base-uncased". Do you have any explanations or recommendations for that?
This is the code:
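For illustration, a minimal sketch of what such a conversion-and-finetune setup typically looks like — the checkpoint path, tokenizer choice, and label count here are assumptions, not the original snippet:

```python
from transformers import BertForSequenceClassification, BertTokenizerFast

# Local directory holding the checkpoint converted to the Hugging Face format
checkpoint = "path/to/converted-mosaic-bert"

# mosaic-bert reuses the bert-base-uncased vocabulary
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

# Load the converted weights into the stock Hugging Face classification model;
# weights not present in the checkpoint (e.g. the classifier head) are freshly
# initialized, which is expected before finetuning.
model = BertForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
```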