Closed alielfilali01 closed 6 months ago
Hello, thank you for your interest in our work.
It is possible to push our adapter to the hub, but it requires our PEFT library to load it.
Regarding the pretraining, 250M refers to the number of parameters in the model. We follow the ReLoRA preprocessing method for the C4 dataset (which can be found in its repository) and directly load the tokenized dataset in our training script.
Thanks for the quick response
I'm just curious what exactly you mean by "requires our PEFT library to load it"? Like, is there an argument I can pass to the training args, such as `--hf_token "xxx" --push_to_hub true --push_private true`?
Also, I'm now thinking about merging the adapters back into the base model. At first I thought it shouldn't be a problem, but the `merge_and_unload()` function expects A and B matrices, so will it have problems with MoRA? I still think it should be feasible, but I would love to hear a confirmation from you, assuming you have already done it. Also, does the current script support a `--merge_adapter` argument?
About the second point, I believe it is okay if I pass an untokenized C4-like dataset and the script will tokenize it, right? At least that's how I think it is done by the original PEFT ...
We support merging (we have modified `merge_and_unload` to merge correctly) and loading MoRA via our PEFT.
However, the current training script does not support pushing to the hub or merging via flags (sorry about that). You may add the corresponding code or use other training scripts to achieve this. Loading a pushed MoRA adapter from the hub requires our PEFT library, installed via `pip install -e ./peft-mora`.
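For readers who want to add the missing step themselves, here is a minimal sketch of merging an adapter into the base model and pushing the result to the hub. It assumes the `peft` package in the environment is the fork installed from `./peft-mora` and that the fork keeps the standard `PeftModel.from_pretrained` and `merge_and_unload` entry points; the function name `merge_and_push` and all repository ids are hypothetical. It also assumes you are already authenticated (for example via `huggingface-cli login`).

```python
def merge_and_push(base_model_id, adapter_id, repo_id, private=True):
    """Load an adapter, fold it into the base weights, and upload the result.

    Assumption: `peft` here is the peft-mora fork, so merge_and_unload()
    handles MoRA adapters. All ids passed in are placeholders.
    """
    # Imported lazily so that defining this function needs no heavy dependencies.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    # Load the frozen base model, then attach the trained adapter to it.
    base = AutoModelForCausalLM.from_pretrained(base_model_id)
    model = PeftModel.from_pretrained(base, adapter_id)

    # Fold the adapter update into the base weights and drop the PEFT wrappers.
    merged = model.merge_and_unload()

    # Upload the merged model; private=True keeps the repository private.
    merged.push_to_hub(repo_id, private=private)
    return merged
```

To push only the adapter instead of the merged weights, calling `model.push_to_hub(repo_id, private=private)` before merging should work in the same way, with the caveat from above that loading it back requires the forked PEFT.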
For the second point, we only support loading tokenized C4 datasets, because tokenizing them can take a lot of time (I don't think this can be done by the original PEFT). To tokenize, we use this script from ReLoRA.
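As a rough illustration of what pre-tokenizing a C4-like dataset involves, the sketch below maps a tokenizer over the `text` column of a Hugging Face `datasets.Dataset`. It is not the ReLoRA script; the helper names `tokenize_batch` and `pretokenize` and the `max_length=512` default are assumptions made for this example.

```python
def tokenize_batch(batch, tokenizer, max_length=512):
    """Tokenize the "text" column of a batch, truncating to max_length tokens."""
    return tokenizer(batch["text"], truncation=True, max_length=max_length)


def pretokenize(dataset, tokenizer, max_length=512, num_proc=None):
    """Return a copy of `dataset` with the raw columns replaced by tokenizer output.

    Assumption: `dataset` is a regular (non-streaming) datasets.Dataset
    that has a "text" column, as C4 does.
    """
    return dataset.map(
        tokenize_batch,
        batched=True,                         # tokenize many rows per call
        num_proc=num_proc,                    # optional multiprocessing
        remove_columns=dataset.column_names,  # drop the raw columns afterwards
        fn_kwargs={"tokenizer": tokenizer, "max_length": max_length},
    )
```

The result can then be written once with `save_to_disk` and reloaded for training, which avoids repeating the slow tokenization step on every run.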
Ok, I see. Thank you, you have been really helpful in giving me a bigger picture of the current state of the code. I will close this issue now and maybe re-open it later if I have any further questions ... Thanks again 🤗
Hi all, great paper btw, and it came at a great time for me personally. I just want to ask about the possibility of pushing the adapter to the hub: is it supported in your code base? Also, for the pretraining code, I would like to know what the `250m` means in `--pretrain 250m`? And since I see no column mapping, I believe the code expects data with a `text` column only ... If you could answer/confirm these points I would greatly appreciate it. Again, great work, congratulations.