Open JulesBelveze opened 3 years ago
Hello,
That's cool. I have not tested it on NER yet, so it will be interesting to check. Yes, I will expand the steps into a real example so you can see what needs to be done (give me a day or two). Is your code visible somewhere? If so, I can check that my steps are correct.
Regards,
François
Hey @madlag ,
Thanks for your answer 😄 Awesome then, because I have started adapting it to my use case but I am not 100% sure I'm doing it right.
Yep, I just sent you an invite to the repo (I removed a bunch of files from the original library). Let me know what you think of it, and if you have any suggestions!
Really appreciate your help, Cheers, Jules
Hi Jules !
I pushed a new branch "madlag_fix", but I could not test it completely: I tried to run the experiment, but I lack the dataset files and don't know where to find them. If you tell me how to get them, I can check it more thoroughly ;-)
Regards,
François
Hey @madlag ,
Awesome!! Thanks a lot for your help! I just tried it and training went well! 😄
However, the size of the saved model is identical to the one without fine-pruning... I saw that you shared a notebook in #5, so I will have a look at it and let you know if I need further help!
Thanks again François, Cheers, Jules
Hi! Yes, #5 contains useful stuff for you too. You will need to tune the sparse parameters to adjust sparsity. When everything goes well, some heads are pruned, and the file size of the model is reduced. But for compatibility with transformers, I cannot change the size of the FFN layers in the version serialized to disk, because transformers would refuse to load it. So you have to call "optimize_model" after loading the model to cut the empty parts, as in the example notebook. Don't hesitate to ping me: the library still lacks good documentation, and finding the right parameters is not completely trivial right now. It will also help me to know which parts are the most useful to document!
Regards,
François
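To make the point about "cutting the empty parts" of the FFN layers concrete, here is a minimal, library-independent sketch in plain Python. It is not the implementation behind "optimize_model"; the helper names (`ffn`, `shrink_ffn`) and the toy weights are hypothetical, and it assumes a GELU feed-forward block in which a pruned hidden unit has all-zero incoming weights and a zero bias. Such a unit always outputs 0, so its row in the first matrix and its column in the second can be dropped without changing the result.

```python
import math


def gelu(x):
    # Exact GELU; gelu(0.0) == 0.0, so a fully zeroed hidden unit contributes nothing.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def ffn(x, w1, b1, w2, b2):
    """Feed-forward block: w2 @ gelu(w1 @ x + b1) + b2, using nested lists."""
    hidden = [
        gelu(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]


def shrink_ffn(w1, b1, w2):
    """Drop hidden units whose incoming weights and bias are all zero (hypothetical helper)."""
    keep = [
        i
        for i, (row, b) in enumerate(zip(w1, b1))
        if b != 0.0 or any(w != 0.0 for w in row)
    ]
    small_w1 = [w1[i] for i in keep]                   # remove dead rows
    small_b1 = [b1[i] for i in keep]
    small_w2 = [[row[i] for i in keep] for row in w2]  # remove matching columns
    return small_w1, small_b1, small_w2


# Toy weights: hidden units 1 and 3 have been pruned to zero.
w1 = [[0.5, -1.0], [0.0, 0.0], [2.0, 0.25], [0.0, 0.0]]
b1 = [0.1, 0.0, -0.3, 0.0]
w2 = [[1.0, 3.0, -2.0, 4.0], [0.5, -1.0, 0.0, 2.0]]
b2 = [0.0, 1.0]
x = [1.5, -0.5]

small_w1, small_b1, small_w2 = shrink_ffn(w1, b1, w2)
dense_out = ffn(x, w1, b1, w2, b2)
pruned_out = ffn(x, small_w1, small_b1, small_w2, b2)
```

Here the shrunken block has half as many hidden units yet produces the same output. It also suggests why a checkpoint saved with the original shapes does not get smaller on disk: the zeroed rows and columns are still stored until they are physically removed after loading.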
Hey Francois,
Awesome, I'm now playing around with the parameters to check how they affect performance and how much they shrink the model! Next step: investigate how to add distillation to it :)
Cheers, Jules
Hey all,
Thanks for the great repo, that's exactly what I was looking for 😄
I have been through the examples and documentation you guys provided, but I am attempting to use the library for token classification (specifically for NER). I have my own datasets.Dataset, a custom BERT model, and I am not using a HFTrainer. I have tried to follow the steps provided here, but they are quite confusing to me... @madlag Could you by any chance give me further hints/notebooks on how I could use the library to reach my end goal?
Thanks a lot for your help, Cheers, Jules