utterances-bot opened 4 years ago
Nice blog - I only had time to skim through the high level of each method. Which method does the transformers pipeline use?
Thanks! The pipeline uses the NLI method.
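For anyone wondering what that looks like, a minimal sketch (the example text and labels here are just illustrative):

```python
from transformers import pipeline

# Zero-shot classification via the NLI approach described in the post
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

classifier(
    "Who are you voting for in 2020?",
    candidate_labels=["politics", "economics", "public health"],
)
# -> {'sequence': ..., 'labels': [...], 'scores': [...]}
```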
This article is brilliantly written!
Thank you, perfect article. Could you please suggest the most suitable way to classify a text (containing N sentences) into an expected label?
Thank you, amazing work. Can I see the code behind your online demo please?
Hello, I'd like to know how many GPUs your API for zero-shot topic classification is running on. When I try to scan a 50-sentence text with 10 topics on Colab, it takes approximately 5 minutes per text... It looks like it's way faster on your web API, though.
Thank you for your answer,
Clotilde
@clotildemiura It's slow if you're not on GPU since you have to run each text/candidate label pair through the model separately. If the web API is significantly faster, it's probably just because the results for examples you're looking at are cached. The web API is also just using CPU.
A few tips for speeding up the pipeline here.
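Separately from whatever is in that link, here is what moving the pipeline onto a GPU looks like (a minimal sketch; `device=0` selects the first GPU, `-1` the CPU):

```python
from transformers import pipeline

# Run the zero-shot pipeline on the first GPU (device=0); use device=-1 for CPU.
classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
    device=0,
)

results = classifier(
    ["First text to classify...", "Second text to classify..."],  # a list of sequences is accepted
    candidate_labels=["economy", "politics", "sports"],
)
```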
thank you very much @joeddav
This is very interesting.
I had read two other papers on zero-shot learning some time ago. The key ideas were:
Really great article Joe! This will especially work for English text, right? What would you advise for non-English languages that don't have MNLI datasets or NLI-trained BERT models?
@gevezex Yep, I actually trained a model on a multilingual NLI dataset for this exact purpose! Tweet here: https://twitter.com/joeddav/status/1298997753075232772
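For example, that checkpoint drops straight into the same pipeline (a minimal sketch; the text and labels are illustrative, and they don't need to be in the same language):

```python
from transformers import pipeline

# XLM-RoBERTa fine-tuned on multilingual NLI data for zero-shot classification
classifier = pipeline("zero-shot-classification", model="joeddav/xlm-roberta-large-xnli")

classifier(
    "¿A quién vas a votar en 2020?",
    candidate_labels=["Europa", "política", "salud pública"],
)
```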
Hey Joe, great article!
I have a silly question about this part of the few-shot learning discussion for the embedding approaches:
Take the top K most frequent words V in the vocabulary of a word2vec model
By the top K most frequent words, do you mean the top K from the corpus you are trying to classify?
Thanks for the multilingual NLI, btw!
@agombert Glad you enjoyed it! Sorry, this was difficult to communicate. The format of word vector files typically orders the words by decreasing frequency in the training corpus, and I meant the top K according to that ordering. So if you have a .vec file with 100k words (lines), just use the first K.
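In code, that might look something like this (a rough sketch; the file name and K are just illustrative):

```python
import numpy as np

def load_top_k_vectors(path: str, k: int = 20000):
    """Read the first k word vectors from a .vec file; the words are stored
    roughly in order of decreasing frequency in the training corpus."""
    words, vecs = [], []
    with open(path, encoding="utf-8") as f:
        f.readline()  # most .vec files start with a "<vocab_size> <dim>" header line
        for i, line in enumerate(f):
            if i >= k:
                break
            parts = line.rstrip().split(" ")
            words.append(parts[0])
            vecs.append(np.asarray(parts[1:], dtype=np.float32))
    return words, np.stack(vecs)

words, vectors = load_top_k_vectors("wiki-news-300d-1M.vec", k=20000)
```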
Wondering about using bigrams in candidate labels, e.g. candidate_labels = ["not sustainable", "climate change", "environment pollution", "government state policy", "finance bank"]. Will these work? I think bigrams could add more context.
Fantastic article!
Just a minor fix: the model name in the last code snippet should be facebook/bart-large-mnli.
Fascinating article, Joe! Is there any resource available on how to fine-tune such models with our own data? Thanks.
Really great article, keep it up!
Hi Joe, thanks for your article!! Is it possible to fine-tune these models?
@joeddav thanks for the article. I find it very helpful.
do you happen to have the notebook/code available for the mapping from S-BERT to word2vec? I wonder how it is done, and also how you generate the word2vec embedding for phrases such as "Science and Mathematics". 🤔
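For anyone else wondering, the post describes fitting an L2-regularized least-squares projection from the S-BERT space to the word2vec space over the top-K vocabulary words; a rough sketch of that idea (not the author's actual code):

```python
import numpy as np

def fit_projection(X_sbert: np.ndarray, Y_w2v: np.ndarray, l2: float = 1.0) -> np.ndarray:
    """Ridge-regularized least squares: Z = argmin ||X Z - Y||^2 + l2 * ||Z||^2.

    X_sbert: (K, d_s) S-BERT embeddings of the top-K vocabulary words
    Y_w2v:   (K, d_w) word2vec embeddings of the same words
    """
    d = X_sbert.shape[1]
    return np.linalg.solve(X_sbert.T @ X_sbert + l2 * np.eye(d), X_sbert.T @ Y_w2v)

# Project any new S-BERT embedding x into word2vec space with:  x @ Z
# A multi-word label like "Science and Mathematics" is often embedded on the
# word2vec side by averaging the vectors of its in-vocabulary tokens.
```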
Hi, thanks a lot for the article and notebook. Just a quick question: what is the default model in the pipeline, is it BART MNLI?
Can you please show me how the fine-tuning is done, or direct me to a place where it is explained? I have about 1000 sentences with their labels. I want to fine-tune this model on the task. During inference a subset of the labels will be used, so zero-shot learning would be the best way to go. But when you said "pass the sentence twice, once with the correct label and once with an incorrect label while optimizing cross-entropy", I want to see how that is done using Hugging Face.
As @Boodhayana said, I would also love to see the actual code that carries out the fine-tuning. I also have a dataset that I want to fine-tune the bart-mnli zero-shot model on, but I can't find any examples of how to do so.
Could you please post the code you used to fine-tune bart-large-mnli on Yahoo Answers?
Regrettably, I failed to save that code. If you need to fine-tune, I recommend first distilling a classifier using this script (https://github.com/huggingface/transformers/tree/main/examples/research_projects/zero-shot-distillation) and then fine-tuning the resulting model as you would any other classifier.
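If it helps while waiting, a distillation run looks roughly like this (a sketch from memory: double-check the argument names against the script's README; the file names and paths are placeholders):

```
python distill_classifier.py \
  --data_file unlabeled_data.txt \
  --class_names_file classnames.txt \
  --hypothesis_template "This text is about {}." \
  --teacher_name_or_path facebook/bart-large-mnli \
  --output_dir ./distilled
```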
@joeddav No problem at all. I was able to successfully fine-tune the model; your blog and your answers on the Hugging Face forums helped me a lot. I have one concern, however. Since I am using the fine-tuned model in production, I need it to be fast (as fast as normal text classification models). I have ~30 labels in my dataset. I am accelerating inference by running the fine-tuned Hugging Face model with onnxruntime.
The code for exporting to ONNX is below:
```
python -m transformers.onnx --model=facebook/bart-large-mnli --feature=sequence-classification --atol=1e-04 dir/
```
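For what it's worth, a rough sketch of running the exported model with onnxruntime (paths, the label wording, and the graph's input names should be checked against your own export):

```python
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
session = ort.InferenceSession("dir/model.onnx")

# The exported graph's input names can vary; check what it actually expects.
expected = {node.name for node in session.get_inputs()}

premise = "one day I will see the world"
hypothesis = "This example is travel."  # one forward pass per (text, label) pair
encoded = tokenizer(premise, hypothesis, return_tensors="np")
feed = {name: array for name, array in encoded.items() if name in expected}

logits = session.run(None, feed)[0]
```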
Even after that, inference for a single piece of text takes almost 2 seconds (it has to iterate through 30 labels).
Are there any methods to speed up inference further?
Does distillation help? Are there other methods I can use along with this? I want to match the inference time of a normal text classification model.
@Boodhayana Distillation is exactly what you want. It will essentially train a student model, which is just a normal DistilBERT classifier, to mimic the predictions of the zero-shot teacher. You just need some example (unlabeled) data.
@Boodhayana can you share, or direct me to a place where I can understand, how the fine-tuning is actually done?
@joeddav for distillation, what should the candidate labels be? I think they should be the candidate labels you want to use for your application, regardless of what the text you're using for distillation is about. For example, if I want to train a model to classify movie summaries into genres, I could use the AG News data to distill a zero-shot model into a smaller one, using hypothesis labels like ['thriller', 'action', 'suspense', 'horror', 'comedy'], even though the AG News data has nothing to do with that. Then I could fine-tune that distilled model with actual movie summary/genre data, right?
Hey, thank you for getting back to me. I'm very excited to see that post! In the meantime I'm actually trying my hand with PyTorch, and I'm wondering how to encode my labels. As suggested in the zero-shot learning blog post, I'm only using the labels entailment and contradiction, but I'm unsure what encodings the model actually uses. From this Kaggle competition https://www.kaggle.com/competitions/contradictory-my-dear-watson I saw they're using 0, 1, or 2 (corresponding to entailment, neutral, and contradiction). Should I set up my encodings this way as well (0 for entailment and 2 for contradiction)?
Thank you,
Tada
On Sat, May 21, 2022, @Boodhayana wrote:
I plan to write a blog using a public dataset, so please wait a few days, since I am using a private dataset that I can't share outside.
@tyatabe You can do it two ways. One is to update the config: give new `id2label` and `label2id` dictionaries to the config, with 0 for entailment and 2 for contradiction, as explained in this link. Note: according to the competition details, you should not ignore the neutral class; you should consider all three outcomes.
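A minimal sketch of that config change (illustrative):

```python
from transformers import AutoConfig, AutoModelForSequenceClassification

# Relabel the three output classes to match the competition's convention:
# 0 = entailment, 1 = neutral, 2 = contradiction.
config = AutoConfig.from_pretrained("facebook/bart-large-mnli")
config.id2label = {0: "entailment", 1: "neutral", 2: "contradiction"}
config.label2id = {"entailment": 0, "neutral": 1, "contradiction": 2}

model = AutoModelForSequenceClassification.from_pretrained(
    "facebook/bart-large-mnli", config=config
)
```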
@joeddav I tried distillation after training my zero-shot model with bart-large-mnli. I am using the parameters `teacher_name_or_path` and `hypothesis_template`, along with `classnames.txt` and `unlabeled_data.txt`. I get the following error:
```
[INFO|trainer.py:1244] 2022-06-01 10:53:09,793 >> ***** Running training *****
[INFO|trainer.py:1245] 2022-06-01 10:53:09,793 >>   Num examples = 1472
[INFO|trainer.py:1246] 2022-06-01 10:53:09,793 >>   Num Epochs = 1
[INFO|trainer.py:1247] 2022-06-01 10:53:09,793 >>   Instantaneous batch size per device = 32
[INFO|trainer.py:1248] 2022-06-01 10:53:09,793 >>   Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:1249] 2022-06-01 10:53:09,793 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:1250] 2022-06-01 10:53:09,793 >>   Total optimization steps = 46
  0%|          | 0/46 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/boodhayana/ps2sem2/huggingface/distillatino/distill_classifier.py", line 338, in <module>
    main()
  File "/Users/boodhayana/ps2sem2/huggingface/distillatino/distill_classifier.py", line 328, in main
    trainer.train()
  File "/Users/boodhayana/.local/share/virtualenvs/huggingface-NdJ_jAKm/lib/python3.9/site-packages/transformers/trainer.py", line 1365, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/Users/boodhayana/.local/share/virtualenvs/huggingface-NdJ_jAKm/lib/python3.9/site-packages/transformers/trainer.py", line 1940, in training_step
    loss = self.compute_loss(model, inputs)
  File "/Users/boodhayana/ps2sem2/huggingface/distillatino/distill_classifier.py", line 119, in compute_loss
    target_p = inputs["labels"]
  File "/Users/boodhayana/.local/share/virtualenvs/huggingface-NdJ_jAKm/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 239, in __getitem__
    return self.data[item]
KeyError: 'labels'
```
This error comes after the tokenizer has tokenized everything. I thought that the model uses `label` in the config, so we should change `labels` to `label`, but since there is a custom `compute_loss` function, I'm not so sure anymore. Can you please tell me what I can do now?
@Boodhayana I avoided this by following the suggestion at the bottom of this thread to downgrade to:
transformers==4.4.0
datasets==1.6.1
Hi, below is the query I raised on Hugging Face. If you can answer it, I would love to hear from you. Thank you. The code is taken from https://huggingface.co/facebook/bart-large-mnli
Please correct me if I am wrong. I took both versions, i.e. the code under the zero-shot classification pipeline and the code under the manual PyTorch version, and ran them against the labels ['Positive', 'Neutral', 'Negative'] for the sequence "one day I will see the world". Below are the results.
Results (from zero-shot classification pipeline) {'sequence': 'one day I will see the world', 'labels': ['Positive', 'Negative', 'Neutral'], 'scores': [0.48784172534942627, 0.26007547974586487, 0.25208279490470886]}
Results (from the manual PyTorch version, for the label 'Positive'):
tensor([0.2946], grad_fn=
If you notice, there is a huge variation between the two results for the label 'Positive'. I ran the exact same code given on the model page in order to test it. Am I doing anything wrong? Please help me. Thank you.
Extra Information
The logit values from the manual PyTorch method after applying softmax:
tensor([[0.0874, 0.8761, 0.0365]], grad_fn=
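(For reference, a rough sketch of the manual computation being compared here. Note that the pipeline, in its default single-label mode, softmaxes the entailment logits across all candidate labels, while the model-card snippet softmaxes entailment vs. contradiction for one label at a time, so the two scores are not directly comparable.)

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")

premise = "one day I will see the world"
hypothesis = "This example is Positive."  # the pipeline's default template is "This example is {}."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
logits = model(**inputs).logits  # label order for this model: [contradiction, neutral, entailment]

# Per-label score: softmax over entailment vs. contradiction only.
prob_label_is_true = logits[:, [0, 2]].softmax(dim=1)[:, 1]
```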
Hi,
Could you share the .py file for the Streamlit demo?
Hello,
Could you provide more references on the technique you mentioned of learning a projection matrix from one embedding space to another? Is it a separate model, or would the weights between these models be shared and updated jointly in an end-to-end setting?
Your article has been quite helpful, but what originally caught my attention (and motivated me to incorporate zero-shot in my own project) was your live demo. I wanted to share it with a friend. Unfortunately, the site is currently throwing an error: "OSError: [Errno 28] No space left on device".
@bwbate fixed, and moved to a space at https://huggingface.co/spaces/joeddav/zero-shot-demo
Take the top K most frequent words V in the vocabulary of a word2vec model
I am trying to implement the latent embedding approach using an S-BERT model for my phrases/documents, and I want to use word2vec projections for my class names. Do you think it is better to use a pretrained word2vec model, or to train one on a custom corpus?
@gattaloukik123 I would almost always recommend using pretrained word embeddings (unless you have really weird data that doesn't look like normal text or something)
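For example, with gensim (assuming gensim 4.x; the file name is illustrative), a multi-word class name can be embedded by averaging pretrained vectors:

```python
import numpy as np
from gensim.models import KeyedVectors

# Load pretrained word vectors (text .vec format).
kv = KeyedVectors.load_word2vec_format("wiki-news-300d-1M.vec")

def embed_label(label: str) -> np.ndarray:
    """Average the vectors of the label's in-vocabulary tokens (a simple, common choice)."""
    tokens = [t for t in label.lower().split() if t in kv.key_to_index]
    return np.mean([kv[t] for t in tokens], axis=0)

label_vec = embed_label("Science and Mathematics")
```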
@joeddav your Hugging Face Space currently fails with:
OSError: joeddav/xlm-roberta-large-xnli is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models' If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.
@joeddav thanks for the amazing blog & model.
I have been doing a lot of testing on this model. My goal is to identify and classify text into negativity-bias and emotion categories.
The model is not working well on this, so is there any way to fine-tune it for this task? I think this is a real use case, and zero-shot is helpful since we are looking for a faster model without an LLM. Please help me. The labels I'm using:
"Discrimination": [
"Racism & Ethnic Bias",
"Sexism & Gender Bias",
"Homophobia & Xenophobia",
"Religious Intolerance",
"Neutral",
"Coding",
"Technical Logs/Reports"
],
"Harassment": [
"Hate Speech",
"Sexual Harassment",
"Coding",
"Neutral",
"Bullying",
"Technical Logs/Reports"
]
Zero-Shot Learning in Modern NLP | Joe Davison Blog
State-of-the-art NLP models for text classification without annotated data
https://joeddav.github.io/blog/2020/05/29/ZSL.html