[Bug] Inconsistency between scoring in the single-label case for zero-shot classification of text vs. images

josephrocca commented 1 year ago

Describe the bug

With the zero-shot text classification pipeline, if you pass a single label, you seem to get back a similarity score rather than a probability-like score that's relative to the other labels. With the zero-shot image classification pipeline, by contrast, you always get back a score of 1 if you give a single label.

I tried changing multi_label to false, based on a GitHub issue, but that didn't change anything.

I'm guessing that how this is handled depends on how the Python library handles it, and I ran into an error while testing that:

https://github.com/huggingface/transformers/issues/24008

How to reproduce

// Load Transformers.js from the CDN (run as an ES module so top-level await works).
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.1.1';

const text = "houston, we have a problem with the thruster";
const imageUrl = "https://i.imgur.com/fYhUGoY.jpg";
const labels = ["astronaut", "forest cabin", "rabbit and lion"];

// Zero-shot text classification: three labels, then a single label.
const textClassifier = await pipeline('zero-shot-classification');
console.log("multi-label text:", await textClassifier(text, labels, { multi_label: true }));
console.log("single-label text:", await textClassifier(text, ["astronaut"], { multi_label: false }));

// Zero-shot image classification: the same two label sets.
const imageClassifier = await pipeline('zero-shot-image-classification');
console.log("multi-label image:", await imageClassifier(imageUrl, labels, { multi_label: true }));
console.log("single-label image:", await imageClassifier(imageUrl, ["astronaut"], { multi_label: false }));

Expected behavior

There should be some sort of single-label mode that returns a similarity score rather than a value that's relative to the other labels. This already appears to be the case with the zero-shot text classification pipeline, but that may be a bug.

Having a multi_label option seems like a good idea, so that you have to explicitly opt in to getting the similarity score instead of the relative score.

Environment

Transformers.js version: 2.1.1
Browser (if applicable): Chrome

xenova commented 1 year ago

Right, so the reason for this is that the original transformers code doesn't include a multi_label attribute for zero-shot image classification. See here.

However, I might be missing something, so if you can provide the corresponding Python code, I can make sure the JS version matches its output.

josephrocca commented 1 year ago

Python code is here:

https://github.com/huggingface/transformers/issues/24008

But as mentioned in that issue, the single-label call to the zero-shot image classification pipeline results in an error, so maybe we'll have to wait and see what the Python maintainers say RE whether that's kind of the expected behavior (i.e. if it's 'invalid'). But it'd be strange if that were invalid given that it's valid for the text pipeline.

But also, the multi_label option doesn't seem to affect the zero-shot text classification in the single-label case above. I'm guessing that if multi_label is set to true and you only give a single label, then it should have a score of 1? Currently that is not the case - it returns the same value regardless of the multi_label value (in the case where a single label is given, to be clear). Or maybe I misunderstand.
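For illustration, here is a minimal sketch of why multi_label might make no difference when only one label is given. It assumes (this is my understanding of the Python pipeline, not something confirmed in this thread) that the text pipeline scores each label with an NLI model, and that with multi_label enabled, or with only one candidate label, each label is scored independently by a softmax over its own contradiction and entailment logits. The function names and logit values below are made up and are not part of the transformers.js API.

```javascript
// Numerically stable softmax over an array of numbers.
function softmax(values) {
  const max = Math.max(...values);
  const exps = values.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Hypothetical scoring helper (an assumption, not the library's code).
// `nliLogits` holds one { entailment, contradiction } pair per candidate label.
function scoreLabels(nliLogits, multiLabel) {
  if (multiLabel || nliLogits.length === 1) {
    // Independent mode: each label is scored on its own,
    // entailment vs. contradiction, so scores need not sum to 1.
    return nliLogits.map(
      ({ entailment, contradiction }) => softmax([contradiction, entailment])[1]
    );
  }
  // Relative mode: entailment logits compete across labels and sum to 1.
  return softmax(nliLogits.map((l) => l.entailment));
}

// Made-up logits for a single label: both settings take the same branch.
const oneLabel = [{ entailment: 2.0, contradiction: -1.0 }];
console.log(scoreLabels(oneLabel, false));
console.log(scoreLabels(oneLabel, true));
```

Under that assumption, the single-label call takes the independent branch regardless of multi_label, which would explain getting the same value (less than 1) for both settings.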

xenova commented 1 year ago

Just wanted to provide closure for this issue:

"so maybe we'll have to wait and see what the Python maintainers say RE whether that's kind of the expected behavior (i.e. if it's 'invalid')."

As stated here, it now just returns 1 in such cases (which is indeed the case here).
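To illustrate why a single label yields exactly 1, here is a minimal sketch assuming the image pipeline applies a softmax over the per-label image-text similarity logits (as CLIP-style models typically do). The logit values are made up, and the softmax helper is written out here for illustration rather than taken from the library.

```javascript
// Numerically stable softmax over an array of similarity logits.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Made-up logits for "astronaut", "forest cabin", "rabbit and lion".
const threeLabels = softmax([23.7, 18.2, 15.9]);
console.log(threeLabels); // relative scores that sum to 1

// With one label there is nothing to compete against,
// so the score is 1 whatever the logit is.
const oneLabel = softmax([23.7]);
console.log(oneLabel); // [1]
```

In other words, a softmax-normalized score only says how a label ranks against the other candidates, so with one candidate it carries no information about how well the label actually matches the image.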

Using your test cases, I've compared the output to the Python version: the outputs are exactly the same for 1 and 2, with only minor differences for 3 and 4 due to different image processing in JS:

✅ zero-shot-classification + multi-label text + multi_label: true
✅ zero-shot-classification + single-label text + multi_label: false
✅ zero-shot-image-classification + multi-label image + multi_label: true
✅ zero-shot-image-classification + single-label image + multi_label: false

xenova / transformers.js

[Bug] Inconsistency between scoring in the single-label case for zero-shot classification of text vs. images #135