Closed. sbalandi closed this pull request 3 months ago.
eaidova commented on 2024-07-02T06:12:11Z ----------------------------------------------------------------
Are you sure that such an old transformers version is enough?
sbalandi commented on 2024-07-02T19:23:37Z ----------------------------------------------------------------
updated
eaidova commented on 2024-07-02T06:12:12Z ----------------------------------------------------------------
Can you use the same visualization utilities as below to demonstrate the results?
eaidova commented on 2024-07-02T06:12:13Z ----------------------------------------------------------------
Line #19. visual_inputs = processor(images=IMAGE_INPUTS)
If you add return_tensors="pt", you get torch tensors directly from the processor, and the next line will not be needed (this is the preferred way to do it).
eaidova commented on 2024-07-02T06:12:14Z ----------------------------------------------------------------
Line #5. visualize_result(IMAGE_INPUTS[0], TEXT_INPUTS, int8_res_ov['logits_per_image'][0])
Negative values may look confusing when we are speaking about a probability score; maybe it makes sense to apply softmax, or to rename "probability" to "similarity".
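As a sketch of the softmax option (pure Python here; in the notebook it would be applied to the logits_per_image values):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, then normalize the exponentials.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical image-text logits, including a negative value:
logits = [2.1, -0.3, 0.8]
probs = softmax(logits)
print([round(p, 3) for p in probs])  # all values in [0, 1], summing to 1
```

After softmax the ranking of the labels is unchanged, but the scores read naturally as probabilities.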
@sbalandi, can we focus on a separated representation of the text encoder and image encoder in this notebook (converting them as 2 models) instead of one combined model?
The main advantage of jina-clip is that it is suitable not only for text-to-image matching but also supports comparing text embeddings, in contrast with CLIP models themselves. Having them as separate models may be more universal, so I think we should emphasize this.
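The point about separate encoders can be illustrated with plain cosine similarity over embeddings: once the text encoder is exported on its own, its outputs can be compared text-to-text exactly the same way as text-to-image. The vectors below are made-up stand-ins for encoder outputs:

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings standing in for text-encoder / vision-encoder outputs:
emb_query = [0.2, 0.9, 0.1]   # a text query embedding
emb_text = [0.25, 0.85, 0.05] # another text embedding (text-to-text comparison)
emb_image = [0.9, 0.1, 0.4]   # an image embedding (text-to-image comparison)

print(cosine_similarity(emb_query, emb_text) > cosine_similarity(emb_query, emb_image))
```

The same similarity function serves both retrieval directions, which is exactly why exposing the two encoders separately is more universal.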
eaidova commented on 2024-07-02T06:12:12Z ----------------------------------------------------------------
Text-to-image usually means generating images from a text description; in this description you probably mean text-to-image retrieval. I have read the other part of the description several times, but I cannot understand how it differs from the previous example.
You probably need to emphasize passing text and image simultaneously and preprocessing them outside the model.
Removed here; left it only in the description, where it came from the original source.
eaidova commented on 2024-07-03T06:53:45Z ----------------------------------------------------------------
Line #25. img_coco = Image.open("./data/coco.jpg")
I do not see this image used in other parts of the notebook; is it required?
It is used in the Gradio example.
eaidova commented on 2024-07-03T06:53:46Z ----------------------------------------------------------------
Line #1. from pathlib import Path
For comparing sizes you need to make sure that the INT8 model exists (you probably also need to move the variables with the paths outside the cells with the %%skip magic; otherwise, if the skipping condition is True, these variables will not exist), or mark such cells with %%skip too.
Added %%skip not $to_quantize.value, and int8_text/vision_model_path are moved to separate cells.
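A minimal sketch of the existence guard being discussed, using dummy files created via tempfile in place of the real converted IR files (paths and sizes are invented for illustration):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    fp16_path = Path(tmp) / "model_fp16.bin"
    int8_path = Path(tmp) / "model_int8.bin"
    fp16_path.write_bytes(b"\x00" * 2000)  # pretend FP16 weights
    int8_path.write_bytes(b"\x00" * 1000)  # pretend INT8 weights

    # Only compare when the quantized model was actually produced,
    # i.e. when the quantization cells were not skipped.
    if int8_path.exists():
        ratio = fp16_path.stat().st_size / int8_path.stat().st_size
        print(f"Model compression rate: {ratio:.3f}")
    else:
        print("INT8 model not found; skipping the size comparison.")
```

Keeping the path variables in a cell that always runs (and guarding with exists()) avoids a NameError when the quantization cells are skipped.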
eaidova commented on 2024-07-03T06:53:47Z ----------------------------------------------------------------
Line #3. print(f"Performance speed up: {fp16_latency / int8_latency:.3f}")
Please add "text encoder"/"image encoder" to the printed message; it is difficult to tell which model the performance is reported for.
added
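The requested labeling could look like the following; the latency numbers are invented placeholders, only the message format matters:

```python
# Invented latencies (in ms) standing in for the measured FP16/INT8 numbers:
latencies = {
    "text encoder": {"fp16": 12.5, "int8": 6.1},
    "image encoder": {"fp16": 48.0, "int8": 20.3},
}
for name, t in latencies.items():
    # Naming the model in the message removes the ambiguity the reviewer noted.
    print(f"{name} performance speed up: {t['fp16'] / t['int8']:.3f}")
```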
eaidova commented on 2024-07-03T06:53:47Z ----------------------------------------------------------------
Line #7. emb1_res = compiled_text_model(text_inputs["input_ids"])
Which model is used in Gradio, FP16 or INT8? Is there an option to select it?
FP16 by default; added a checkbox to manage it.
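The selection logic behind such a checkbox might reduce to something like this (the compiled model objects are replaced by strings purely for illustration):

```python
# Stand-ins for the compiled FP16 / INT8 models:
models = {False: "fp16_compiled_model", True: "int8_compiled_model"}

def pick_model(use_int8: bool) -> str:
    """Return the model matching the (hypothetical) 'Use INT8' checkbox state."""
    return models[use_int8]

print(pick_model(False))  # default: the FP16 model
```

In the Gradio UI the boolean would come from the checkbox value passed into the inference callback.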
aleksandr-mokrov commented on 2024-07-03T09:11:35Z ----------------------------------------------------------------
Line #4. ov_text_model = ov.convert_model(model.text_model, example_input=text_inputs["input_ids"])
I've got an error here:
TracingCheckError                          Traceback (most recent call last)
File ~/test_notebooks/jina-clip/openvino_notebooks/notebooks/jina-clip/venv/lib/python3.10/site-packages/openvino/frontend/pytorch/ts_decoder.py:41, in TorchScriptPythonDecoder.__init__(self, pt_module, graph_element, example_input, alias_db, shared_memory, skip_freeze, constant_cache, module_extensions)
     40 try:
---> 41     pt_module = self._get_scripted_model(
     42         pt_module, example_input, skip_freeze)
     43 except Exception as e:

File ~/test_notebooks/jina-clip/openvino_notebooks/notebooks/jina-clip/venv/lib/python3.10/site-packages/openvino/frontend/pytorch/ts_decoder.py:133, in TorchScriptPythonDecoder._get_scripted_model(self, pt_module, example_inputs, skip_freeze)
    132 try:
--> 133     scripted = torch.jit.trace(
    134         pt_module, **input_parameters, strict=False)
    135 finally:

File ~/test_notebooks/jina-clip/openvino_notebooks/notebooks/jina-clip/venv/lib/python3.10/site-packages/torch/jit/_trace.py:820, in trace(func, example_inputs, optimize, check_trace, check_inputs, check_tolerance, strict, _force_outplace, _module_class, _compilation_unit, example_kwarg_inputs, _store_inputs)
    819     raise RuntimeError("example_kwarg_inputs should be a dict")
--> 820 return trace_module(
    821     func,
    822     {"forward": example_inputs},
    823     None,
    824     check_trace,
    825     wrap_check_inputs(check_inputs),
    826     check_tolerance,
    827     strict,
    828     _force_outplace,
    829     _module_class,
    830     example_inputs_is_kwarg=isinstance(example_kwarg_inputs, dict),
    831     _store_inputs=_store_inputs,
    832 )
    833 if (
    834     hasattr(func, "__self__")
    835     and isinstance(func.__self__, torch.nn.Module)
    836     and func.__name__ == "forward"
    837 ):

File ~/test_notebooks/jina-clip/openvino_notebooks/notebooks/jina-clip/venv/lib/python3.10/site-packages/torch/jit/_trace.py:1116, in trace_module(mod, inputs, optimize, check_trace, check_inputs, check_tolerance, strict, _force_outplace, _module_class, _compilation_unit, example_inputs_is_kwarg, _store_inputs)
   1115 else:
-> 1116     _check_trace(
   1117         [inputs],
   1118         func,
   1119         check_trace_method,
   1120         check_tolerance,
   1121         strict,
   1122         _force_outplace,
   1123         True,
   1124         _module_class,
   1125         example_inputs_is_kwarg=example_inputs_is_kwarg,
   1126     )
   1127 finally:

File ~/test_notebooks/jina-clip/openvino_notebooks/notebooks/jina-clip/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    114 with ctx_factory():
--> 115     return func(*args, **kwargs)

File ~/test_notebooks/jina-clip/openvino_notebooks/notebooks/jina-clip/venv/lib/python3.10/site-packages/torch/jit/_trace.py:591, in _check_trace(check_inputs, func, traced_func, check_tolerance, strict, force_outplace, is_trace_module, _module_class, example_inputs_is_kwarg)
    590 if any(info is not None for info in diag_info):
--> 591     raise TracingCheckError(*diag_info)

TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
Graph diff:
    graph(%self.1 : __torch__.transformers_modules.jinaai.jina-clip-implementation.952897b38094b9f6a47b3d9a1d8239523e374098.hf_model.HFTextEncoder,
          %x.1 : Tensor):
And a very long graph diff below.
Fixed by calling text_model/vision_model directly; please check again.
eaidova commented on 2024-07-04T11:13:37Z ----------------------------------------------------------------
>> how the live demo for the zero-shot image classification task.
But actually the demo is not only for zero-shot image classification now; could you please update the description?
sbalandi commented on 2024-07-04T11:21:34Z ----------------------------------------------------------------
clarifications removed
eaidova commented on 2024-07-04T11:13:38Z ----------------------------------------------------------------
Looks like a text formatting issue; the text should be moved to the next line after "back on top".
sbalandi commented on 2024-07-04T11:21:17Z ----------------------------------------------------------------
moved
CVS-145251