Closed. sbalandi closed this pull request 3 months ago.
eaidova commented on 2024-07-02T06:12:11Z ----------------------------------------------------------------
Are you sure that such an old transformers version is enough?
sbalandi commented on 2024-07-02T19:23:37Z ----------------------------------------------------------------
updated
eaidova commented on 2024-07-02T06:12:12Z ----------------------------------------------------------------
Can you use the same visualization utilities as below to demonstrate the results?
eaidova commented on 2024-07-02T06:12:13Z ----------------------------------------------------------------
Line #19. visual_inputs = processor(images=IMAGE_INPUTS)
If you add return_tensors="pt", you get torch tensors directly from the processor, and the next line will not be needed (this is the preferred way to do it).
eaidova commented on 2024-07-02T06:12:14Z ----------------------------------------------------------------
Line #5. visualize_result(IMAGE_INPUTS[0], TEXT_INPUTS, int8_res_ov['logits_per_image'][0])
Negative values may look confusing when we are speaking about a probability score; maybe it makes sense to apply softmax, or to rename "probability" to "similarity".
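As a sketch of the softmax option (pure Python here; in the notebook it would be applied to the logits_per_image values):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, then normalize the exponentials.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical image-text logits, including a negative value:
logits = [2.1, -0.3, 0.8]
probs = softmax(logits)
print([round(p, 3) for p in probs])  # all values in [0, 1], summing to 1
```

After softmax the ranking of the labels is unchanged, but the scores read naturally as probabilities.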
@sbalandi, can we focus on a separated representation of the text encoder and image encoder in this notebook (converting them as 2 models) instead of one combined model?
The main advantage of jina-clip is that it is suitable not only for text-to-image matching but also supports comparing text embeddings, in contrast with CLIP models themselves. Having them as separate models may be more universal, so I think we should emphasize this.
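The point about separate encoders can be illustrated with plain cosine similarity over embeddings: once the text encoder is exported on its own, its outputs can be compared text-to-text exactly the same way as text-to-image. The vectors below are made-up stand-ins for encoder outputs:

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings standing in for text-encoder / vision-encoder outputs:
emb_query = [0.2, 0.9, 0.1]   # a text query embedding
emb_text = [0.25, 0.85, 0.05] # another text embedding (text-to-text comparison)
emb_image = [0.9, 0.1, 0.4]   # an image embedding (text-to-image comparison)

print(cosine_similarity(emb_query, emb_text) > cosine_similarity(emb_query, emb_image))
```

The same similarity function serves both retrieval directions, which is exactly why exposing the two encoders separately is more universal.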
eaidova commented on 2024-07-02T06:12:12Z ----------------------------------------------------------------
Text-to-image usually means generating images from a text description; in this description you probably mean text-to-image retrieval. I have read the other part of the description several times, but I cannot understand how it differs from the previous example.
You probably need to emphasize passing text and image simultaneously and preprocessing them outside the model.
Removed here; left it only in the description, where it came from the original source.
eaidova commented on 2024-07-03T06:53:45Z ----------------------------------------------------------------
Line #25. img_coco = Image.open("./data/coco.jpg")
I do not see this image used in other parts of the notebook; is it required?
It is used in the Gradio example.
eaidova commented on 2024-07-03T06:53:46Z ----------------------------------------------------------------
Line #1. from pathlib import Path
For comparing sizes you need to make sure that the INT8 model exists (you probably also need to move the variables with the paths outside the cells with the %%skip magic; otherwise, if the skipping condition is True, these variables will not exist), or mark such cells with %%skip too.
Added %%skip not $to_quantize.value, and int8_text/vision_model_path are moved to separate cells.
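A minimal sketch of the existence guard being discussed, using dummy files created via tempfile in place of the real converted IR files (paths and sizes are invented for illustration):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    fp16_path = Path(tmp) / "model_fp16.bin"
    int8_path = Path(tmp) / "model_int8.bin"
    fp16_path.write_bytes(b"\x00" * 2000)  # pretend FP16 weights
    int8_path.write_bytes(b"\x00" * 1000)  # pretend INT8 weights

    # Only compare when the quantized model was actually produced,
    # i.e. when the quantization cells were not skipped.
    if int8_path.exists():
        ratio = fp16_path.stat().st_size / int8_path.stat().st_size
        print(f"Model compression rate: {ratio:.3f}")
    else:
        print("INT8 model not found; skipping the size comparison.")
```

Keeping the path variables in a cell that always runs (and guarding with exists()) avoids a NameError when the quantization cells are skipped.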
eaidova commented on 2024-07-03T06:53:47Z ----------------------------------------------------------------
Line #3. print(f"Performance speed up: {fp16_latency / int8_latency:.3f}")
Please add "text encoder"/"image encoder" to the printed message; it is difficult to tell which model the performance is reported for.
added
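The requested labeling could look like the following; the latency numbers are invented placeholders, only the message format matters:

```python
# Invented latencies (in ms) standing in for the measured FP16/INT8 numbers:
latencies = {
    "text encoder": {"fp16": 12.5, "int8": 6.1},
    "image encoder": {"fp16": 48.0, "int8": 20.3},
}
for name, t in latencies.items():
    # Naming the model in the message removes the ambiguity the reviewer noted.
    print(f"{name} performance speed up: {t['fp16'] / t['int8']:.3f}")
```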
eaidova commented on 2024-07-03T06:53:47Z ----------------------------------------------------------------
Line #7. emb1_res = compiled_text_model(text_inputs["input_ids"])
Which model is used in Gradio, FP16 or INT8? Is there an option to select it?
FP16 by default; added a checkbox to manage it.
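The selection logic behind such a checkbox might reduce to something like this (the compiled model objects are replaced by strings purely for illustration):

```python
# Stand-ins for the compiled FP16 / INT8 models:
models = {False: "fp16_compiled_model", True: "int8_compiled_model"}

def pick_model(use_int8: bool) -> str:
    """Return the model matching the (hypothetical) 'Use INT8' checkbox state."""
    return models[use_int8]

print(pick_model(False))  # default: the FP16 model
```

In the Gradio UI the boolean would come from the checkbox value passed into the inference callback.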
aleksandr-mokrov commented on 2024-07-03T09:11:35Z ----------------------------------------------------------------
Line #4. ov_text_model = ov.convert_model(model.text_model, example_input=text_inputs["input_ids"])
I've got an error here:
TracingCheckError                          Traceback (most recent call last)
File ~/test_notebooks/jina-clip/openvino_notebooks/notebooks/jina-clip/venv/lib/python3.10/site-packages/openvino/frontend/pytorch/ts_decoder.py:41, in TorchScriptPythonDecoder.__init__(self, pt_module, graph_element, example_input, alias_db, shared_memory, skip_freeze, constant_cache, module_extensions)
     40 try:
---> 41     pt_module = self._get_scripted_model(
     42         pt_module, example_input, skip_freeze)
     43 except Exception as e:

File ~/test_notebooks/jina-clip/openvino_notebooks/notebooks/jina-clip/venv/lib/python3.10/site-packages/openvino/frontend/pytorch/ts_decoder.py:133, in TorchScriptPythonDecoder._get_scripted_model(self, pt_module, example_inputs, skip_freeze)
    132 try:
--> 133     scripted = torch.jit.trace(
    134         pt_module, **input_parameters, strict=False)
    135 finally:

File ~/test_notebooks/jina-clip/openvino_notebooks/notebooks/jina-clip/venv/lib/python3.10/site-packages/torch/jit/_trace.py:820, in trace(func, example_inputs, optimize, check_trace, check_inputs, check_tolerance, strict, _force_outplace, _module_class, _compilation_unit, example_kwarg_inputs, _store_inputs)
    819     raise RuntimeError("example_kwarg_inputs should be a dict")
--> 820 return trace_module(
    821     func,
    822     {"forward": example_inputs},
    823     None,
    824     check_trace,
    825     wrap_check_inputs(check_inputs),
    826     check_tolerance,
    827     strict,
    828     _force_outplace,
    829     _module_class,
    830     example_inputs_is_kwarg=isinstance(example_kwarg_inputs, dict),
    831     _store_inputs=_store_inputs,
    832 )
    833 if (
    834     hasattr(func, "__self__")
    835     and isinstance(func.__self__, torch.nn.Module)
    836     and func.__name__ == "forward"
    837 ):

File ~/test_notebooks/jina-clip/openvino_notebooks/notebooks/jina-clip/venv/lib/python3.10/site-packages/torch/jit/_trace.py:1116, in trace_module(mod, inputs, optimize, check_trace, check_inputs, check_tolerance, strict, _force_outplace, _module_class, _compilation_unit, example_inputs_is_kwarg, _store_inputs)
   1115 else:
-> 1116     _check_trace(
   1117         [inputs],
   1118         func,
   1119         check_trace_method,
   1120         check_tolerance,
   1121         strict,
   1122         _force_outplace,
   1123         True,
   1124         _module_class,
   1125         example_inputs_is_kwarg=example_inputs_is_kwarg,
   1126     )
   1127 finally:

File ~/test_notebooks/jina-clip/openvino_notebooks/notebooks/jina-clip/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    114 with ctx_factory():
--> 115     return func(*args, **kwargs)

File ~/test_notebooks/jina-clip/openvino_notebooks/notebooks/jina-clip/venv/lib/python3.10/site-packages/torch/jit/_trace.py:591, in _check_trace(check_inputs, func, traced_func, check_tolerance, strict, force_outplace, is_trace_module, _module_class, example_inputs_is_kwarg)
    590 if any(info is not None for info in diag_info):
--> 591     raise TracingCheckError(*diag_info)

TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
Graph diff:
    graph(%self.1 : __torch__.transformers_modules.jinaai.jina-clip-implementation.952897b38094b9f6a47b3d9a1d8239523e374098.hf_model.HFTextEncoder,
          %x.1 : Tensor):
And a very long graph diff below.
Fixed by calling text_model/vision_model directly; please check again.
eaidova commented on 2024-07-04T11:13:37Z ----------------------------------------------------------------
>> how the live demo for the zero-shot image classification task.
But actually the demo is not only for zero-shot image classification now; could you please update the description?
sbalandi commented on 2024-07-04T11:21:34Z ----------------------------------------------------------------
clarifications removed
eaidova commented on 2024-07-04T11:13:38Z ----------------------------------------------------------------
Looks like a text formatting issue; the text should be moved to the next line after "back on top".
sbalandi commented on 2024-07-04T11:21:17Z ----------------------------------------------------------------
moved
CVS-145251