Release Note

This release contains 1 new feature, 1 performance improvement, 2 bug fixes and 4 documentation improvements.

🆕 Features

Allow custom callback in clip_client (#849)

This feature allows clip-client users to send a request to a server and then process the response with custom callback functions. Users can supply custom functions for three callbacks: on_done, on_error and on_always.

The following code snippet shows how to send a request to a server and save the response to a database.
from clip_client import Client

db = {}


def my_on_done(resp):
    # Called on success: store each returned document in the database.
    for doc in resp.docs:
        db[doc.id] = doc


def my_on_error(resp):
    # Called on failure: append the error response to a log file.
    with open('error.log', 'a') as f:
        f.write(str(resp))


def my_on_always(resp):
    # Called after every request, whether it succeeded or failed.
    print(f'{len(resp.docs)} docs processed')


c = Client('grpc://0.0.0.0:12345')
c.encode(
    ['hello', 'world'], on_done=my_on_done, on_error=my_on_error, on_always=my_on_always
)
For more details, please refer to the CLIP client documentation.

🚀 Performance

Integrate flash attention (#853)

We have integrated the flash attention module as a faster replacement for nn.MultiheadAttention. To take advantage of this feature, you will need to install the flash attention module manually:
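One way to do this, assuming the flash attention module in question is the HazyResearch flash-attention package (an assumption on our part), is to install it from source:

pip install git+https://github.com/HazyResearch/flash-attention.git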
If flash attention is present, clip_server will automatically try to use it.
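As a quick sanity check, you can verify whether flash attention is importable in the server's environment. This is a minimal sketch assuming the package is importable as flash_attn; clip_server's actual detection logic may differ:

# Minimal sketch: check whether the flash attention module is importable.
# Assumes the package name `flash_attn` (an assumption, not confirmed by the release).
try:
    import flash_attn  # noqa: F401
    print('flash attention is available')
except ImportError:
    print('flash attention not found; nn.MultiheadAttention will be used instead')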
The table below compares CLIP performance with and without the flash attention module. We conducted all tests on a Tesla T4 GPU, timing how long it took to encode a batch of documents 100 times.
| Model    | Input data | Input shape       | w/o flash attention | w/ flash attention | Speedup |
|----------|------------|-------------------|---------------------|--------------------|---------|
| ViT-B-32 | text       | (1, 77)           | 0.42692             | 0.37867            | 1.1274  |
| ViT-B-32 | text       | (8, 77)           | 0.48738             | 0.45324            | 1.0753  |
| ViT-B-32 | text       | (16, 77)          | 0.4764              | 0.44315            | 1.07502 |
| ViT-B-32 | image      | (1, 3, 224, 224)  | 0.4349              | 0.40392            | 1.0767  |
| ViT-B-32 | image      | (8, 3, 224, 224)  | 0.47367             | 0.45316            | 1.04527 |
| ViT-B-32 | image      | (16, 3, 224, 224) | 0.51586             | 0.50555            | 1.0204  |
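The Speedup column is simply the ratio of the two timings. A quick check of the first row:

# Speedup = time without flash attention / time with flash attention.
# Values taken from the first row of the table (ViT-B-32, text, input shape (1, 77)).
t_without = 0.42692
t_with = 0.37867
print(f'speedup: {t_without / t_with:.4f}')  # speedup: 1.1274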
Based on our experiments, performance improvements vary depending on the model and GPU, but in general, the flash attention module improves performance.
🐞 Bug Fixes
Increase timeout at startup for Executor docker images (#854)
During Executor initialization, downloading model parameters can take a long time. If a model is very large and downloads slowly, the Executor may hit the timeout and fail before it even starts. We have increased the timeout to 3,000,000 ms (50 minutes).
Install transformers for Executor docker images (#851)
We have added the transformers package to Executor docker images, in order to support the multilingual CLIP model.
📗 Documentation Improvements
🤟 Contributors
We would like to thank all contributors to this release: