jina-ai / clip-as-service

πŸ„ Scalable embedding, reasoning, ranking for images and sentences with CLIP
https://clip-as-service.jina.ai

chore: draft release note v0.8.1 #858

Closed numb3r3 closed 2 years ago

numb3r3 commented 2 years ago

Release Note

This release contains 1 new feature, 1 performance improvement, 2 bug fixes and 4 documentation improvements.

🆕 Features

Allow custom callback in clip_client (#849)

This feature lets clip-client users send a request to a server and then process the response with custom callback functions. Three hooks accept a custom function: on_done, on_error and on_always.

The following code snippet shows how to send a request to a server and save the response to a database.

from clip_client import Client

db = {}

def my_on_done(resp):
    # store each returned document in the database
    for doc in resp.docs:
        db[doc.id] = doc

def my_on_error(resp):
    # append the failed response to an error log
    with open('error.log', 'a') as f:
        f.write(str(resp))

def my_on_always(resp):
    # runs after every request, successful or not
    print(f'{len(resp.docs)} docs processed')

c = Client('grpc://0.0.0.0:12345')
c.encode(
    ['hello', 'world'], on_done=my_on_done, on_error=my_on_error, on_always=my_on_always
)

For more details, please refer to the CLIP client documentation.

🚀 Performance

Integrate flash attention (#853)

We have integrated the flash attention module as a faster drop-in replacement for nn.MultiheadAttention. To take advantage of this feature, you need to install the flash attention package manually:

pip install git+https://github.com/HazyResearch/flash-attention.git

If flash attention is present, clip_server will automatically try to use it.
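The "automatically try to use it" behavior can be sketched as an import probe; a minimal, hypothetical sketch of such detection logic (the import path follows the HazyResearch package, and the function name here is illustrative, not clip_server's actual API):

```python
# Prefer flash attention when it is importable, otherwise fall back to
# PyTorch's standard nn.MultiheadAttention. Names are illustrative.
try:
    from flash_attn.flash_attention import FlashMHA  # noqa: F401
    FLASH_ATTN_AVAILABLE = True
except ImportError:
    FLASH_ATTN_AVAILABLE = False

def attention_backend() -> str:
    """Return which attention implementation would be used."""
    return 'FlashMHA' if FLASH_ATTN_AVAILABLE else 'nn.MultiheadAttention'

print(attention_backend())
```

Because the fallback is silent, users who want flash attention should verify that the package actually imports in the server's environment.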

The table below compares CLIP performance with and without the flash attention module. We ran all tests on a Tesla T4 GPU and measured the time to encode a batch of documents 100 times.

| Model    | Input data | Input shape       | w/o flash attention | w/ flash attention | Speedup |
|----------|------------|-------------------|---------------------|--------------------|---------|
| ViT-B-32 | text       | (1, 77)           | 0.42692             | 0.37867            | 1.1274  |
| ViT-B-32 | text       | (8, 77)           | 0.48738             | 0.45324            | 1.0753  |
| ViT-B-32 | text       | (16, 77)          | 0.4764              | 0.44315            | 1.07502 |
| ViT-B-32 | image      | (1, 3, 224, 224)  | 0.4349              | 0.40392            | 1.0767  |
| ViT-B-32 | image      | (8, 3, 224, 224)  | 0.47367             | 0.45316            | 1.04527 |
| ViT-B-32 | image      | (16, 3, 224, 224) | 0.51586             | 0.50555            | 1.0204  |

Based on our experiments, the performance gain varies across models and GPUs, but in general the flash attention module improves performance.
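The Speedup column is simply the baseline time divided by the flash-attention time; for example, for the first row of the table:

```python
baseline = 0.42692   # ViT-B-32, text, shape (1, 77), w/o flash attention
flash = 0.37867      # same setting with flash attention
speedup = baseline / flash
print(round(speedup, 4))  # 1.1274, matching the table
```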

🐞 Bug Fixes

Increase timeout at startup for Executor docker images (#854)

During Executor initialization, downloading model parameters can take a long time. If a model is very large or the download is slow, the Executor could time out and fail before it even started. We have increased the startup timeout to 3000000 ms (50 minutes).
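For reference, a user running the Executor inside their own Jina Flow can set the same readiness timeout themselves; a hedged sketch in Flow YAML, where the executor name and `uses` value are illustrative and `timeout_ready` is in milliseconds:

```yaml
jtype: Flow
executors:
  - name: clip_encoder                      # illustrative name
    uses: jinahub+docker://CLIPTorchEncoder  # illustrative Executor reference
    timeout_ready: 3000000                   # ms; raise for slow model downloads
```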

Install transformers for Executor docker images (#851)

We have added the transformers package to Executor docker images, in order to support the multilingual CLIP model.

📗 Documentation Improvements

🤟 Contributors

We would like to thank all contributors to this release: