jina-ai / clip-as-service

πŸ„ Scalable embedding, reasoning, ranking for images and sentences with CLIP
https://clip-as-service.jina.ai
Other
12.48k stars 2.07k forks

feat: add faster tokenizer #868

Closed OrangeSodahub closed 2 years ago

OrangeSodahub commented 2 years ago

Add a faster tokenizer (originally from rust-tokenizer).

github-actions[bot] commented 2 years ago

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

codecov[bot] commented 2 years ago

Codecov Report

Merging #868 (37cf884) into main (67f551c) will decrease coverage by 3.42%. The diff coverage is 21.95%.

@@            Coverage Diff             @@
##             main     #868      +/-   ##
==========================================
- Coverage   80.28%   76.85%   -3.43%     
==========================================
  Files          22       23       +1     
  Lines        1633     1672      +39     
==========================================
- Hits         1311     1285      -26     
- Misses        322      387      +65     
Flag Coverage Δ
cas 76.85% <21.95%> (-3.43%) :arrow_down:

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
server/clip_server/model/fast_tokenizer.py 11.42% <11.42%> (ø)
server/clip_server/model/tokenization.py 86.48% <83.33%> (-1.40%) :arrow_down:
server/clip_server/model/trt_utils.py 56.04% <0.00%> (-27.48%) :arrow_down:
server/clip_server/model/clip_trt.py 69.38% <0.00%> (-16.33%) :arrow_down:
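As a quick sanity check on the Codecov figures: line coverage is hits divided by measured lines, and misses are the remaining lines. The short Python sketch below recomputes the Coverage, Misses, and delta rows from the Hits and Lines values in the Coverage Diff block. It assumes there are no partially covered lines, since the report lists none; the helper names are illustrative only.

```python
# Hits/Lines values copied from the Codecov "Coverage Diff" block for PR #868.
BASE = {"lines": 1633, "hits": 1311}  # main @ 67f551c
HEAD = {"lines": 1672, "hits": 1285}  # #868 @ 37cf884


def coverage(stats: dict) -> float:
    """Line coverage as a percentage: hits / lines * 100."""
    return 100.0 * stats["hits"] / stats["lines"]


def misses(stats: dict) -> int:
    """Uncovered lines, assuming every measured line is either a hit or a miss."""
    return stats["lines"] - stats["hits"]


base_cov = coverage(BASE)
head_cov = coverage(HEAD)
delta = head_cov - base_cov  # change in percentage points

print(f"main: {base_cov:.2f}%  misses={misses(BASE)}")
print(f"#868: {head_cov:.2f}%  misses={misses(HEAD)}")
print(f"delta: {delta:+.2f} percentage points")
```

Running it gives 80.28% and 76.85% coverage with 322 and 387 misses, and a drop of about 3.43 percentage points, matching the table. The summary sentence reports the same drop as 3.42%.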


ZiniuYu commented 2 years ago

Awesome!

numb3r3 commented 2 years ago

Since rust-tokenizer is not pip-installable, I'll close this PR for now.