unum-cloud / uform

Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
https://unum-cloud.github.io/uform/
Apache License 2.0

[Refactor] Modular package organisation, pre-commit linting suite #58

Closed · lmmx closed this 6 months ago

lmmx commented 6 months ago

I wanted to review the code here to see how it worked, and I did some refactoring so I could read it more easily.

$ tree src/uform
src/uform
├── chat.py
├── gen_model.py
├── __init__.py
├── models
│   ├── encoders
│   │   ├── __init__.py
│   │   ├── network_layers.py
│   │   ├── text
│   │   │   ├── block.py
│   │   │   ├── encoder.py
│   │   │   └── __init__.py
│   │   └── visual
│   │       ├── block.py
│   │       ├── encoder.py
│   │       └── __init__.py
│   ├── image_utils.py
│   ├── __init__.py
│   ├── triton.py
│   └── vlm.py
└── setup_model.py

4 directories, 16 files
ashvardanian commented 6 months ago

Looks very nice, thank you @lmmx! I am pretty sure there will be collisions with #57, so give us a bit of time to find the optimal way to merge it 🤗

In the meantime, if you have any other recommendations about this or other repos I maintain (like SimSIMD), please let us know!

lmmx commented 6 months ago

Ah, I just got the benchmarks to run and it looks like I made a mistake: I'm getting a warning from the bench script (whereas there is no warning from the PyPI-packaged version).

Click to show warning

```
(uform) louis 🌟 ~/lab/uform/uform $ python scripts/bench.py
UForm-Gen
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3.86it/s]
Some weights of the model checkpoint at unum-cloud/uform-gen were not used when initializing VLMForCausalLM: ['image_encoder.blocks.1.ls2.weight', 'image_encoder.blocks.9.ls1.weight', 'image_encoder.blocks.4.ls2.weight', 'image_encoder.blocks.4.ls1.weight', 'image_encoder.blocks.1.ls1.weight', 'image_encoder.blocks.7.ls1.weight', 'image_encoder.blocks.6.ls2.weight', 'image_encoder.blocks.10.ls2.weight', 'image_encoder.blocks.3.ls1.weight', 'image_encoder.blocks.0.ls1.weight', 'image_encoder.blocks.5.ls1.weight', 'image_encoder.blocks.6.ls1.weight', 'image_encoder.blocks.8.ls1.weight', 'image_encoder.blocks.2.ls2.weight', 'image_encoder.blocks.11.ls2.weight', 'image_encoder.blocks.3.ls2.weight', 'image_encoder.blocks.2.ls1.weight', 'image_encoder.blocks.7.ls2.weight', 'image_encoder.blocks.11.ls1.weight', 'image_encoder.blocks.10.ls1.weight', 'image_encoder.blocks.0.ls2.weight', 'image_encoder.blocks.9.ls2.weight', 'image_encoder.blocks.5.ls2.weight', 'image_encoder.blocks.8.ls2.weight']
- This IS expected if you are initializing VLMForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing VLMForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of VLMForCausalLM were not initialized from the model checkpoint at unum-cloud/uform-gen and are newly initialized: ['image_encoder.blocks.0.ls1.gamma', 'image_encoder.blocks.1.ls2.gamma', 'image_encoder.blocks.5.ls2.gamma', 'image_encoder.blocks.2.ls1.gamma', 'image_encoder.blocks.7.ls2.gamma', 'image_encoder.blocks.1.ls1.gamma', 'image_encoder.blocks.3.ls2.gamma', 'image_encoder.blocks.4.ls2.gamma', 'image_encoder.blocks.3.ls1.gamma', 'image_encoder.blocks.10.ls2.gamma', 'image_encoder.blocks.2.ls2.gamma', 'image_encoder.blocks.0.ls2.gamma', 'image_encoder.blocks.6.ls1.gamma', 'image_encoder.blocks.7.ls1.gamma', 'image_encoder.blocks.10.ls1.gamma', 'image_encoder.blocks.6.ls2.gamma', 'image_encoder.blocks.5.ls1.gamma', 'image_encoder.blocks.8.ls2.gamma', 'image_encoder.blocks.9.ls1.gamma', 'image_encoder.blocks.4.ls1.gamma', 'image_encoder.blocks.8.ls1.gamma', 'image_encoder.blocks.9.ls2.gamma', 'image_encoder.blocks.11.ls2.gamma', 'image_encoder.blocks.11.ls1.gamma']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
```

Will put this back to draft until I find the cause. I'll take a look, thanks Ash!

Update: it was the LayerScale class in gen_model.py. I presumed it could be substituted for the other definition, but seemingly not! ab47953 (the checkpoint at unum-cloud/uform-gen expects a layer parameter named weight rather than gamma). I put it in the same dataclass form in 79872fd so at least the two classes are similar (maybe worth renaming them to distinguish them explicitly). Returned this PR to ready for review.
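For readers hitting the same warning: a minimal sketch of why this happens, with assumed key names based on the log above. PyTorch's `load_state_dict` matches checkpoint entries to module parameters by their fully qualified names, so a checkpoint saved with `...ls1.weight` cannot populate a LayerScale that declares its parameter as `gamma`; the set difference in each direction produces exactly the "not used" and "newly initialized" lists in the warning.

```python
# Hypothetical illustration (single block only) of state_dict name matching.
# Keys are taken from the warning log; the matching logic mirrors what
# PyTorch does when strict loading is relaxed.

checkpoint_keys = {"image_encoder.blocks.0.ls1.weight"}  # as saved by unum-cloud/uform-gen
model_keys = {"image_encoder.blocks.0.ls1.gamma"}        # refactored LayerScale parameter name

# Checkpoint entries with no matching parameter -> "were not used" warning.
unexpected = checkpoint_keys - model_keys

# Parameters with no matching checkpoint entry -> "newly initialized" warning.
missing = model_keys - checkpoint_keys

print(unexpected)  # {'image_encoder.blocks.0.ls1.weight'}
print(missing)     # {'image_encoder.blocks.0.ls1.gamma'}
```

Keeping the parameter named `weight` (as in the fix above) makes both sets empty, so the checkpoint loads silently.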

ashvardanian commented 6 months ago

Hi @lmmx! Thank you for the contribution!

We've looked into it with a team and can't merge it in its entirety right now, as it will introduce structural differences between public UForm and our private training repositories.

That said, the CI upgrades look interesting! Would you be open to reverting the structural changes and keeping just the CI?

Thanks again!

lmmx commented 6 months ago

Fair enough, sure, I'll close this PR and make a fresh one (#62) :inbox_tray: