jhc13 / taggui

Tag manager and captioner for image datasets
GNU General Public License v3.0

Proposal for a model database #233

Open geroldmeisinger opened 3 months ago

geroldmeisinger commented 3 months ago

Inspired by Ollama, create a model library. An organization (university, company, institute...) develops a model architecture/family (like Llava 1.6). For a model architecture they publish a paper, code and model weights. Each model family is typically designed for a specific task and comes in different checkpoints/variants/flavors within a model zoo. Each checkpoint has factual information (date, URLs, parameter count, disk size, encoder used). For each checkpoint we may also derive additional information and add it as metadata over time (like benchmark values or a VRAM group). A specific checkpoint might be copied/fine-tuned/quantized and then refers to its parent model.
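A minimal sketch of how this hierarchy could be stored relationally (SQLite via Python's `sqlite3`; only the linking columns are shown here, the full field lists follow below):

```python
import sqlite3

# One family has many checkpoints; a checkpoint may point to a parent checkpoint
# (fine-tune/quantization); arbitrary metadata hangs off a checkpoint as key/value rows.
con = sqlite3.connect("models.db")
con.executescript("""
CREATE TABLE model_family (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    prettyname TEXT,
    is_vlm     INTEGER
);
CREATE TABLE model_checkpoint (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    family_id  INTEGER REFERENCES model_family(id),
    parent_id  INTEGER REFERENCES model_checkpoint(id), -- set for derived checkpoints
    prettyname TEXT
);
CREATE TABLE model_checkpoint_meta (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    model_id INTEGER REFERENCES model_checkpoint(id),
    key      TEXT,
    value    TEXT
);
""")
```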

Update: I documented most of the models here: https://github.com/jhc13/taggui/discussions/169#discussioncomment-9951043

(Image: ollama_overview)

model_family

id: autoincrement
prettyname: str
shortname: str
licence: str # some models use known licences ("mit", "apache"), some have their own (like "cogvlm2"). we also need this information for non-vlm models.
description: str
organization: str
homepage_url: url
paper_urls: list[url]
code_urls: list[url]
demo_urls: list[url]
model_urls: list[url]
tags: list[str] # e.g. 7B, 13B, object_detection
citation: str # BibTeX reference
is_vlm: bool # a VLM checkpoint combines a vision encoder with an LLM; the underlying LLM is itself referenced as a model family with is_vlm=false (e.g. llava-v1.6-vicuna-7b)
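For illustration, a model_family entry for Llava 1.6 might look roughly like this (the values are a sketch and would need to be verified before going into the database):

```python
llava_1_6 = {
    "prettyname": "LLaVA 1.6",
    "shortname": "llava-1.6",
    "licence": "apache-2.0",  # code licence; the weights inherit the base LLM's licence
    "organization": "LLaVA team",
    "homepage_url": "https://llava-vl.github.io",
    "code_urls": ["https://github.com/haotian-liu/LLaVA"],
    "tags": ["7B", "13B", "34B"],
    "is_vlm": True,
}
```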

model_checkpoint

id: autoincrement
family_id: id
parent_id: id # if it's derived from another checkpoint
prettyname: str
description: str
parameters: int # visual encoder plus LLM
repo_url: url
repo_commit_hash: str
repo_last_update_date: date
modelfile_url: url # e.g. https://ollama.com/library/llava/blobs/170370233dd5
is_official: bool # official release by the model creators (as opposed to a third-party fine-tune)
languages: list[str]
text_length: int
prompt_default: str
prompt_values: list[str] # some models only allow fixed prompts, e.g. Florence: <CAPTION>, <DETAILED_CAPTION>, <MORE_DETAILED_CAPTION>, <OCR>
licence: str # fine-tunes may change the licence
img_size_default: tuple[int] # 1344x1344
img_sizes: str # some models support multiple sizes or a range of sizes; use an arbitrary str
published_date_first: date
download_size: int
disk_size: int
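To illustrate parent_id: a third-party quantization would reference the checkpoint it was derived from (names and values are illustrative only):

```python
base = {
    "id": 1,
    "family_id": 1,          # Llava 1.6
    "parent_id": None,       # original release
    "prettyname": "llava-v1.6-vicuna-7b",
    "is_official": True,
}
quantized = {
    "id": 2,
    "family_id": 1,
    "parent_id": 1,          # derived from the checkpoint above
    "prettyname": "llava-v1.6-vicuna-7b (4-bit)",
    "is_official": False,    # third-party quantization; may also override the licence
}
```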

model_checkpoint_meta

id: autoincrement
model_id: id
key: str
value: str

examples: benchmark_llava_bench, vram_size, vision_encoder_family_id, vision_encoder_parameters, vision_encoder_quantization, large_language_model_family_id, large_language_model_parameters, large_language_model_quantization
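Reading the metadata back is then a single lookup per checkpoint; a sketch, assuming the SQLite layout from the first code block:

```python
import sqlite3

def checkpoint_meta(con: sqlite3.Connection, checkpoint_id: int) -> dict[str, str]:
    """Collect all key/value metadata rows of one checkpoint into a dict."""
    rows = con.execute(
        "SELECT key, value FROM model_checkpoint_meta WHERE model_id = ?",
        (checkpoint_id,),
    )
    return dict(rows)

# e.g. {"vram_size": "9000000000", "vision_encoder_quantization": "fp16"}  (illustrative values)
```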

system

id: autoincrement
system_name: str # Debian Linux
system_version: str # 12.0
gpu_models: list[str] # ["Nvidia GeForce RTX 3060", "Nvidia GeForce RTX 3060"]
driver_version: str # "545.23.08"
compute_version: str # "12.3"
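Most of these fields could be filled in automatically on the user's machine; a sketch using `platform` and `torch` (the driver version would have to come from e.g. `nvidia-smi` or pynvml, which is left out here):

```python
import platform
import torch

def collect_system_info() -> dict:
    """Gather the fields of the proposed `system` table from the running machine."""
    gpu_models = [torch.cuda.get_device_name(i)
                  for i in range(torch.cuda.device_count())]
    return {
        "system_name": platform.system(),       # e.g. "Linux"
        "system_version": platform.release(),   # kernel release; distro version needs an extra lookup
        "gpu_models": gpu_models,               # e.g. ["NVIDIA GeForce RTX 3060"]
        "driver_version": None,                 # fill in via nvidia-smi / pynvml
        "compute_version": torch.version.cuda,  # CUDA version PyTorch was built with, e.g. "12.1"
    }
```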

benchmark_entry

id: autoincrement
checkpoint_id: id
system_id: id
user_id: str
taggui_commit_hash: str
checkpoint_loadingtime_ms: int
images_url: list[url]
images_sha256: list[str]
images_gentime_ms: list[int]
images_output: list[str]
env: str # "PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True"
vram: int # in bytes
sram: int # in bytes
power_consumption_kWh: float # some benchmarks are interested in iterations per kWh, i.e. the cheapest GPU per iteration
parameters: json # only the overrides of the defaults
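The per-image fields could be collected with a small wrapper around the captioning call; a sketch where `run_captioner` is a stand-in for whatever taggui actually calls, and VRAM is taken as the peak allocation reported by PyTorch:

```python
import hashlib
import time
import torch

def benchmark_one_image(run_captioner, image_path: str, prompt: str) -> dict:
    """Measure one image for a benchmark_entry: hash, generation time, output, peak VRAM."""
    with open(image_path, "rb") as f:
        sha256 = hashlib.sha256(f.read()).hexdigest()
    torch.cuda.reset_peak_memory_stats()
    start = time.perf_counter()
    output = run_captioner(image_path, prompt)   # hypothetical captioning call
    gentime_ms = int((time.perf_counter() - start) * 1000)
    return {
        "images_sha256": [sha256],
        "images_gentime_ms": [gentime_ms],
        "images_output": [output],
        "vram": torch.cuda.max_memory_allocated(),  # in bytes
    }
```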

Attached images:

- Llava intro (llava_intro)
- CogVLM intro (cogvlm_intro)
- Llava BibTeX citation (llava_citation)
- Llava in Ollama (ollama_llava)
- Llava model zoo (llava_modelzoo)
- Llava benchmark spider diagram (llava_benchmarks)
- Open VLM leaderboard (openvlm_leaderboard)
- CogVLM benchmarks (cogvlm_benchmarks)
- Userbenchmark percentile example (userbenchmark)