huggingface / safetensors

Simple, safe way to store and distribute tensors
https://huggingface.co/docs/safetensors
Apache License 2.0

Ability to remotely parse metadata over small HTTP requests #44

Closed julien-c closed 1 year ago

julien-c commented 1 year ago

In this branch: https://github.com/huggingface/safetensors/compare/julien-c/js I pushed a proof-of-concept of how, given the simplicity of the format, one can fetch metadata about the weights over small (Range) HTTP requests.

The code is TypeScript (it can run in browsers or in Node), but it would be similar in any language.

Here's an example of how to fetch the header of a single file:

async function parseSingleFile(url: URL): Promise<FileHeader> {
    // The first 8 bytes of a safetensors file are a little-endian u64
    // holding the byte length of the JSON header.
    const bufLengthOfHeaderLE = await (
        await fetch(url, {
            headers: {
                Range: "bytes=0-7",
            },
        })
    ).arrayBuffer();
    const lengthOfHeader = new DataView(bufLengthOfHeaderLE).getBigUint64(
        0,
        true // little-endian
    );
    // The JSON header immediately follows those 8 bytes.
    const header: FileHeader = await (
        await fetch(url, {
            headers: {
                Range: `bytes=8-${7 + Number(lengthOfHeader)}`,
            },
        })
    ).json();
    // No validation for now; we assume it's a valid FileHeader.
    return header;
}
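
For example (assuming the gpt2 repo still serves a single model.safetensors file and the server honors Range requests):

const header = await parseSingleFile(
    new URL("https://huggingface.co/gpt2/resolve/main/model.safetensors")
);
console.log(Object.keys(header)); // tensor names (plus __metadata__ if present)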

where a FileHeader type is defined as:

type TensorName = string;
type Dtype =
    | "F64"
    | "F32"
    | "F16"
    | "BF16"
    | "I64"
    | "I32"
    | "I16"
    | "I8"
    | "U8"
    | "BOOL";

interface TensorInfo {
    dtype: Dtype;
    shape: number[];
    data_offsets: [number, number];
}

type FileHeader = Record<TensorName, TensorInfo> & {
    __metadata__: Record<string, string>;
};

Results

As a fun first experiment, I computed the number of params per dtype for all models that currently have a safetensors version on the HuggingFace Hub.
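
Each tensor's element count is just the product of its shape, so the per-dtype counts fall out of the header directly. A minimal sketch of that computation (assuming the FileHeader type above; not the exact script used):

function paramsPerDtype(header: FileHeader): Map<Dtype, number> {
    const counts = new Map<Dtype, number>();
    for (const [name, info] of Object.entries(header)) {
        if (name === "__metadata__") continue; // not a tensor entry
        const tensor = info as TensorInfo;
        // Number of elements = product of the shape (1 for scalars).
        const numel = tensor.shape.reduce((a, b) => a * b, 1);
        counts.set(tensor.dtype, (counts.get(tensor.dtype) ?? 0) + numel);
    }
    return counts;
}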

Here are the results:

| model | safetensors | params |
| --- | --- | --- |
| gpt2 | single-file | { 'F32' => 137022720 } |
| roberta-base | single-file | { 'F32' => 124697433, 'I64' => 514 } |
| Jean-Baptiste/camembert-ner | single-file | { 'F32' => 110035205, 'I64' => 514 } |
| roberta-large | single-file | { 'F32' => 355412057, 'I64' => 514 } |
| bigscience/bloom-560m | single-file | { 'F16' => 559214592 } |
| hf-internal-testing/tiny-random-bert-safetensors | single-file | { 'F32' => 127463, 'I64' => 512 } |
| hf-internal-testing/tiny-random-bert-sharded-safetensors | index-file | { 'F32' => 87929, 'I64' => 512 } |
| Narsil/small3 | index-file | { 'F32' => 59159, 'I64' => 512 } |
| Narsil/small2 | single-file | { 'F32' => 59159, 'I64' => 512 } |
| hf-internal-testing/tiny-random-bert-safetensors-tf | single-file | { 'F32' => 87929 } |

Thought it'd be fun to share! cc @mishig25 @osanseviero too

Narsil commented 1 year ago

Super nice!

Actually, I just thought: for the initial read, you could probably issue the first request directly for the first 100kB or so, and refetch only if needed. This would avoid making two network calls in most settings (the 100kB is adjustable).

Just an optimization that might be worthwhile in production.

julien-c commented 1 year ago

Yes! I thought of that optimization too @Narsil. I'll probably implement it in a v2.
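
A minimal sketch of what that v2 could look like (the 100kB initial guess is an adjustable assumption, not shipped code):

async function parseSingleFileEager(
    url: URL,
    guess = 100_000
): Promise<FileHeader> {
    // Speculatively fetch the first `guess` bytes in a single request.
    const buf = await (
        await fetch(url, { headers: { Range: `bytes=0-${guess - 1}` } })
    ).arrayBuffer();
    const lengthOfHeader = Number(new DataView(buf).getBigUint64(0, true));
    if (8 + lengthOfHeader <= buf.byteLength) {
        // The whole header fit inside the first request: no second call.
        return JSON.parse(
            new TextDecoder().decode(new Uint8Array(buf, 8, lengthOfHeader))
        );
    }
    // Otherwise fetch only the missing tail and stitch both parts together.
    const rest = await (
        await fetch(url, {
            headers: { Range: `bytes=${buf.byteLength}-${7 + lengthOfHeader}` },
        })
    ).arrayBuffer();
    const headerBytes = new Uint8Array(lengthOfHeader);
    headerBytes.set(new Uint8Array(buf, 8));
    headerBytes.set(new Uint8Array(rest), buf.byteLength - 8);
    return JSON.parse(new TextDecoder().decode(headerBytes));
}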

julien-c commented 1 year ago

Update for the top 100 most downloaded models (currently 2486 models have the safetensors tag):

| model | safetensors | params |
| --- | --- | --- |
| bert-base-uncased | single-file | { 'F32' => 110106428 } |
| jonatasgrosman/wav2vec2-large-xlsr-53-english | single-file | { 'F32' => 315472545 } |
| gpt2 | single-file | { 'F32' => 137022720 } |
| xlm-roberta-base | single-file | { 'F32' => 278885778 } |
| roberta-base | single-file | { 'F32' => 124697433, 'I64' => 514 } |
| distilbert-base-uncased | single-file | { 'F32' => 66985530 } |
| t5-base | single-file | { 'F32' => 222903936 } |
| xlm-roberta-large | single-file | { 'F32' => 561192082 } |
| bert-base-multilingual-cased | single-file | { 'F32' => 178566653 } |
| bert-base-cased | single-file | { 'F32' => 108932934 } |
| distilroberta-base | single-file | { 'F32' => 82760793 } |
| albert-base-v2 | single-file | { 'F32' => 11842272 } |
| roberta-large | single-file | { 'F32' => 355412057, 'I64' => 514 } |
| distilbert-base-uncased-finetuned-sst-2-english | single-file | { 'F32' => 66955010 } |
| facebook/bart-large-mnli | single-file | { 'F32' => 407344133 } |
| t5-small | single-file | { 'F32' => 60506880 } |
| deepset/roberta-base-squad2 | single-file | { 'F32' => 124056578, 'I64' => 514 } |
| distilbert-base-multilingual-cased | single-file | { 'F32' => 135445755 } |
| bigscience/bloom-560m | single-file | { 'F16' => 559214592 } |
| bert-base-chinese | single-file | { 'F32' => 102882442 } |
| distilgpt2 | single-file | { 'F32' => 88204032 } |
| camembert-base | single-file | { 'F32' => 111246085 } |
| Jean-Baptiste/camembert-ner | single-file | { 'F32' => 110035205, 'I64' => 514 } |
| bert-large-uncased | single-file | { 'F32' => 336226108 } |
| gpt2-medium | single-file | { 'F32' => 379988992 } |
| cambridgeltl/SapBERT-from-PubMedBERT-fulltext | single-file | { 'I64' => 512, 'F32' => 109482240 } |
| facebook/bart-base | single-file | { 'F32' => 139420416 } |
| bert-large-uncased-whole-word-masking-finetuned-squad | single-file | { 'F32' => 335143938 } |
| distilbert-base-uncased-distilled-squad | single-file | { 'F32' => 66364418 } |
| gpt2-large | single-file | { 'F32' => 811778816 } |
| mrm8488/t5-base-finetuned-common_gen | single-file | { 'F32' => 296926848 } |
| openai-gpt | single-file | { 'F32' => 119680512 } |
| t5-large | single-file | { 'F32' => 737668608 } |
| d4data/biomedical-ner-all | single-file | { 'F32' => 66427476 } |
| distilbert-base-cased-distilled-squad | single-file | { 'F32' => 65192450 } |
| Jean-Baptiste/roberta-large-ner-english | single-file | { 'I64' => 514, 'F32' => 354315269 } |
| prompthero/openjourney | single-file | { 'F32' => 123060480, 'I64' => 77 } |
| GanjinZero/UMLSBert_ENG | single-file | { 'I64' => 512, 'F32' => 109482240 } |
| google/flan-t5-base | single-file | { 'F32' => 247577856 } |
| google/flan-t5-large | single-file | { 'F32' => 783150080 } |
| roberta-base-openai-detector | single-file | { 'F32' => 125237762 } |
| mrm8488/t5-base-finetuned-summarize-news | single-file | { 'F32' => 222903936 } |
| google/flan-t5-xxl | sharded | { 'F32' => 11266928640 } |
| bert-base-multilingual-uncased | single-file | { 'F32' => 168055961 } |
| bert-large-cased | single-file | { 'F32' => 334661958 } |
| mrm8488/bert-multi-cased-finetuned-xquadv1 | single-file | { 'F32' => 177854978 } |
| facebook/wav2vec2-base-960h | single-file | { 'F32' => 94395552 } |
| oliverguhr/german-sentiment-bert | single-file | { 'F32' => 109083651 } |
| malteos/scincl | single-file | { 'I64' => 512, 'F32' => 109918464 } |
| Dizex/InstaFoodRoBERTa-NER | single-file | { 'I64' => 514, 'F32' => 124058115 } |
| bert-large-uncased-whole-word-masking | single-file | { 'F32' => 336226108 } |
| ltg/norbert2 | single-file | { 'I64' => 512, 'F32' => 125164986 } |
| shahrukhx01/question-vs-statement-classifier | single-file | { 'I64' => 512, 'F32' => 11171074 } |
| facebook/esm2_t6_8M_UR50D | single-file | { 'I64' => 1026, 'F32' => 7840842 } |
| pszemraj/flan-t5-large-grammar-synthesis | single-file | { 'F32' => 783150080 } |
| bigscience/bloomz-560m | single-file | { 'F16' => 559214592 } |
| roberta-large-mnli | single-file | { 'F32' => 356412419 } |
| Gustavosta/MagicPrompt-Stable-Diffusion | single-file | { 'F32' => 124439808, 'U8' => 12582912 } |
| human-centered-summarization/financial-summarization-pegasus | single-file | { 'F32' => 568796007 } |
| finiteautomata/beto-emotion-analysis | single-file | { 'I64' => 512, 'F32' => 109859335 } |
| voidful/albert_chinese_small | single-file | { 'F32' => 4812936 } |
| mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis | single-file | { 'I64' => 514, 'F32' => 82120707 } |
| mrm8488/t5-base-finetuned-question-generation-ap | single-file | { 'F32' => 296926848 } |
| nbroad/ESG-BERT | single-file | { 'I64' => 512, 'F32' => 109502234 } |
| impira/layoutlm-document-qa | single-file | { 'I64' => 514, 'F32' => 127792898 } |
| bert-base-german-cased | single-file | { 'F32' => 109705010 } |
| aubmindlab/bert-base-arabert | single-file | { 'F32' => 135851010 } |
| deepset/tinyroberta-squad2 | single-file | { 'I64' => 514, 'F32' => 81529346 } |
| albert-base-v1 | single-file | { 'F32' => 11842272 } |
| beomi/kcbert-base | single-file | { 'F32' => 109542194 } |
| Babelscape/wikineural-multilingual-ner | single-file | { 'I64' => 512, 'F32' => 177269769 } |
| rinna/japanese-gpt-1b | single-file | { 'F16' => 1327878144 } |
| setu4993/LaBSE | single-file | { 'I64' => 512, 'F32' => 470926848 } |
| bigscience/bloom-1b1 | single-file | { 'F16' => 1065314304 } |
| sagorsarker/bangla-bert-base | single-file | { 'F32' => 165092235 } |
| pszemraj/grammar-synthesis-small | single-file | { 'F32' => 76961152 } |
| vicgalle/xlm-roberta-large-xnli-anli | single-file | { 'I64' => 514, 'F32' => 559893507 } |
| typeform/distilbert-base-uncased-mnli | single-file | { 'F32' => 66955779 } |
| distilbert-base-german-cased | single-file | { 'F32' => 67431550 } |
| EleutherAI/gpt-neox-20b | sharded | { 'F16' => 20554568208, 'U8' => 184549376 } |
| bigscience/bloom | sharded | { 'BF16' => 176247271424 } |
| bigscience/bloom-3b | single-file | { 'F16' => 3002557440 } |
| wavymulder/Analog-Diffusion | error | model id does not contain safetensors weights |
| FredZhang7/distilgpt2-stable-diffusion-v2 | single-file | { 'F32' => 81912576, 'U8' => 6291456 } |
| albert-xxlarge-v2 | single-file | { 'F32' => 223180256 } |
| cointegrated/rubert-tiny2 | single-file | { 'I64' => 2048, 'F32' => 29376502 } |
| KES/T5-KES | single-file | { 'F32' => 222903552 } |
| cointegrated/LaBSE-en-ru | single-file | { 'I64' => 512, 'F32' => 128993837 } |
| knkarthick/MEETING_SUMMARY | single-file | { 'F32' => 406340696 } |
| rinna/japanese-roberta-base | single-file | { 'I64' => 514, 'F32' => 110652416 } |
| xlm-clm-ende-1024 | single-file | { 'F32' => 208673979 } |
| oliverguhr/spelling-correction-english-base | single-file | { 'F32' => 139470681 } |
| lidiya/bart-large-xsum-samsum | single-file | { 'F32' => 406340696 } |
| dominguesm/bert-restore-punctuation-ptbr | single-file | { 'I64' => 512, 'F32' => 108344079 } |
| patrickjohncyh/fashion-clip | single-file | { 'I64' => 127, 'F32' => 151277312 } |
| mrm8488/bert-spanish-cased-finetuned-pos-16-tags | single-file | { 'F32' => 109863953 } |
| MoritzLaurer/mDeBERTa-v3-base-mnli-xnli | single-file | { 'I64' => 512, 'F16' => 278811651 } |
| blanchefort/rubert-base-cased-sentiment-rusentiment | single-file | { 'I64' => 512, 'F32' => 177855747 } |
| elastic/distilbert-base-cased-finetuned-conll03-english | single-file | { 'F32' => 65197833 } |
| cointegrated/rubert-tiny-toxicity | single-file | { 'I64' => 512, 'F32' => 11785733 } |
sparverius commented 9 months ago

@julien-c How is the canonical order of tensors reconstructed, as seen here via huggingface.co/gpt2?show_tensors=true?

[Screenshot: the ordered tensor list shown for gpt2 on the Hub]

The above example shows the first two tensor names not following lexicographical order (as intended), whereas the API response returns the safetensors layout, which is not in that order... so does this mean the ordering information exists somewhere and can be retrieved programmatically!? 🙏🏼

julien-c commented 9 months ago

@sparverius that's a question for @mishig25 who implemented it, but yeah we have a few heuristics we use to order the layers on the frontend side – while the API exposes the logical on-disk order of the safetensors file (we had a lot of debate about this 🤣)

We can share some pseudo-code to demonstrate what we're doing on the frontend side maybe.

sparverius commented 9 months ago

> but yeah we have a few heuristics we use to order the layers on the frontend side – while the API exposes the logical on-disk order of the safetensors file (we had a lot of debate about this 🤣)

Interesting, what were the main takeaways?

> We can share some pseudo-code to demonstrate what we're doing on the frontend side maybe.

That would be awesome, thank you!

Thanks to the safetensors format, I have been working on a little side project building on this vision of summarizing/representing a given concrete model architecture textually and visually... hoping the results will make it possible to compare/diff models side by side, or to gain insight at a glance into models for a given task 🎨

mishig25 commented 9 months ago

@sparverius here is the heuristic to order the layers:

1. Split the layer name. The splitters/separators are [".", "-", "_"]. Example: h.0.attn.c_proj.bias -> ["h", 0, "attn", "c_proj", "bias"]
2. Compare the split layer names element by element. If the current elements are strings, compare them lexicographically; if they are numbers, compare them numerically. Ex: ["h", 0, "attn", "c_proj", "bias"] will order before ["h", 1, "attn", "c_proj", "bias"] because 0 < 1 in their second elements.
3. Use the heuristic names/regexes below (copied mostly from the transformers naming convention) to "overwrite" the lexicographical order; a sketch of the full comparator follows the code block.

const REGEX_FIRST_LAYERS = /(embed|wte|wpe|shared)/i;
const REGEX_LAST_LAYERS = /(head|classifier)/i;
/*
Rules for comparing ParsedTensorInfo objects.
Examples:
* h.2.attn.c_proj.bias should order lower than h.11.attn.c_proj.bias because 2 < 11
* embedding.layer should order lower than h.2.attn.c_proj.bias because of the special substring "embedding"
*/
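
Put together, a minimal sketch of the comparator these steps describe (hypothetical code, not the actual frontend implementation):

type Part = string | number;

function splitName(name: string): Part[] {
    // Step 1: split on ".", "-", "_" and parse numeric segments.
    return name
        .split(/[.\-_]/)
        .map((p) => (/^\d+$/.test(p) ? Number(p) : p));
}

function compareTensorNames(a: string, b: string): number {
    // Step 3: special first/last layers overwrite the generic order.
    const rank = (n: string) =>
        REGEX_FIRST_LAYERS.test(n) ? -1 : REGEX_LAST_LAYERS.test(n) ? 1 : 0;
    if (rank(a) !== rank(b)) return rank(a) - rank(b);

    // Steps 1-2: compare the split names element by element.
    const pa = splitName(a);
    const pb = splitName(b);
    for (let i = 0; i < Math.min(pa.length, pb.length); i++) {
        const x = pa[i];
        const y = pb[i];
        if (x === y) continue;
        if (typeof x === "number" && typeof y === "number") return x - y;
        return String(x) < String(y) ? -1 : 1;
    }
    return pa.length - pb.length; // shorter names first on a tie
}

// usage: tensorNames.sort(compareTensorNames)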
julien-c commented 9 months ago

> summarizing/representing a given concrete model architecture textually and visually

This sounds super interesting, I'm sure many people would be interested in this.

sparverius commented 9 months ago

Thanks @mishig25! Interesting. Are there any other existing efforts to catalog different architectures?

sparverius commented 9 months ago

@julien-c thanks, I hope it will be useful!

Showing that one can retrieve model information from a safetensors checkpoint demonstrates the beauty & transparency of the format, and it was the inspiration for this side project 🤗 ...

I'm cautious about depending on this at a more widespread level, since it costs precious requests on the HF server side (several for larger sharded models, think HuggingFaceM4/idefics-80b or even tiiuae/falcon-180B), and because of the whole ordering issue, even though it might be good advertisement for safetensors...

Some thoughts from what I've been running into: for one, utilizing config.json seems helpful for ordering.

<details>
<summary>EXPAND for details on how that might be useful for transformers, diffusers, timm models</summary>

### for transformers

Certain heuristics for encoder-only, encoder-decoder, decoder-only ...

Llama-2-7b-hf config.json gives us some insights on shapes, # layers, etc.:

```json
{
  "architectures": ["LlamaForCausalLM"],
  ...
  "hidden_size": 4096,
  ...
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  ...
  "vocab_size": 32000
}
```

What about an architecture that hasn't been pulled into the transformers lib yet? microsoft/phi-1_5 gives an estimate of tensor shapes etc., but doesn't tell us what each layer is composed of unless one looks at the custom code...

```json
{
  ...
  "architecture": {
    ...
    "block_cls": "parallel",
    ...
  },
  "architectures": ["MixFormerSequentialForCausalLM"],
  "auto_map": {
    ...
    "AutoModelForCausalLM": ...
  },
  ...
  "model_type": "mixformer-sequential",
  "n_embd": 2048,
  "n_head": 32,
  "n_inner": null,
  "n_layer": 24,
  "n_positions": 2048,
  ...
  "rotary_dim": 32,
  ...
  "vocab_size": 51200
}
```

Perhaps https://huggingface.co/microsoft/phi-1_5/blob/main/tokenizer_config.json gives us a hint?

```json
{
  ...
  "tokenizer_class": "CodeGenTokenizer",
  ...
}
```

Other interesting cases: [facebook/maskformer-swin-large-coco](https://huggingface.co/facebook/maskformer-swin-large-coco/tree/main)

### for timm

https://huggingface.co/timm/resnet50.a1_in1k/blob/main/config.json

```json
{
  "architecture": "resnet50",
  ...
  "first_conv": "conv1",
  "classifier": "fc",
  ...
}
```

### for diffusers

It gets a bit more complicated: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/raw/main/model_index.json involves some pointer chasing ☞ ✦ ✧ ❂ 🏆

```json
{
  "_class_name": "StableDiffusionXLPipeline",
  "text_encoder": ["transformers", "CLIPTextModel"],
  ...
  "unet": ["diffusers", "UNet2DConditionModel"],
  "vae": ["diffusers", "AutoencoderKL"]
}
```

### Other Errata

Ambiguities arise: What is Big🐤? 🤔

https://github.com/huggingface/transformers/blob/dcbfd93d7aeb14f8ff08a48866d2a68950d4c69a/templates/adding_a_new_model/open_model_proposals/ADD_BIG_BIRD.md?plain=1#L217-L228

https://github.com/huggingface/transformers/blob/dcbfd93d7aeb14f8ff08a48866d2a68950d4c69a/templates/adding_a_new_model/open_model_proposals/ADD_BIG_BIRD.md?plain=1#L256-L257

</details>
sparverius commented 9 months ago

TL;DR 🤖

Perhaps this discussion of the side project is better suited elsewhere, though. I am wondering if a community effort to catalog model summaries might be the best way forward... all thoughts welcome 🤗!

Problem Statement

How can one know the concrete architecture of a model at a glance, without grokking the paper, the source code, or ultimately loading the model into memory?

Related existing tools:

### Text-based
- model.summary()
- torchinfo/torch-summary: https://pypi.org/project/torchinfo/

### Visual
- tensorboard graphs [example](https://tensorboard.dev/experiment/EDZb7XgKSBKo6Gznh3i8hg/#graphs&run=lr_1E-04%2Cconv%3D1%2Cfc%3D2)
- https://github.com/paulgavrikov/visualkeras
- Torchlens: https://github.com/johnmarktaylor91/torchlens
- pytorchviz: https://github.com/szagoruyko/pytorchviz

Ideas

{
    "deberta": {
        "class": "DebertaV2Model",
        "embeddings": {
            "class": "DebertaV2Embeddings",
            "position_ids": "[1, 512]",
            "word_embeddings": {
                "class": "Embedding",
                "weight": "[128100, 768]",
            },
            ...
        },
        "encoder": {
            "class": "DebertaV2Encoder",
            "layer": {
                "class": "ModuleList",
                "N": {
                    "class": "DebertaV2Layer",
                    "attention": {
                        "class": "DebertaV2Attention",
                        "self": {
                            "class": "DisentangledSelfAttention",
                            "query_proj": {
                                "class": "Linear",
                                "weight": "[768, 768]", # deberta.encoder.layer.N.attention.self.query_proj.weight
                                "bias": "[768]", # deberta.encoder.layer.N.attention.self.query_proj.bias
                            },
                            "key_proj": { "class": "Linear", ... },
                            ... 
                            "pos_dropout": { "class": "StableDropout" },
                            "dropout": { "class": "StableDropout" }
                        },
                        "output": {
                            "class": "DebertaV2SelfOutput",
                            "dense": {
                                "class": "Linear",
                                "weight": "[768, 768]", # deberta.encoder.layer.N.attention.output.dense.weight
                                ...
                            }
                            ...
                        }
                    },
                    "intermediate": {
                        "class": "DebertaV2Intermediate",
                        "dense": { ... },
                        "intermediate_act_fn": { "class": "GELUActivation" }
                    },
                    ...
                },
                ...
            },
           "rel_embeddings": {
                "class": "Embedding",
                "weight": "[512, 768]",
            },
            "LayerNorm": { ... }
        }
    },
    "pooler": { "class": "ContextPooler", ... },
    "classifier": { ... },
    "dropout": { "class": "StableDropout" }
}

Outcomes:

A catalog / repo / central place hosting model summaries

julien-c commented 9 months ago

Great summary, @sparverius!!

> A catalog / repo / central place hosting model summaries

IMO the best would be to place each model summary into its model repo on the HF Hub (through a Pull request or Discussion)

Also makes me think a bit of https://huggingface.co/spaces/hf-accelerate/model-memory-usage by @muellerzr

@mishig25 do you remember if we had noted somewhere public some of our thoughts about how to encode model architecture into an easy-to-use file format?

sparverius commented 9 months ago

> Great summary, @sparverius!!

Thanks!

> IMO the best would be to place each model summary into its model repo on the HF Hub (through a Pull request or Discussion)

Good point, that would be the most accessible.

> @mishig25 do you remember if we had noted somewhere public some of our thoughts about how to encode model architecture into an easy-to-use file format?

If you have that to share, that would be awesome!

I was thinking of something simple, in a similar format to the safetensors metadata, encoding intermediate labels as class names, for example:

{
 "model": "Model",
 "model.embed_tokens": "Embedding",
 "model.embed_tokens.weight": {"shape": [32000, 4096], "dtype": "float16"},
 "model.layers": "ModuleList",
 "model.layers.0": "DecoderLayer",
 "model.layers.0.self_attn": "Attention",
 "model.layers.0.self_attn.rotary_emb": "RotaryEmbedding",
 "model.layers.0.self_attn.rotary_emb.inv_freq": {"shape": [64], "dtype": "float32"},
 "model.layers.0.self_attn.rotary_emb.cos_cached": {"shape": [1, 1, 4096, 128], "dtype": "float16"},
 "model.layers.0.self_attn.rotary_emb.sin_cached": {"shape": [1, 1, 4096, 128], "dtype": "float16"},
 "model.layers.0.self_attn.k_proj": "QuantLinear",
 "model.layers.0.self_attn.k_proj.qweight": {"shape": [512, 4096], "dtype": "int32"},
 "model.layers.0.self_attn.k_proj.qzeros": {"shape": [32, 512], "dtype": "int32"},
 "model.layers.0.self_attn.k_proj.scales": {"shape": [32, 4096], "dtype": "float16"},
 "model.layers.0.self_attn.k_proj.g_idx": {"shape": [4096], "dtype": "int32"},
 "model.layers.0.self_attn.k_proj.bias": {"shape": [4096], "dtype": "float16"},
 ...
 "model.layers.0.mlp": "MLP",
 "model.layers.0.mlp.act_fn": "SiLUActivation",
 ...
 "model.layers.0.mlp.up_proj": "QuantLinear",
 "model.layers.0.mlp.up_proj.qweight": {"shape": [512, 11008], "dtype": "int32"},
 "model.layers.0.mlp.up_proj.qzeros": {"shape": [32, 1376], "dtype": "int32"},
 "model.layers.0.mlp.up_proj.scales": {"shape": [32, 11008], "dtype": "float16"},
 "model.layers.0.mlp.up_proj.g_idx": {"shape": [4096], "dtype": "int32"},
 "model.layers.0.mlp.up_proj.bias": {"shape": [11008], "dtype": "float16"},
 "model.layers.0.input_layernorm": "RMSNorm",
 "model.layers.0.input_layernorm.weight": {"shape": [4096], "dtype": "float16"},
 "model.layers.0.post_attention_layernorm": "RMSNorm",
 "model.layers.0.post_attention_layernorm.weight": {"shape": [4096], "dtype": "float16"},
 ...
}

This has the advantage of being cross-checkable against the safetensors header, it complements the safetensors metadata with a bit of added info, and it can easily be serialized to JSON...
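
A minimal sketch of that cross-check (the Summary type here is the hypothetical format above; it compares shapes only, since dtype names would also need mapping, e.g. "float16" vs "F16"):

type Summary = Record<string, string | { shape: number[]; dtype: string }>;

function crossCheck(summary: Summary, header: FileHeader): string[] {
    const mismatches: string[] = [];
    for (const [name, entry] of Object.entries(summary)) {
        if (typeof entry === "string") continue; // class labels carry no tensor
        const info = header[name] as TensorInfo | undefined;
        if (!info) {
            mismatches.push(`${name}: missing from safetensors header`);
        } else if (info.shape.join() !== entry.shape.join()) {
            mismatches.push(`${name}: shape [${info.shape}] != [${entry.shape}]`);
        }
    }
    return mismatches;
}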

mishig25 commented 9 months ago

> @mishig25 do you remember if we had noted somewhere public some of our thoughts about how to encode model architecture into an easy-to-use file format?

There was no public discussion. Internally, you've posted:

> Can ONNX refer to external weights, i.e. for instance could an ONNX file only represent the computation graph, but point to a safetensors file for the actual weights? (maybe through an extension)

mishig25 commented 9 months ago

btw @sparverius, I assume you've seen this doc page? https://huggingface.co/docs/safetensors/metadata_parsing

ThiloteE commented 1 month ago

@julien-c if I may ask, what is the method to calculate the parameter count of a model? I am thinking of maybe creating a script to detect mismatches between the actual parameter count and the model name. I know regex well, so the model name is no problem, but I am currently stuck on calculating the parameter count. Maybe I can find a way to clean up the mess on the Hugging Face Open LLM Leaderboard.

julien-c commented 1 month ago

@ThiloteE we just sum the number of parameters in all the tensors.
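
Building on the FileHeader type from the top of this thread, a minimal sketch (not the Hub's actual code):

// Total parameter count = sum of each tensor's element count (shape product).
function totalParams(header: FileHeader): number {
    let total = 0;
    for (const [name, info] of Object.entries(header)) {
        if (name === "__metadata__") continue;
        total += (info as TensorInfo).shape.reduce((a, b) => a * b, 1);
    }
    return total;
}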

There's also a Python implementation, in case it is more readable: https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/utils/_safetensors.py