hyparam / hyperparam-cli

Hyperparam local dataset viewer
https://hyperparam.app
MIT License

The file size is inaccurate in TextView and MarkdownView #4

Closed by severo 3 weeks ago

severo commented 1 month ago

In TextView, we show the number of characters as the number of bytes:

https://github.com/hyparam/hyperparam-cli/blob/ca429b826bc33a8aa6173a8a4a9a4ba58cc16008/src/components/viewers/TextView.tsx#L56

This is not exact: for non-ASCII text the character count differs from the size in bytes, and the Content-Length header returns a different value. We might want to use parseFileSize as in the other views.
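
To illustrate the mismatch, here is a minimal sketch comparing a string's length with its UTF-8 byte size. The helper name `utf8ByteLength` is hypothetical and is not part of the repository; it only assumes the standard `TextEncoder` API.

```typescript
// Hypothetical helper (not from hyperparam-cli): size of a string once encoded as UTF-8.
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length
}

const ascii = 'hello'
const chinese = '你好'

// String.length counts UTF-16 code units, not bytes.
console.log(ascii.length, utf8ByteLength(ascii)) // 5 5
console.log(chinese.length, utf8ByteLength(chinese)) // 2 6
```

For pure ASCII text the two values match, which is probably why the difference is easy to miss until a file contains multi-byte characters.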

severo commented 3 weeks ago

See https://github.com/severo/hyparam--space/commit/6c21c8846d5f62697ac0fcf80494f2d4e2d2e183. I'll apply the same fix to the other copies of the code.

severo commented 3 weeks ago

(see https://huggingface.co/datasets/swaption2009/20k-en-zh-translation-pinyin-hsk/blob/main/hsk_1_4.txt for example: text.length is 3824730, content-length header is 4889584, we now use the latter)
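
A rough sketch of the idea of preferring the header value, under stated assumptions: `sizeFromHeaders` and `displayedSize` are hypothetical names, and this is not the repository's `parseFileSize` nor the code from the linked commit. It reads `Content-Length` from a standard `Headers` object and falls back to the UTF-8 byte length of the text when the header is missing or invalid.

```typescript
// Hypothetical helper: read the size in bytes from a Content-Length header.
// Returns undefined when the header is absent or not a non-negative integer.
function sizeFromHeaders(headers: Headers): number | undefined {
  const raw = headers.get('content-length')
  if (raw === null || raw.trim() === '') return undefined
  const size = Number(raw)
  return Number.isSafeInteger(size) && size >= 0 ? size : undefined
}

// Hypothetical helper: prefer the header, else count the encoded bytes
// (assumption: the text is served as UTF-8).
function displayedSize(headers: Headers, text: string): number {
  return sizeFromHeaders(headers) ?? new TextEncoder().encode(text).length
}

const withHeader = new Headers({ 'content-length': '4889584' })
console.log(displayedSize(withHeader, 'ignored')) // 4889584
console.log(displayedSize(new Headers(), '你好')) // 6
```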