rishikanthc / Scriberr

Self-hosted AI audio transcription
https://scriberr.app
MIT License

Docker image bloat #13

Open phillipjf opened 1 week ago

phillipjf commented 1 week ago

I pulled down the latest image (beta-0.2) expecting it to match what I had built locally (#11), but it seems like some additional files are being included. It looks like you are building from your local repo, which has some extra directories and files that are bumping up the image size (specifically samples, with >100MB of audio files). Can you pull a fresh clone of the repo and re-publish an image? I would like to look at adding some GitHub Actions to build and publish, if you'd be interested.

REPOSITORY                     TAG             IMAGE ID       CREATED        SIZE
ghcr.io/rishikanthc/scriberr   beta-0.2        80f187cb6faf   21 hours ago   3.23GB
scriberr-scriberr              latest          1877ad467e05   38 hours ago   1.33GB

$ docker run -it ghcr.io/rishikanthc/scriberr:beta-0.2 /bin/sh
/app # du -h -d1
201.5M  ./node_modules
7.6M    ./.svelte-kit
4.2M    ./.git
1.2M    ./pb_data
735.5M  ./whisper.cpp
196.0K  ./src
10.2M   ./build
5.8M    ./static
719.2M  ./whisper.cpp-master
32.0K   ./scriberr_files
181.5M  ./samples
1.8G    .

versus local:

$ docker run -it scriberr-scriberr:latest /bin/sh
/app # du -h -d1
199.1M  ./node_modules
160.0K  ./.svelte-kit
2.9M    ./.git
184.0K  ./src
5.8M    ./static
208.3M  .
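
To make the difference between the two listings explicit, here is a minimal sketch that parses both `du -h -d1` outputs above and reports the directories present only in the published image. The helper name `parse_du` is hypothetical, and the unit table assumes `du -h` reports 1024-based K/M/G suffixes.

```python
# Sizes are normalised to MiB; assumes 1024-based suffixes from `du -h`.
UNITS = {"K": 1 / 1024, "M": 1.0, "G": 1024.0}


def parse_du(text):
    """Map each path in `du -h -d1` output to its size in MiB."""
    sizes = {}
    for line in text.strip().splitlines():
        size, path = line.split(None, 1)
        sizes[path.strip()] = float(size[:-1]) * UNITS[size[-1]]
    return sizes


# Listing copied from the published beta-0.2 image.
published = parse_du("""
201.5M  ./node_modules
7.6M    ./.svelte-kit
4.2M    ./.git
1.2M    ./pb_data
735.5M  ./whisper.cpp
196.0K  ./src
10.2M   ./build
5.8M    ./static
719.2M  ./whisper.cpp-master
32.0K   ./scriberr_files
181.5M  ./samples
1.8G    .
""")

# Listing copied from the locally built image.
local = parse_du("""
199.1M  ./node_modules
160.0K  ./.svelte-kit
2.9M    ./.git
184.0K  ./src
5.8M    ./static
208.3M  .
""")

# Directories that exist only in the published image, largest first.
extra = {p: s for p, s in published.items() if p not in local and p != "."}
for path, size in sorted(extra.items(), key=lambda kv: -kv[1]):
    print(f"{size:8.1f}M  {path}")
print(f"{sum(extra.values()):8.1f}M  total in directories absent from the local build")
```

On these numbers, the two whisper.cpp directories together account for most of the extra space, with samples next.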

rishikanthc commented 1 week ago

Sure, I will recompile it later today. GitHub Actions for build and publish would be a great way to go forward. Feel free to contribute, and let me know if you need any permissions from me.
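
Until an automated build is in place, one way to catch stray files such as the samples directory before building from a working copy is to list the largest top-level entries in the build context. This is a minimal sketch, not part of Scriberr: the function names `top_level_sizes` and `oversized` are hypothetical, the 100 MiB default simply mirrors the ">100MB" figure mentioned above, and the check does not account for anything excluded by a `.dockerignore` file.

```python
import os


def top_level_sizes(root):
    """Return {entry_name: total_bytes} for each top-level entry under root."""
    sizes = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            total = 0
            # Sum regular files below this directory, skipping symlinks.
            for dirpath, _dirs, files in os.walk(entry.path):
                for name in files:
                    path = os.path.join(dirpath, name)
                    if not os.path.islink(path):
                        total += os.path.getsize(path)
            sizes[entry.name] = total
        elif entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sizes


def oversized(root, limit_bytes=100 * 1024 * 1024):
    """Return the top-level entries under root larger than limit_bytes."""
    return {
        name: size
        for name, size in top_level_sizes(root).items()
        if size > limit_bytes
    }
```

Calling `oversized(".")` from the repository root before `docker build` would return any top-level file or directory above the limit, which can then be removed or excluded from the build context.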