metabrainz / musicbrainz-docker

Docker Compose project for the MusicBrainz Server with replication, search, and development setup
https://musicbrainz.org/doc/MusicBrainz_Server/Setup

Info: The recommended minimum hardware for VM is no longer sufficient #271

Closed PeterCodar closed 3 months ago

PeterCodar commented 3 months ago

You recommend "Disk Space 200 GB" in https://github.com/metabrainz/musicbrainz-docker?tab=readme-ov-file#prerequisites

After a completely new installation of a minimal Ubuntu 22.04.4 (needs about 17 GB) and the complete installation of

as docker containers, you need at least 215 GB Disk Space for a new virtual machine.

Given the growing volume of new data (replication/dumps), I suggest around 250 GB of disk space for a new VM.

[Screenshot: disk usage of freshly installed MB docker including Search and MB: 212.3 GB]

PeterCodar commented 3 months ago

After a second run, to get the newest available data dump files and import them, the disk space needed had increased by more than 107 GB by the step "Done loading search indexes":

[Screenshot: 2024-03-22 11:19:54, MB2024, load indexes done: 319.8 GB]

Is this the expected increase in disk space during the download, extraction and import of the newest data dumps?

So the (temporary) disk space required would be more than 320 GB?

yvanzo commented 3 months ago

Hi @PeterCodar,

Thank you for reporting this issue. @reosarevok is currently installing a fresh local mirror to check the actual disk usage in more detail.

The size of the database is expected to increase. We usually update the requirements for the schema change in May. It is possible that the size of the database increased faster than we anticipated.

Also, the scripts never remove the downloaded dump files. When dumps are downloaded again, newer files overwrite the previous ones, which allows downloads to be resumed. That might explain temporarily larger disk usage. You can try removing all the dump files as follows:

docker-compose exec musicbrainz find /media/ -type f -exec rm -f '{}' ';'

PeterCodar commented 3 months ago

Thanks for the hint about removing all dump files. Unfortunately, this command does not seem to work: no error appears, but the disk space used is still over 328 GB.

yvanzo commented 3 months ago

You can further check disk usage of the main directories used to store dumps and data as follows:

docker-compose exec musicbrainz du -sh /media/dbdump /media/searchdump
docker-compose exec search du -sh . /opt/solr/server/solr/data
docker-compose exec db du -sh /var/lib/postgresql/data

PeterCodar commented 3 months ago

These are the current values:

sudo docker-compose exec musicbrainz find /media/ -type f -exec rm -f '{}' ';' -> Cleans about 60 GB dump data

sudo docker-compose exec musicbrainz du -sh /media/dbdump /media/searchdump
8.0K /media/dbdump
4.0K /media/searchdump

sudo docker-compose exec search du -sh . /opt/solr/server/solr/data
97.6G .
97.4G /opt/solr/server/solr/data

sudo docker-compose exec db du -sh /var/lib/postgresql/data
54G /var/lib/postgresql/data
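
To total figures like these without manual arithmetic, the human-readable sizes can be converted to GB and summed. This is only a minimal sketch, assuming `du -sh` style input of the form "<size> <path>" with K, M, G or T suffixes read as powers of 1024. The function name `sum_du_gb` is hypothetical and not part of musicbrainz-docker. The `.` line from the search container is left out because it already contains the Solr data directory.

```shell
#!/bin/sh
# Hypothetical helper: read "<size> <path>" lines (du -sh style) on stdin
# and print the total in GB with one decimal place.
sum_du_gb() {
    LC_ALL=C awk '
        {
            size = $1
            unit = substr(size, length(size), 1)   # last character, e.g. "G"
            value = size + 0                       # numeric prefix, e.g. "97.6G" -> 97.6
            if (unit == "K") value /= 1024 * 1024
            else if (unit == "M") value /= 1024
            else if (unit == "T") value *= 1024
            else if (unit != "G") value /= 1024 * 1024 * 1024   # assumption: no suffix means bytes
            sum += value
        }
        END { printf "%.1f\n", sum }'
}

# Values reported above after removing the dump files.
printf '%s\n' \
    '8.0K /media/dbdump' \
    '4.0K /media/searchdump' \
    '97.4G /opt/solr/server/solr/data' \
    '54G /var/lib/postgresql/data' | sum_du_gb
```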

PeterCodar commented 3 months ago

After a third run and without removing the dump files in /media manually (first command above) the Disk Usage looks like this:

sudo docker-compose exec musicbrainz du -sh /media/dbdump /media/searchdump
5.7G /media/dbdump
45G /media/searchdump

sudo docker-compose exec search du -sh . /opt/solr/server/solr/data
97.8G .
97.6G /opt/solr/server/solr/data

sudo docker-compose exec db du -sh /var/lib/postgresql/data
48G /var/lib/postgresql/data

What I don't get is why the total disk usage is currently about 330.6 GB. All the above values add up to about 196.3 GB. Where are the remaining 134.3 GB located?
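
The arithmetic behind that question can be sketched as follows, using the figures from the third run (with the Solr data directory counted once, at 97.6 GB). The function name `unaccounted_gb` is hypothetical and only illustrates the subtraction.

```shell
#!/bin/sh
# Hypothetical helper: subtract the sum of per-directory sizes (GB)
# from the total disk usage (GB) and print the unaccounted remainder.
unaccounted_gb() {
    total="$1"
    shift
    # Print the remaining arguments one per line and let awk sum them.
    printf '%s\n' "$@" |
        LC_ALL=C awk -v total="$total" '{ sum += $1 } END { printf "%.1f\n", total - sum }'
}

# Total, then dbdump, searchdump, Solr data, PostgreSQL data (all in GB).
unaccounted_gb 330.6 5.7 45 97.6 48
```

With these inputs the remainder is 134.3 GB, which matches the gap described above.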

yvanzo commented 3 months ago

> Where are the remaining 134.3 GB located?

It might be previous images and containers if you haven't cleaned up in a long time. You can check how much space Docker itself is using:

sudo docker system df

When all of the Docker containers you need are running, do some cleanup as follows:

sudo docker system prune --all # remove stopped containers, unused networks, build cache, and all unused images (not only dangling ones)

Avoid --all if some of the containers you need are not running at the moment. See the reference doc.
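
That precaution could be scripted as a guard that only prunes with `--all` when every service of the Compose project is running. This is a rough sketch, not something the project ships: the function names `all_services_running` and `prune_if_all_running` are hypothetical, and it assumes that `docker-compose config --services` and `docker-compose ps --services --filter status=running` behave as in recent Compose releases.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every expected service name
# appears in the list of running service names.
# Both arguments are whitespace- or newline-separated lists.
all_services_running() {
    expected="$1"
    running="$2"
    for svc in $expected; do
        # -x: match the whole line, -F: treat the name as a fixed string
        printf '%s\n' $running | grep -qxF -- "$svc" || return 1
    done
    return 0
}

# Hypothetical wrapper (defined but not called here, since it needs Docker).
prune_if_all_running() {
    expected="$(sudo docker-compose config --services)"
    running="$(sudo docker-compose ps --services --filter status=running)"
    if all_services_running "$expected" "$running"; then
        sudo docker system prune --all
    else
        echo "Some services are not running; skipping prune --all" >&2
        return 1
    fi
}
```

Running `prune_if_all_running` from the directory containing the Compose file would then either prune or print a warning, under the assumptions above.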

yvanzo commented 3 months ago

I updated the disk usage requirements in commit https://github.com/metabrainz/musicbrainz-docker/commit/ab2d835bc192c7bcdebc3da2be44d20d199e4d92 following your recommendations, thank you. Did the cleanup resolve your issue?

PeterCodar commented 3 months ago

Thanks @yvanzo

Yes, your clean-up commands (and surprisingly a reboot of the VM) reduced the disk usage. It seems that about 105 GB will be cleaned "automatically" after every download, import AND reboot of the VM.

I'm not sure whether a clean-up runs on every MB docker container start, whether my virtual machine software cleans up some temporary files (downloads?) after a reboot, or whether Ubuntu itself includes such clean-up tasks.

The most important thing is that people know that they need, at least temporarily, more than 200 GB of disk space.