wandb / server

W&B Server is the self-hosted version of Weights & Biases
MIT License

How to free the data storage for deleted runs? #114

Closed by SJoJoK 1 year ago

SJoJoK commented 1 year ago

Hi wandb Server Team,

Thanks for your great work on the local server!

I found that the data of deleted runs is not removed from /vol in the container. Is there a proper way (other than rm) to free this space?

Thanks in advance for any help or advice.

thanos-wandb commented 1 year ago

Hi @SJoJoK, thanks for writing in, and glad you enjoy our product! You're correct: the data is soft-deleted, i.e. it won't be visible in the App but is still stored in your database and object store.

May I ask whether the runs you want to delete contain only logged metrics, or artifacts too? Also, what is the current configuration of your local instance? Have you connected it to an external MySQL database and an object store other than MinIO (e.g. an AWS or GCP bucket)?

Furthermore, are you the admin of your instance? If you want to delete these, you will need to log in to your MySQL database and run some queries there.

SJoJoK commented 1 year ago

Hello @thanos-wandb, thanks for replying.

I just ran wandb server start without any other specific configuration. The runs I want to delete contain some logged 3D objects and therefore take up a lot of space.

I'm the admin of the instance (in practice, I'm the only user), so I removed the data with:

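# Open a shell inside the running W&B container
docker exec -it wandb-local /bin/bash
# Go to the project's files in the bundled MinIO store
cd /vol/minio/local-files/[username]/[project name]
# Remove the directories that belong to the deleted runs
rm -rf [id of deleted runs]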

But the IDs of deleted runs are not easy to tell apart, so I'm looking for a better way to remove the data of deleted runs.

Thanks again for your help.

thanos-wandb commented 1 year ago

Hi @SJoJoK, thanks for the additional details. May I ask what file format you're using for the 3D objects? It seems you're storing these as artifacts in your object storage (MinIO). Unfortunately, it isn't currently possible to delete artifacts from the UI, although this has been requested and I can add you to the internal ticket.

Is the reason for deleting runs to free up hard disk space on your machine? If you're uploading large artifacts, you may need to free up your local .cache directory too. Also, why can't you distinguish the run IDs? There may be a few options here using the API, such as tagging runs and then getting their IDs.
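To judge whether the local cache is worth cleaning, it may help to measure it first. The following is a minimal sketch using only the Python standard library. The default location ~/.cache/wandb is an assumption and may differ on your system or wandb version, so pass the path you actually use.

```python
from pathlib import Path


def dir_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under root (0 if root is missing)."""
    if not root.is_dir():
        return 0
    return sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


if __name__ == "__main__":
    # Assumed default cache location; adjust if your setup differs.
    cache = Path.home() / ".cache" / "wandb"
    print(f"{cache}: {dir_size_bytes(cache) / 1e6:.1f} MB")
```

If the cache turns out to be large, recent versions of the wandb CLI provide a wandb artifact cache cleanup command that takes a target size; check wandb artifact cache cleanup --help on your installed version before relying on it.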

SJoJoK commented 1 year ago

Thanks for replying, @thanos-wandb. The file format of the 3D objects is .obj. And yes, I want to free up hard disk space on my machine. (I found it hard to mount /vol in another directory correctly, so for now I'm using the default setting; that is, /vol is mounted on my system disk, where Docker is installed, rather than on the data disk. This may be another issue, but I can't spare the time to deal with it right now.) Since I'm rather busy at the moment, I haven't checked the provided API yet; it would be helpful if I could get the IDs of all tagged runs.

Anyway, thanks for your help. I'll focus on dealing with my local server once I'm available.

thanos-wandb commented 1 year ago

Hi @SJoJoK, thanks for the update. There doesn't seem to be another way at the moment to hard-delete your media. However, since you wanted an easier way to find your runs, you could use tags (from the App, or from the API/SDK) and then get their IDs from the API as follows:

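import wandb

# Connect to the W&B public API using the configured host and API key
api = wandb.Api()
# List the runs in the project that carry the given tag
runs = api.runs("username/project", filters={"tags": {"$in": ["TAG_NAME"]}})
for r in runs:
    print(r.id)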
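Run IDs obtained this way could also be compared against the directories on disk, to see which ones no longer correspond to a listed run. The sketch below is only an illustration, not an official cleanup procedure. It assumes that each sub-directory of the project folder is named after a run ID (as in the rm commands earlier in this thread) and that the API lists only runs that have not been deleted. The helper name find_unlisted_run_dirs is hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def find_unlisted_run_dirs(project_dir: Path, listed_run_ids: Iterable[str]) -> List[Path]:
    """Return sub-directories of project_dir whose names are not in listed_run_ids.

    Assumption: each sub-directory is named after a run ID.
    """
    keep = set(listed_run_ids)
    return sorted(
        p for p in project_dir.iterdir()
        if p.is_dir() and p.name not in keep
    )


# Example usage (requires the wandb package and access to the storage path):
#   import wandb
#   ids = [r.id for r in wandb.Api().runs("username/project")]
#   for d in find_unlisted_run_dirs(Path("/vol/minio/local-files/username/project"), ids):
#       print(d)  # review this list by hand before deleting anything
```

The function only reports candidates and deletes nothing, so the output can be checked manually before running rm.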

I hope this helps! I will close this ticket for now since you've mentioned you won't be available soon, but please feel free to reopen it once you have time to return to this, and we will be happy to keep investigating if you have any additional questions.