Increase the time for Galaxy cleanup again

usegalaxy-eu / infrastructure-playbook

Ansible playbook for managing UseGalaxy.eu infrastructure.

MIT License


Increase the time for Galaxy cleanup again #1195

Closed bgruening closed 4 months ago

bgruening commented 4 months ago

I don't know why we decreased it 5 years ago.

If we purge datasets older than 60 days, I think we can do it less frequently and maybe avoid the long-running, possibly overlapping transactions.

A counterargument would be that running only once every 2 days means more datasets need to be deleted at once, producing larger IO spikes. I don't know.

@sanjaysrikakulam this job should run on the maintenance node.

bgruening commented 4 months ago

I assume it takes even longer than 1h.


root@sn06:~$ /usr/bin/env GDPR_MODE=1 PGUSER=galaxy PGHOST=sn05.galaxyproject.eu GALAXY_ROOT=/opt/galaxy/server GALAXY_CONFIG_FILE=/opt/galaxy/config/galaxy.yml GALAXY_LOG_DIR=/var/log/galaxy GXADMIN_PYTHON=/opt/galaxy/venv/bin/python /usr/bin/gxadmin galaxy cleanup 60

cleanup_datasets,group=delete_userless_histories success=1,runtime=7
cleanup_datasets,group=delete_exported_histories success=1,runtime=7
cleanup_datasets,group=purge_deleted_users success=1,runtime=4
cleanup_datasets,group=purge_deleted_histories success=1,runtime=1440
cleanup_datasets,group=purge_deleted_hdas success=1,runtime=3510
cleanup_datasets,group=purge_historyless_hdas success=1,runtime=8003
cleanup_datasets,group=purge_hdas_of_purged_histories success=1,runtime=2472
cleanup_datasets,group=delete_datasets success=1,runtime=612
cleanup_datasets,group=purge_datasets success=1,runtime=197

bgruening commented 4 months ago

galaxy GXADMIN_PYTHON=/opt/galaxy/venv/bin/python /usr/bin/gxadmin galaxy cleanup 60
cleanup_datasets,group=delete_userless_histories success=1,runtime=9
cleanup_datasets,group=delete_exported_histories success=1,runtime=6
cleanup_datasets,group=purge_deleted_users success=1,runtime=5
cleanup_datasets,group=purge_deleted_histories success=1,runtime=7971
cleanup_datasets,group=purge_deleted_hdas success=1,runtime=3147
cleanup_datasets,group=purge_historyless_hdas success=1,runtime=8126
cleanup_datasets,group=purge_hdas_of_purged_histories success=1,runtime=1334
cleanup_datasets,group=delete_datasets success=1,runtime=552
cleanup_datasets,group=purge_datasets success=1,runtime=57

real    353m26.402s
user    0m42.088s
sys     0m28.040s

We need to increase the timeout ... ouch.
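
For reference, the per-step numbers above can be added up with a few lines of Python. This is only a rough sketch: it assumes the `runtime` field is in seconds (the sum coming out within a second of the `real` time reported by `time` supports that reading), and the helper name `parse_runtimes` is made up for this example, not part of gxadmin.

```python
import re

# Output of the second `gxadmin galaxy cleanup 60` run, copied from this thread.
OUTPUT = """\
cleanup_datasets,group=delete_userless_histories success=1,runtime=9
cleanup_datasets,group=delete_exported_histories success=1,runtime=6
cleanup_datasets,group=purge_deleted_users success=1,runtime=5
cleanup_datasets,group=purge_deleted_histories success=1,runtime=7971
cleanup_datasets,group=purge_deleted_hdas success=1,runtime=3147
cleanup_datasets,group=purge_historyless_hdas success=1,runtime=8126
cleanup_datasets,group=purge_hdas_of_purged_histories success=1,runtime=1334
cleanup_datasets,group=delete_datasets success=1,runtime=552
cleanup_datasets,group=purge_datasets success=1,runtime=57
"""

# One line per cleanup step: measurement name, group tag, success flag, runtime.
LINE_RE = re.compile(r"^cleanup_datasets,group=(\w+) success=(\d+),runtime=(\d+)$")


def parse_runtimes(text):
    """Return {group: runtime} for every line matching the format above.

    Hypothetical helper for this example; the runtime unit is ASSUMED
    to be seconds.
    """
    runtimes = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            group, _success, runtime = match.groups()
            runtimes[group] = int(runtime)
    return runtimes


runtimes = parse_runtimes(OUTPUT)
total_seconds = sum(runtimes.values())
slowest = max(runtimes, key=runtimes.get)

print(f"total: {total_seconds} s = {total_seconds // 60}m{total_seconds % 60}s")
print(f"slowest step: {slowest} ({runtimes[slowest]} s)")
```

Under that assumption, three steps (`purge_historyless_hdas`, `purge_deleted_histories`, `purge_deleted_hdas`) account for most of the wall-clock time, so those are the ones to watch when picking a timeout.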

sanjaysrikakulam commented 4 months ago

Yes, we can migrate this to the maintenance node and, if necessary, update the bashrc to export GALAXY_LOG_DIR. The gxadmin command seems to create logs.

This also means we must configure logrotate to clean up those log files on the maintenance node.

I will get back to this once the maintenance node is back online.

hexylena commented 4 months ago

> gxadmin command seems to create logs.

I would be open to it logging to stderr / journald instead; it writes log files only because that's how the script in Galaxy works. PRs welcome!

bgruening commented 4 months ago

Please merge if you think it's fine. I'm still debugging our posters problems.

sanjaysrikakulam commented 4 months ago

Cool, I am trying to catch up on things. Also, I think this is fine. We can test it, and if we find any side effects, we can shorten the interval again, e.g., to once a day.

sanjaysrikakulam commented 4 months ago

I just found out that the task was deployed via this. So the interval must be updated in the group vars instead.

I will create a PR reflecting yours shortly.