ukwa / ukwa-services

Deployment configuration for all UKWA services stacks.
Apache License 2.0

Use rclone for DC move-to-hdfs #110

anjackson closed this issue 1 year ago

anjackson commented 1 year ago

DC is filling up so fast that our usual approach can't keep up. Time to switch to rclone and direct crawler-to-HDFS transfer. This should be faster, as it can use the whole bandwidth and doesn't compete with Gluster/VM traffic.
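
As a rough sketch of what a direct crawler-to-HDFS move with rclone could look like (this is not the actual DAG code): it assumes an rclone remote called `hdfs` has already been configured with rclone's HDFS backend. The `build_rclone_move` helper, the paths, the `*.warc.gz` filter and the flag values are all hypothetical.

```python
import shlex
import subprocess


def build_rclone_move(src_dir, hdfs_dir, remote="hdfs", transfers=4, min_age="1h"):
    """Build an `rclone move` command line.

    `rclone move` copies files to the destination and deletes each source
    file once it has transferred successfully. The remote name, paths and
    flag values here are illustrative assumptions, not the real settings.
    """
    return [
        "rclone", "move",
        src_dir,
        f"{remote}:{hdfs_dir}",
        "--include", "*.warc.gz",       # assumed filter: only move WARC files
        "--min-age", min_age,           # skip recently modified files that may still be open
        "--transfers", str(transfers),  # number of files transferred in parallel
    ]


def run_move(src_dir, hdfs_dir, dry_run=True):
    """Run the move; with dry_run=True rclone only reports what it would do."""
    cmd = build_rclone_move(src_dir, hdfs_dir)
    if dry_run:
        cmd.append("--dry-run")
    print(shlex.join(cmd))
    return subprocess.run(cmd, check=True)
```

Calling `run_move("/data/dc/output", "/dc/output")` with the default `dry_run=True` would let you check which files rclone would pick up before anything is deleted from the crawler.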

anjackson commented 1 year ago

This is now running fine, and uploads and downloads each saturate the 1 Gbps (125 MB/s) connection to the internal network. Because uploads and downloads don't overlap, we're still not going as fast as possible, but it's a lot faster than before.
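
To put rough numbers on that, here is a back-of-the-envelope estimate. It rests on one reading of the setup: every byte is downloaded once and uploaded once through the same host at link speed, and a full-duplex link could in principle carry both directions at once. The 10 TB batch size is a made-up figure.

```python
LINK_BYTES_PER_SEC = 1_000_000_000 / 8  # 1 Gbps = 125,000,000 bytes/s (125 MB/s)


def transfer_hours(total_bytes, overlapped):
    """Estimate wall-clock hours to relay `total_bytes` through one host.

    Non-overlapped: the download and upload phases run one after the other,
    so the data crosses the link twice in sequence.
    Overlapped: both directions run at the same time (assumes full duplex).
    """
    one_way_seconds = total_bytes / LINK_BYTES_PER_SEC
    seconds = one_way_seconds if overlapped else 2 * one_way_seconds
    return seconds / 3600


ten_tb = 10 * 10**12  # hypothetical 10 TB batch
print(f"sequential: {transfer_hours(ten_tb, overlapped=False):.1f} h")
print(f"overlapped: {transfer_hours(ten_tb, overlapped=True):.1f} h")
```

Under these assumptions the sequential case comes out at about 44.4 hours and the overlapped case at about 22.2 hours, so overlapping the two phases would at best halve the elapsed time.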

anjackson commented 1 year ago

The implementation is in https://github.com/ukwa/ukwa-services/blob/master/manage/airflow/dags/move_to_hdfs.py

The docs could use a bit of a clean-up, but it's running well.