Closed by anjackson 1 year ago
This is now running fine: uploads and downloads each saturate the 1 Gbps (125 MB/s) connection to the internal network. Because uploads and downloads don't overlap, we're still not going as fast as possible, but it's a lot faster than before.
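For a rough sense of what the lack of overlap costs, here is a back-of-envelope sketch. It assumes a full-duplex link where the download and upload phases each run at the full line rate, and that every byte is downloaded and then uploaded. The function names and the 1 TB figure are made up for illustration and are not taken from move_to_hdfs.py.

```python
def link_mb_per_s(gbps: float) -> float:
    """Convert a link speed in gigabits/s to megabytes/s (decimal units)."""
    return gbps * 1000 / 8


def effective_mb_per_s(rate_mb_s: float, overlapped: bool) -> float:
    """End-to-end rate when each byte is downloaded, then uploaded.

    Assumption: both phases run at rate_mb_s. If the phases alternate
    instead of overlapping, every byte crosses the link twice in
    sequence, so the end-to-end rate is halved.
    """
    return rate_mb_s if overlapped else rate_mb_s / 2


def hours_to_move(size_tb: float, rate_mb_s: float) -> float:
    """Hours needed to move size_tb terabytes at rate_mb_s."""
    return size_tb * 1_000_000 / rate_mb_s / 3600


if __name__ == "__main__":
    line_rate = link_mb_per_s(1.0)  # the 1 Gbps link from the comment
    for overlapped in (False, True):
        rate = effective_mb_per_s(line_rate, overlapped)
        hours = hours_to_move(1.0, rate)  # hypothetical 1 TB batch
        print(f"overlapped={overlapped}: {rate:.1f} MB/s, {hours:.1f} h per TB")
```

Under those assumptions, overlapping the two phases would at best double the end-to-end rate; the real gain depends on disk and HDFS throughput as well.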
The implementation is in https://github.com/ukwa/ukwa-services/blob/master/manage/airflow/dags/move_to_hdfs.py
It could use a bit of a clean-up of the docs, but it's running well.
DC is filling up so fast that our usual approach doesn't work. Time to switch to rclone and direct crawler-to-HDFS transfer. This should be faster, as it can use the whole bandwidth and doesn't compete with Gluster/VM traffic.
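As a rough illustration of the rclone idea, the sketch below only builds the command line for an `rclone copy` between two configured remotes; it does not run anything. The remote names (`crawler`, `hdfs`) and the paths are hypothetical placeholders, and this is not necessarily how the eventual implementation invokes rclone. `--transfers` and `--dry-run` are standard rclone flags.

```python
import shlex
from typing import List


def build_rclone_copy(src: str, dst: str, transfers: int = 4,
                      dry_run: bool = True) -> List[str]:
    """Build an argv list for `rclone copy SRC DST`.

    src and dst use rclone's `remote:path` syntax. The remotes must
    already exist in the rclone config; the names used below are
    hypothetical.
    """
    if transfers < 1:
        raise ValueError("transfers must be at least 1")
    cmd = ["rclone", "copy", src, dst, "--transfers", str(transfers)]
    if dry_run:
        # Report what would be copied without transferring anything.
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    # Hypothetical remotes and paths, for illustration only.
    cmd = build_rclone_copy("crawler:/output/warcs", "hdfs:/incoming/warcs")
    print(shlex.join(cmd))
    # To execute for real, pass the list to subprocess.run(cmd, check=True).
```

Building the argument list separately from running it keeps the command easy to log and to check with a dry run before any data moves.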