opencb / opencga

An Open Computational Genomics Analysis platform for big data genomics analysis. OpenCGA is maintained and developed by its parent company Zetta Genomics. Please contact support@zettagenomics.com for bug reports and feature requests.
Apache License 2.0

Test and create vFXT NFS storage #931

Open lawrencegripper opened 5 years ago

lawrencegripper commented 5 years ago

Tasks:

lawrencegripper commented 5 years ago

Following the guides here, I've manually created a cluster and tested that I could mount NFS successfully.

Next, my plan is to use a managed service identity and a script extension to fully automate cluster creation from ARM.

lawrencegripper commented 5 years ago

Note: We should test what happens if the compute nodes are deleted: can the cluster be recreated over the same storage account? If so, are the existing files still accessible?

lawrencegripper commented 5 years ago

Currently I'm using the following to test simple file-copy throughput on Avere:

sudo mount -o hard,nointr,proto=tcp,mountproto=tcp,retry=30 10.0.0.22:/msazure /mnt/vfxt
dd if=/dev/zero of=/mnt/vfxt/t1.img bs=10G count=1

# Remount to prevent caching
sudo umount /mnt/vfxt
sudo mount -o hard,nointr,proto=tcp,mountproto=tcp,retry=30 10.0.0.22:/msazure /mnt/vfxt

time dd if=/mnt/vfxt/t1.img of=/dev/null bs=128k

# Remount to prevent caching
sudo umount /mnt/vfxt
sudo mount -o hard,nointr,proto=tcp,mountproto=tcp,retry=30 10.0.0.22:/msazure /mnt/vfxt

time dd if=/mnt/vfxt/t1.img of=/dev/null bs=128k

# Remount to prevent caching
sudo umount /mnt/vfxt
sudo mount -o hard,nointr,proto=tcp,mountproto=tcp,retry=30 10.0.0.22:/msazure /mnt/vfxt

time dd if=/mnt/vfxt/t1.img of=/dev/null bs=128k

rm /mnt/vfxt/t1.img
sudo umount /mnt/vfxt

The mount/unmount is required; otherwise the client simply caches the files.
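One caveat worth noting about the write step: on Linux, dd caps a single block at 2 GiB (2,147,479,552 bytes), so `bs=10G count=1` silently writes only ~2 GiB, which is why the transcript below shows "0+1 records in" and a 2.1 GB file. A minimal sketch of the fix, run here at 64 MiB against a hypothetical temp path (on the cluster you would use something like `bs=1M count=10240` against `/mnt/vfxt/t1.img`):

```shell
# Smaller blocks with a higher count write the full intended size;
# bs=10G count=1 would be truncated to ~2 GiB by the single-block cap.
# /tmp/t1_sketch.img is an assumed scratch path for this quick check.
dd if=/dev/zero of=/tmp/t1_sketch.img bs=1M count=64 conv=fsync 2>/dev/null
stat -c %s /tmp/t1_sketch.img   # prints 67108864 (64 MiB, the full requested size)
```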

Running this against the newly created Avere cluster from an A8 machine gives the following:

azureuser@vm-avere-controller:/mnt$ sudo mount -o hard,nointr,proto=tcp,mountproto=tcp,retry=30 10.0.0.22:/msazure /mnt/vfxt
azureuser@vm-avere-controller:/mnt$ dd if=/dev/zero of=/mnt/vfxt/t1.img bs=10G count=1
0+1 records in
0+1 records out
2147479552 bytes (2.1 GB, 2.0 GiB) copied, 13.3394 s, 161 MB/s
azureuser@vm-avere-controller:/mnt$ 
azureuser@vm-avere-controller:/mnt$ # Remount to prevent caching
azureuser@vm-avere-controller:/mnt$ sudo umount /mnt/vfxt
azureuser@vm-avere-controller:/mnt$ sudo mount -o hard,nointr,proto=tcp,mountproto=tcp,retry=30 10.0.0.22:/msazure /mnt/vfxt
azureuser@vm-avere-controller:/mnt$ 
azureuser@vm-avere-controller:/mnt$ time dd if=/mnt/vfxt/t1.img of=/dev/null bs=128k
16383+1 records in
16383+1 records out
2147479552 bytes (2.1 GB, 2.0 GiB) copied, 7.7473 s, 277 MB/s

real    0m7.750s
user    0m0.021s
sys 0m0.844s
azureuser@vm-avere-controller:/mnt$ 
azureuser@vm-avere-controller:/mnt$ # Remount to prevent caching
azureuser@vm-avere-controller:/mnt$ sudo umount /mnt/vfxt
azureuser@vm-avere-controller:/mnt$ sudo mount -o hard,nointr,proto=tcp,mountproto=tcp,retry=30 10.0.0.22:/msazure /mnt/vfxt
azureuser@vm-avere-controller:/mnt$ 
azureuser@vm-avere-controller:/mnt$ time dd if=/mnt/vfxt/t1.img of=/dev/null bs=128k
16383+1 records in
16383+1 records out
2147479552 bytes (2.1 GB, 2.0 GiB) copied, 7.71594 s, 278 MB/s

real    0m7.719s
user    0m0.023s
sys 0m0.827s

When run against the A8's attached disk, the following is produced:

0+1 records in
0+1 records out
2147479552 bytes (2.1 GB, 2.0 GiB) copied, 5.50279 s, 390 MB/s
azureuser@vm-avere-controller:~$ 
azureuser@vm-avere-controller:~$ 
azureuser@vm-avere-controller:~$ time dd if=/mnt/t1.img of=/dev/null bs=128k
16383+1 records in
16383+1 records out
2147479552 bytes (2.1 GB, 2.0 GiB) copied, 0.996126 s, 2.2 GB/s

real    0m0.997s
user    0m0.010s
sys 0m0.988s
azureuser@vm-avere-controller:~$ 
azureuser@vm-avere-controller:~$ time dd if=/mnt/t1.img of=/dev/null bs=128k
16383+1 records in
16383+1 records out
2147479552 bytes (2.1 GB, 2.0 GiB) copied, 1.17016 s, 1.8 GB/s

real    0m1.172s
user    0m0.024s
sys 0m1.148s
lawrencegripper commented 5 years ago

This suggests that the throughput numbers for Avere are Write: ~160 MB/s, Read: ~275 MB/s.
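As a sanity check, the dd-reported rates can be recomputed from the byte counts and elapsed times in the runs above (decimal MB, matching dd's own reporting):

```shell
# Recompute MB/s from dd's byte count and elapsed seconds.
awk 'BEGIN { printf "%.0f\n", 2147479552 / 13.3394 / 1e6 }'  # write run -> 161
awk 'BEGIN { printf "%.0f\n", 2147479552 / 7.7473 / 1e6 }'   # read run  -> 277
```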

These numbers seem much lower than expected.
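One likely contributor: dd is a single-threaded, single-stream client, and a single NFS stream rarely saturates a cache tier. A rough multi-stream sketch for measuring aggregate read throughput, run here against tiny local files in an assumed scratch directory so it's quick to try (on the cluster you would point the readers at large files under /mnt/vfxt):

```shell
# Create four small test files, then time four parallel dd readers.
# Sizes are tiny (4 MiB each) for this sketch; scale bs/count up on
# the real mount to get a meaningful aggregate figure.
DIR=/tmp/vfxt_sketch   # assumed scratch directory
mkdir -p "$DIR"
for i in 1 2 3 4; do
  dd if=/dev/zero of="$DIR/t$i.img" bs=1M count=4 2>/dev/null
done
time (
  for i in 1 2 3 4; do
    dd if="$DIR/t$i.img" of=/dev/null bs=128k 2>/dev/null &
  done
  wait
)
```

Aggregate throughput is then total bytes read divided by the wall-clock time of the parallel phase.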

lawrencegripper commented 5 years ago

The current template deploys a cluster, but attempting to mount the storage on the controller results in the following error:

lawrencegripper@vm-avere-controller:~$ sudo mount 10.0.1.12:/msazure /mnt/vfxt
mount.nfs: access denied by server while mounting 10.0.1.12:/msazure

I think this is likely something basic that I need to reconfigure.

lawrencegripper commented 5 years ago

The existing work in the Avere repo outputs the IPs of the NFS servers created by the template; these are in the format 10.0.1.12-10.0.1.14.

I attempted to translate this into a CSV inside the ARM template, but couldn't with the limited functions available. Since all the components mounting the NFS share will do so in a script, I think it's fair for this conversion to happen in those scripts. I'll create an additional work item to write this script for VMs which mount this storage. #948
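For reference, the range-to-list conversion is straightforward in the mount scripts. A hypothetical helper (function name and the assumption that only the last octet varies are mine, based on the 10.0.1.12-10.0.1.14 format above):

```shell
# Expand a "first-last" IP range (e.g. "10.0.1.12-10.0.1.14") into a
# comma-separated list, assuming only the final octet differs.
expand_ip_range() {
  range="$1"
  first="${range%-*}"; last="${range#*-}"
  prefix="${first%.*}"                      # e.g. 10.0.1
  start="${first##*.}"; end="${last##*.}"   # e.g. 12 and 14
  out=""
  for i in $(seq "$start" "$end"); do
    out="${out:+$out,}$prefix.$i"
  done
  printf '%s\n' "$out"
}

expand_ip_range "10.0.1.12-10.0.1.14"   # prints 10.0.1.12,10.0.1.13,10.0.1.14
```

Each VM's mount script could then pick one IP from the list (e.g. round-robin by hostname) to spread clients across the vFXT nodes.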

lawrencegripper commented 5 years ago

The ARM template now works for automated cluster creation, including the storage account and setup.