Investigating and benchmarking distributed filesystems for the European Galaxy server
Supervisor: Gianmauro Cuccuru, Compute Center Freiburg
For degree: Bachelor/Project/Master
Status: Open
Keywords: S3, NetApp, OneData, iRODS
Global Research context
With the Pulsar network (https://pulsar-network.readthedocs.io), we have a distributed compute infrastructure in place that can schedule compute jobs across the globe. The logical next step is to find a distributed storage component that is reliable and scalable across different clouds and HPC centers - and ultimately to integrate it into the European Galaxy infrastructure.
Project context
S3, OneData and iRODS are three candidates that should be evaluated in the context of the European Galaxy server use case. We have access to all three technologies, and OneData and iRODS are even available as distributed European deployments. We aim to benchmark these solutions and evaluate which one fits our use case best.
Proposed agenda for the project
develop an automated benchmark procedure for S3, OneData and iRODS that covers a few typical use cases:
writing small files
writing big files
reading small files
reading big files
local filesystem
remote filesystems in other countries
different file formats: HDF5 vs. Zarr vs. NetCDF
check the failure tolerance by running the automated benchmark while tearing down storage locations
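
As an illustration of the read and write cases listed above, a first version of the benchmark could look like the following minimal sketch. It rests on an assumption the project would still need to validate: each storage backend (S3, OneData, iRODS) is reachable as a mounted directory, for example through a FUSE client. The function names, the file naming scheme and the example sizes are hypothetical and chosen only for illustration. Pointing the same script at a local directory and at a remote mount would cover the local and remote cases.

```python
import os
import tempfile
import time
from pathlib import Path


def time_write(directory: Path, n_files: int, size_bytes: int) -> float:
    """Write n_files files of size_bytes each and return the elapsed seconds."""
    payload = os.urandom(size_bytes)
    start = time.perf_counter()
    for i in range(n_files):
        with open(directory / f"bench_{i}.bin", "wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # ask the OS to push the data to the storage layer
    return time.perf_counter() - start


def time_read(directory: Path, n_files: int) -> float:
    """Read back the files created by time_write and return the elapsed seconds.

    Note: reads may be served from the local page cache, so a real benchmark
    would need to drop caches or read from a different client.
    """
    start = time.perf_counter()
    for i in range(n_files):
        with open(directory / f"bench_{i}.bin", "rb") as fh:
            while fh.read(1024 * 1024):  # read in 1 MiB chunks until EOF
                pass
    return time.perf_counter() - start


def run_benchmark(target: Path, cases: dict) -> dict:
    """Run every case (name -> (n_files, size_bytes)) inside the target directory."""
    results = {}
    for name, (n_files, size_bytes) in cases.items():
        workdir = Path(tempfile.mkdtemp(prefix=f"{name}_", dir=target))
        try:
            write_s = time_write(workdir, n_files, size_bytes)
            read_s = time_read(workdir, n_files)
        finally:
            # Clean up so repeated runs start from an empty directory.
            for f in workdir.iterdir():
                f.unlink()
            workdir.rmdir()
        results[name] = {
            "write_s": write_s,
            "read_s": read_s,
            "total_mb": n_files * size_bytes / 1e6,
        }
    return results


if __name__ == "__main__":
    # Assumption: replace this path with the mount point of the backend under test.
    target = Path(tempfile.gettempdir())
    cases = {"small_files": (100, 4 * 1024), "big_files": (2, 64 * 1024 * 1024)}
    for name, r in run_benchmark(target, cases).items():
        print(f"{name}: {r['total_mb']:.1f} MB, write {r['write_s']:.3f} s, read {r['read_s']:.3f} s")
```

For the failure-tolerance item, the same loop could be run repeatedly while a storage location is taken offline, recording which cases raise an `OSError` instead of completing.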
Prerequisites