nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Add support for Chinese cloud providers #3045

Open bentsherman opened 1 year ago

bentsherman commented 1 year ago

There is increased interest (#2149, #2795, #3043) in running Nextflow on cloud providers based in China.

While these platforms can be used now through Kubernetes, it would be more convenient for most users to be able to use the cloud provider directly.

Executors for these clouds can likely be adapted from the existing ones (e.g. AWS, Azure). It would be a good way to expand Nextflow's reach into China. I'm also not sure whether (or to what extent) users in China can access American cloud providers.

pditommaso commented 1 year ago

The community should step in and take care of that.

jing-memverge commented 1 year ago

This is worth a discussion with our team. MemVerge has genomics customers in China and we can already run Nextflow with Memory Machine Cloud. How can we better understand the requirements and the end users who are asking for this?

pditommaso commented 1 year ago

This would be very useful. We did some exploration in the past, and the Alibaba model was very similar to Azure Batch. However, we decided we could not directly support every cloud provider.

If you are keen to take ownership of this contribution, we would be very happy to advise you on this implementation.

As for the requirements, I guess they are the same as for the average Nextflow user: deploying data pipelines in a scalable manner.

jing-memverge commented 1 year ago

Once our plugin is contributed to the Nextflow community by completing https://github.com/nextflow-io/plugins/issues/12, any user can connect their Nextflow to MMC as the batch computing environment for the China clouds we support. Currently that is AliCloud, but we may expand support over time. The plugin doesn't provide a cloud-native batch integration, but MMC will be free to use forever and provides the same or better capabilities than the current cloud-native batch services. The more advanced MMC capabilities will require an MMC Pro or Enterprise commercial license.

pditommaso commented 1 year ago

Indeed, I was thinking the same. However, there remains the non-trivial problem of accessing the object storage offered by the cloud provider.

Does Alibaba Cloud have an S3-like (I mean API-compatible) storage layer?

denny-zhao commented 1 year ago

Alibaba's object storage is called OSS (Object Storage Service). It supports a feature set similar to that of AWS S3. There are several differences that could potentially require significant extra effort:

  1. For security reasons, OSS supports virtual-host-style requests only, while S3 supports both path-style and virtual-host-style requests.
  2. ACL definitions are slightly different. In addition, the parameters of individual API calls differ slightly. Converting from the S3 API to the OSS API would take some effort, but not much.
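
To illustrate the first difference, here is a minimal sketch of how the two addressing styles build a request URL. It is not taken from any SDK: the function names, the bucket name, and the endpoint hosts are placeholders chosen for illustration, and real requests would also need signing.

```python
from urllib.parse import quote


def path_style_url(endpoint: str, bucket: str, key: str) -> str:
    """Path-style: the bucket name is the first segment of the URL path."""
    return f"https://{endpoint}/{bucket}/{quote(key, safe='/')}"


def virtual_host_style_url(endpoint: str, bucket: str, key: str) -> str:
    """Virtual-host-style: the bucket name is a subdomain of the endpoint host."""
    return f"https://{bucket}.{endpoint}/{quote(key, safe='/')}"


if __name__ == "__main__":
    # Hypothetical bucket and object key, for illustration only.
    bucket, key = "my-bucket", "data/sample.fastq"

    # S3 accepts both forms (endpoint host shown as an example).
    print(path_style_url("s3.us-east-1.amazonaws.com", bucket, key))
    print(virtual_host_style_url("s3.us-east-1.amazonaws.com", bucket, key))

    # Per the comment above, OSS accepts only the virtual-host form
    # (endpoint host shown as an example).
    print(virtual_host_style_url("oss-cn-hangzhou.aliyuncs.com", bucket, key))
```

A client that is hard-wired to path-style addressing would therefore need to be switched to virtual-host-style addressing before it could talk to OSS.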

pditommaso commented 1 year ago

I believe so. I was wondering whether, by any chance, it is compatible with the S3 API, which would avoid the need to implement a new client on the Nextflow side for Alibaba storage.