ncbi / elastic-blast

ElasticBLAST is a cloud-based tool to perform your BLAST searches faster and make you more effective
https://blast.ncbi.nlm.nih.gov/doc/elastic-blast

Error running blastx on gcp #10

Closed pipaber closed 2 years ago

pipaber commented 2 years ago

Hello!

I'm having trouble running elastic-blast on the GCP framework. I want to use blastx to search the transcripts of Coffea arabica (GCA_003713225.1 Cara_1.0) against the nr database.

The transcripts consist of 80,667 sequences. I also activated the billing option to pay at the end of the month, but I still can't run my query.

My settings for the cluster:

[cloud-provider]
gcp-project = axial-totality-344915
gcp-region = us-east4
gcp-zone = us-east4-b

[cluster]
num-nodes = 4
pd-size = 200G
labels = owner=p_palacios_bernuy

[blast]
program = blastx
db = nr
batch-len = 500000
queries = /home/p_palacios_bernuy/GCF_003713225.1_Cara_1.0_rna.fna.gz
results = gs://elasticblast-p_palacios_bernuy/results/Coffee
options = -task blastx-fast -evalue 0.05 -outfmt 5
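I saved these settings to a config file and submitted the search with the standard elastic-blast command (the filename below is just illustrative):

elastic-blast submit --cfg coffee.ini   # coffee.ini stands for whatever the config file is named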

And I receive this error:

ERROR: The command "gcloud container clusters create elasticblast-p-palacios-bernuy-39b27e051 --no-enable-autoupgrade --project axial-totality-344915 --zone us-east4-b --machine-type n1-highmem-32 --num-nodes 1 --scopes compute-rw,storage-rw,cloud-platform,logging-write,monitoring-write --labels cluster-name=elasticblast-p-palacios-bernuy-39b27e051,client-hostname=cs-38659700378-default,project=elastic-blast,billingcode=elastic-blast,creator=p_palacios_bernuy,created=2022-03-23-16-44-22,owner=p_palacios_bernuy,program=blastx,db=nr,name=elasticblast-p-palacios-bernuy-39b27e051,results=gs---elasticblast-p_palacios_bernuy-results-coffee,version=0-2-4" returned with exit code 1
Default change: VPC-native is the default mode during cluster creation for versions greater than 1.21.0-gke.1500. To create advanced routes based clusters, please pass the --no-enable-ip-alias flag
Note: Your Pod address range (--cluster-ipv4-cidr) can accommodate at most 1008 node(s).
ERROR: (gcloud.container.clusters.create) ResponseError: code=403, message=Insufficient regional quota to satisfy request: resource "CPUS": request requires '32.0' and is short '8.0'. project has a quota of '24.0' with '24.0' available. View and manage quotas at https://console.cloud.google.com/iam-admin/quotas?usage=USED&project=axial-totality-344915.

ERROR: cleanup stage failed: kubernetes context is missing for elasticblast-p-palacios-bernuy-39b27e051
ERROR: cleanup stage failed: Cluster elasticblast-p-palacios-bernuy-39b27e051 was not found
ERROR: kubernetes context is missing for elasticblast-p-palacios-bernuy-39b27e051
ERROR: Cluster elasticblast-p-palacios-bernuy-39b27e051 was not found

boratyng commented 2 years ago

Hi @PieroPaBE,

It looks like your regional quota allows for only 24 CPUs. Elastic-BLAST needs to use an instance type with enough memory to accommodate the nr database, and those instance types have at least 32 CPUs. Here are instructions on how to increase this quota: https://cloud.google.com/docs/quota#requesting_higher_quota. Please give it a try and let us know if it helped. Remember that you will need a multiple of 32 CPUs to be able to use multiple nodes.
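If it helps, one way to check your current regional CPU quota from the command line is with gcloud (the project and region are taken from your config above; the grep just filters gcloud's YAML output):

gcloud compute regions describe us-east4 --project axial-totality-344915 | grep -B 1 -A 1 "metric: CPUS"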

I see that you set the pd-size and batch-len parameters in your config file. Is there a reason for this? Elastic-BLAST sets these parameters automatically: it optimizes batch-len for the best performance, and if you ran into a persistent disk size quota, that quota can also be increased. A minimal config is sketched below.
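For illustration, a minimal version of your config without pd-size and batch-len might look like this (all values copied from your config above):

[cloud-provider]
gcp-project = axial-totality-344915
gcp-region = us-east4
gcp-zone = us-east4-b

[cluster]
num-nodes = 4
labels = owner=p_palacios_bernuy

[blast]
program = blastx
db = nr
queries = /home/p_palacios_bernuy/GCF_003713225.1_Cara_1.0_rna.fna.gz
results = gs://elasticblast-p_palacios_bernuy/results/Coffee
options = -task blastx-fast -evalue 0.05 -outfmt 5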

pipaber commented 2 years ago

Hi @boratyng

Thank you for the suggestion. Google answered my extra-quota request quickly (in less than a minute). When I tried to run with the extra quota, elastic-blast gave me this error:

ERROR: Your ElasticBLAST search has failed and its computing resources will be deleted. The batch size specified (30000) led to creating 5378 kubernetes jobs, which exceeds the limit on number of jobs (5000). Please increase the batch-len parameter to at least 30780 and repeat the search.

So I set batch-len to 35000 and the software ran without a problem.
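That is, the only change was this line in the [blast] section of the config:

batch-len = 35000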

boratyng commented 2 years ago

Great! Thank you for letting us know. Kubernetes in GCP has a limit of 5000 jobs. With the batch-len selected by Elastic-BLAST, your search needed more jobs than that, so the only way to fix this problem was to increase batch-len.
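As a rough sanity check (a sketch that assumes each batch is filled close to batch-len, which is only approximate because batches end on sequence boundaries): the failed run implies roughly 30000 * 5378 ≈ 161.3 million letters of query, so with the new batch-len:

echo $(( 30000 * 5378 / 35000 ))   # ≈ 4609 jobs, safely under the 5000 limit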

boratyng commented 2 years ago

Hi @PieroPaBE, has your Elastic-BLAST search completed successfully?

pipaber commented 2 years ago

Hi @boratyng, unfortunately the search did not complete. I left it running for two days, and of the 4000 batches generated when I set batch-len=35000, the software had completed 145. Of the 300 dollars of credit that GCP gives you, elastic-blast had spent 200 dollars in those two days, so I had to cancel it, because the projected total cost to complete the job was between 2000 and 2300 dollars. It is a very useful tool; however, as I am doing a thesis, that amount is far above my project budget.

boratyng commented 2 years ago

Thanks for the update. I am sorry to hear that. You do have a large search. Elastic-BLAST can use preemptible instances, which are much cheaper: https://cloud.google.com/compute/docs/instances/preemptible. To use them, you need to add this to your Elastic-BLAST config file:

[cluster]
use-preemptible = yes

This search would probably still be more expensive than $300 with preemptible instances, though.