ncbi / elastic-blast

ElasticBLAST is a cloud-based tool to perform your BLAST searches faster and make you more effective
https://blast.ncbi.nlm.nih.gov/doc/elastic-blast

ERROR:root:Resource job-cloud-split-local-ssd.yaml.template is missing from the package #15

Open arminmm91 opened 1 year ago

arminmm91 commented 1 year ago

Hi,

I've installed the elastic-blast package and all its dependencies, but when I check the version to test that it runs, it returns this error:

python ..elb-venv\Scripts\elastic-blast --version
ERROR:root:Resource job-cloud-split-local-ssd.yaml.template is missing from the package. Please re-install ElasticBLAST

I also removed and reinstalled the package a couple of times, but that did not work. Please let me know of any possible solutions or opinions.

boratyng commented 1 year ago

Hi @arminmm91, thank you for trying elastic-blast and we are sorry you ran into a problem. Are you by any chance working on Windows? If so, I'm sorry to report that ElasticBLAST isn't supported on Windows at this time. We are looking into your problem, but cannot promise a quick solution. In the meantime, you can open the AWS or GCP cloud shell from a Windows machine and run elastic-blast there. I hope this helps.

arminmm91 commented 1 year ago

Thanks for your response, @boratyng. I was trying to run it in PowerShell. I also tried WSL, but it still did not work.

boratyng commented 1 year ago

We looked closer into your issue and it looks like elastic-blast does not install correctly in Windows. A few files are missing and this results in the error message that you saw. There are additional small issues, so at this point elastic-blast will not run in Windows. Sorry.

Unfortunately I am not very familiar with WSL, but here are a few suggestions:

It is relatively easy to run elastic-blast in cloudshell either in AWS or GCP, and you can access cloudshell from Windows. Please, consider running elastic-blast in cloudshell. Here is some information about it:

arminmm91 commented 1 year ago

Thank you for suggesting Cloud Shell. Everything should be working, but I submitted a job of 15,000 sequences with this configuration (the actual bucket path was provided at submission):

[cloud-provider]
gcp-region = us-west3
gcp-zone = us-west3-c

[cluster]
name = elastic-blast
num-nodes = 1
labels = owner=user
pd-size = 200G

[blast]
program = blastx
db = swissprot
queries = gs://XXXX1/Ad_orf.cds
results = gs://XXXproject1/
options = -task blastx-fast -evalue 1e-3 -outfmt "6 qseqid sacc pident length gapopen qstart qend sstart send evalue bitscore" -entrez_query "Viridiplantae [Organism]"

When I check gcloud container clusters list, a job is still running. However, kubectl get pods shows two pods, both completed: one with an error and the other with no errors. When I check the results, there are none.

Please let me know if you see any mistakes on my side.

boratyng commented 1 year ago

Unfortunately, -entrez_query is not supported by elastic-blast. You can use the -taxids option instead, so your options parameter will look like this:

options = -task blastx-fast -evalue 1e-3 -outfmt "6 qseqid sacc pident length gapopen qstart qend sstart send evalue bitscore" -taxids 33090

If you are still running into trouble it would be helpful for us if you attached elastic-blast.log, error messages and the information that we ask here: https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/support.html.

boratyng commented 1 year ago

@arminmm91, we noticed that the default Kubernetes version was updated in GCP and it may be causing issues for elastic-blast. If you are still having issues, adding gke-version = 1.24 under cloud-provider may help:

[cloud-provider]
gcp-region = us-west3
gcp-zone = us-west3-c
gke-version = 1.24

arminmm91 commented 1 year ago

Hi @boratyng, I managed to work with the swissprot database through Google Cloud Shell. However, for the nt and nr databases it gave this error:

Your ElasticBLAST search failed, please help us improve ElasticBLAST by reporting this failure as described in https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/support.html

I adjusted CPU and RAM, but it still failed.

boratyng commented 1 year ago

Hi @arminmm91, can you attach the elastic-blast.log file and the elastic-blast-diagnostics.tgz file created by running:

gsutil -qm cp -r ${YOUR_RESULTS_BUCKET}/logs .
gsutil -qm cp -r ${YOUR_RESULTS_BUCKET}/metadata .
tar czf elastic-blast-diagnostics.tgz logs metadata

Please also attach or post the result of this command:

gsutil ls -lr ${YOUR_RESULTS_BUCKET}

Thanks.

arminmm91 commented 1 year ago

Hi @boratyng, please see attached all the log files and metadata (only the important ones), I replaced the actual bucket name and project with 'xxx':

metadata_elastic-blast-config.txt
logs_k8s-init-pv-lmql5-get-blastdb.txt
logs_k8s-submit-submit-jobs.txt
logs_k8s-init-pv-lmql5-import-query-batches.log
metadata_FAILURE_details (1).txt
metadata_FAILURE (1).txt

This command:

$ tar czf elastic-blast-diagnostics.tgz logs metadata

returned this:

tar: logs: Cannot stat: No such file or directory
tar: metadata: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors

Best,
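
The tar error above means that the logs and metadata directories did not exist in the current working directory when the archive was created. One plausible cause (an assumption, the thread does not confirm it) is that the preceding gsutil copies did not download anything. A minimal Python sketch of the same packaging step that reports missing directories before archiving is shown here; the function name `make_diagnostics_archive` and its parameters are hypothetical and not part of ElasticBLAST.

```python
import os
import tarfile


def make_diagnostics_archive(workdir,
                             archive_name="elastic-blast-diagnostics.tgz",
                             parts=("logs", "metadata")):
    """Bundle the downloaded logs and metadata directories into one .tgz.

    Hypothetical helper: it mirrors `tar czf ... logs metadata`, but fails
    early with a clear message when a directory is absent.
    """
    # Collect every expected directory that is not present in workdir.
    missing = [p for p in parts
               if not os.path.isdir(os.path.join(workdir, p))]
    if missing:
        raise FileNotFoundError(
            "missing directories (did the gsutil copy succeed?): "
            + ", ".join(missing))

    archive_path = os.path.join(workdir, archive_name)
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in parts:
            # arcname keeps paths inside the archive relative (logs/..., metadata/...).
            tar.add(os.path.join(workdir, p), arcname=p)
    return archive_path
```

Calling `make_diagnostics_archive(".")` from the directory where the gsutil commands were run would either produce the archive or name the directories that were never downloaded.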

boratyng commented 1 year ago

@arminmm91, I found one problem. It looks like you are trying to run a blastn search with -task blastn-fast. There is no blastn-fast task. The fast nucleotide search is megablast. Please use -task megablast. It is also the default task, so if you do not specify the -task parameter, megablast will be used.

arminmm91 commented 1 year ago

Hi @boratyng, thanks again but the change did not lead to a successful run. I've attached the log file:

metadata_FAILURE_details (2).txt

name = blastn
db = nt
queries = gs://xxx/all_nucl.cds
results = gs://xxxx
options = -task -evalue 1e-3 -outfmt "6 qseqid sacc pident length gapopen qstart qend sstart send evalue bitscore" -taxids 33090

boratyng commented 1 year ago

Hi @arminmm91, you left the -task parameter without the argument in options. You need to either specify -task megablast or remove -task from options. Sorry if I was not clear about this earlier.
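
The option problems raised in this thread (an unsupported -entrez_query, a task name that does not exist, and -task with no argument) can be caught before submitting a search. The following is a minimal sketch, not part of ElasticBLAST: `check_options`, `KNOWN_TASKS`, and `UNSUPPORTED` are hypothetical names, and the task table is a partial list that assumes the task names accepted by recent BLAST+ releases.

```python
import shlex

# Task names accepted by BLAST+ for a few programs (assumed, not exhaustive).
KNOWN_TASKS = {
    "blastn": {"blastn", "blastn-short", "dc-megablast", "megablast", "rmblastn"},
    "blastx": {"blastx", "blastx-fast"},
    "blastp": {"blastp", "blastp-fast", "blastp-short"},
    "tblastn": {"tblastn", "tblastn-fast"},
}

# Options reported in this thread as unsupported by ElasticBLAST.
UNSUPPORTED = {"-entrez_query"}


def check_options(program, options):
    """Return a list of problems found in a BLAST options string."""
    problems = []
    # shlex keeps quoted values such as the -outfmt string as one token.
    tokens = shlex.split(options)
    for i, tok in enumerate(tokens):
        if tok in UNSUPPORTED:
            problems.append(f"{tok} is not supported by ElasticBLAST")
        if tok == "-task":
            value = tokens[i + 1] if i + 1 < len(tokens) else None
            # A following token that starts with "-" is the next option,
            # which means -task was given without a task name.
            if value is None or value.startswith("-"):
                problems.append("-task has no argument")
            elif program in KNOWN_TASKS and value not in KNOWN_TASKS[program]:
                problems.append(f"unknown task {value!r} for {program}")
    return problems
```

For example, `check_options("blastn", "-task -evalue 1e-3 -taxids 33090")` flags the missing task name, while `check_options("blastn", "-task megablast -evalue 1e-3")` returns an empty list.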

arminmm91 commented 1 year ago

Hi @boratyng, I forgot to mention that I tried -task megablast as well, but I still got the same error.

boratyng commented 1 year ago

@arminmm91, metadata_FAILURE_details (1).txt and metadata_FAILURE_details (2).txt are for the runs where you specified -task blastn-fast and -task (no task name). These have incorrect blast options. I do not see the logs for the search where you specified -task megablast. Can you attach them?

Also, I see that you set pd-size = 300G. This is very close to what nt needs and it may also be causing issues. Do you have issues with GCP quota? Are you able to run elastic-blast without specifying the pd-size parameter? If you need to specify it, please set it to at least 350G.
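
The pd-size advice above can be expressed as a quick pre-submission check. This is a sketch under stated assumptions: the 350G minimum for nt is taken only from the comment above and is not an official figure, the size strings are assumed to end in an M, G, or T suffix, and `parse_size_gb`, `pd_size_ok`, and `MIN_PD_SIZE_GB` are hypothetical names.

```python
# Minimum disk size per database in gigabytes, as suggested in this thread.
# Not an official requirement; databases grow over time.
MIN_PD_SIZE_GB = {"nt": 350}


def parse_size_gb(size):
    """Convert a size string such as '300G' to gigabytes.

    Assumes the string ends in an M, G, or T suffix.
    """
    units = {"M": 1 / 1024, "G": 1, "T": 1024}
    number, suffix = size[:-1], size[-1].upper()
    if suffix not in units:
        raise ValueError(f"unrecognized size suffix in {size!r}")
    return float(number) * units[suffix]


def pd_size_ok(db, pd_size):
    """Return True if pd_size meets the suggested minimum for db."""
    minimum = MIN_PD_SIZE_GB.get(db)
    # Databases without a listed minimum are not checked.
    return minimum is None or parse_size_gb(pd_size) >= minimum
```

With these assumptions, `pd_size_ok("nt", "300G")` is false and `pd_size_ok("nt", "350G")` is true.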