ncbi / magicblast

Use magicblast to map an experiment in the SRA repository #50

Open sabina-llr opened 1 year ago

sabina-llr commented 1 year ago

Hi,

I have used magicblast versions 1.4.0 and 1.5.0 in the past to map a query to experiments in the SRA repository. I have now installed the latest magicblast version and ran a script (that gave me results a few months ago) to make sure that everything works as it is supposed to. However, when running this:

makeblastdb -in reference.fa -out reference -parse_seqids -dbtype nucl
magicblast -sra SRR8732225 -db reference -out SRR8732225_aligned.sam -num_threads 2 -outfmt sam -no_unaligned

I get the following error message:

VDB: 2022-09-09T09:01:05 . sys: mbedtls_ssl_get_verify_result returned 0x4008 ( !! The certificate is not correctly signed by the trusted CA !! The certificate is signed with an unacceptable hash. )
BLAST query/options error: The provided SRA accession 'SRR8732225' does not exist
Please refer to the BLAST+ user manual.

I have tried several SRA experiments but I get the same error for all of them. Any idea what the problem could be? Thanks for any help!

tom6931 commented 1 year ago

Hello, I tried your run at the NCBI and did not see an issue. I will try it from a machine external to the NCBI next week to see if I can reproduce it.

The work-around for now would be to dump out the FASTA using the SRA toolkit (see https://github.com/ncbi/sra-tools/wiki/01.-Downloading-SRA-Toolkit) and start your magic-blast run from there. I'm sorry you ran into trouble.
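
As a rough sketch of that workaround, assuming a single-end run and the database name used earlier in this thread: dump the run to FASTA with fastq-dump, then pass the file to Magic-BLAST with -query. The helper name `dump_and_align` is hypothetical, and paired-end runs would need additional fastq-dump and magicblast options that are not shown here.

```shell
#!/bin/sh
# Hypothetical helper, not part of Magic-BLAST or the SRA Toolkit:
# dump an SRA run to FASTA, then align the FASTA file.
dump_and_align() {
    acc=$1
    # --fasta 0 writes FASTA without line wrapping to ./<accession>.fasta
    fastq-dump --fasta 0 "$acc" || return 1
    # Same options as in the original command, with -query instead of -sra.
    magicblast -query "$acc.fasta" -db reference -out "${acc}_aligned.sam" \
        -num_threads 2 -outfmt sam -no_unaligned
}

# Example:
#   dump_and_align SRR8732225
```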

Tom

boratyng commented 1 year ago

@sabina-llr , could you provide us with more information on how you are running Magic-BLAST:

  1. On what system are you running Magic-BLAST?
  2. How did you install Magic-BLAST (tar.gz or zip file from the NCBI FTP site, Bioconda, or some other way)?
  3. Are you using Docker or another container service?

Thanks, Greg

sabina-llr commented 1 year ago

Hi both,

Thanks for your responses. I am running Magic-BLAST on a High Performance Computing cluster at the university where I work. I installed Magic-BLAST in a Conda environment (https://anaconda.org/bioconda/magicblast), but I have also tested a version installed as a Singularity container for running Galaxy jobs:

singularity exec /cvmfs/singularity.galaxyproject.org/m/a/magicblast:1.5.0--h2d02072_0 magicblast -sra SRR8732225 -db reference -out SRR8732225_aligned.sam -num_threads 2 -outfmt sam -no_unaligned

I get the same error in both cases.

Thanks Sabina

boratyng commented 1 year ago

Hi @sabina-llr ,

Thanks for the update. I suspect that there may be something wrong with the Magic-BLAST build in Bioconda. Can you try using the Magic-BLAST binary from the NCBI FTP site, https://ftp.ncbi.nlm.nih.gov/blast/executables/magicblast/LATEST/ncbi-magicblast-1.6.0-x64-linux.tar.gz, instead of the Bioconda installation?

Thanks, Greg

sabina-llr commented 1 year ago

Hi @boratyng,

The IT dept at the university where I work has suggested that the issue is caused by the fact that magicblast no longer uses ENA sites to get the SRA data, but Google/Amazon clouds instead. Unfortunately we have restricted access to the internet from our HPC. It would help if you could tell me where magicblast is trying to get the SRA files on the internet, so that I can check if these sites can be opened on our HPC.

Thanks, Sabina

boratyng commented 1 year ago

Hi @sabina-llr,

Yes, magicblast 1.6.0 downloads SRA data from the cloud, mostly AWS. Here is more information on this: https://ncbi.github.io/magicblast/cook/cloud-sra.html.

To download an SRA run, Magic-BLAST typically contacts two places: first https://locate.ncbi.nlm.nih.gov/sdl/2/retrieve, to retrieve a URL for the accession, and then that accession-specific URL, for example https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR5189652/SRR5189652 for SRR5189652. In most cases the accession-specific URL will start with https://sra-pub-run-odp.s3.amazonaws.com. You can find the accession-specific URL for any SRA accession by running the srapath tool from the NCBI SRA Toolkit (https://github.com/ncbi/sra-tools/wiki/01.-Downloading-SRA-Toolkit), for example:

$ srapath SRR5189652
https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR5189652/SRR5189652

srapath should be run on a host that has no network restrictions.
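
A minimal sketch of how one might then test, from the restricted host, whether the two endpoints above can be reached. The helper names `sra_aws_url` and `check_reachable` are hypothetical, and the URL pattern is only the common case described above, so srapath remains the reliable way to get the real location of a given run.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Magic-BLAST or the SRA Toolkit.

# Build the usual AWS location for an accession, following the pattern
# shown above. This is an assumption: some runs may live elsewhere.
sra_aws_url() {
    printf 'https://sra-pub-run-odp.s3.amazonaws.com/sra/%s/%s\n' "$1" "$1"
}

# Send a HEAD request and print the HTTP status code. 000 means the
# connection or TLS handshake failed; any other code means the host
# was reached.
check_reachable() {
    curl -sS -I -o /dev/null -w '%{http_code}\n' --max-time 20 "$1"
}

# Example (run on the HPC node):
#   check_reachable "$(sra_aws_url SRR5189652)"
#   check_reachable https://locate.ncbi.nlm.nih.gov/sdl/2/retrieve
```

If curl reports a certificate error here as well, the problem is likely in the network or TLS setup of the host rather than in Magic-BLAST itself.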

boratyng commented 1 year ago

Hi @sabina-llr, did the above information about URLs that Magic-BLAST contacts help?

sabina-llr commented 1 year ago

Thanks for following up on this, @boratyng. The IT department has confirmed that our HPC is already allowed to download from s3.amazonaws.com. We have installed the binary version in our module environment, but unfortunately this version also gave no results, and all the .sam files look like this:

@HD VN:1.0 GO:query
@SQ SN:Bacteroides_salyersiae_XGOsPUL LN:31240
@SQ SN:NZ_KB905466_GH3_through_GH5 LN:21725
@PG ID:magicblast PN:magicblast CL:magicblast -sra SRR5763462 -db reference -out SRR5763462_aligned.sam -num_threads 2 -outfmt sam -no_unaligned

I have no idea at the moment how this issue could be solved.

boratyng commented 1 year ago

Thank you for the update and I am sorry you still have a problem. Are you still getting error messages with the new binary?

We will keep digging, and in the meantime you can try downloading your reads before running Magic-BLAST as a workaround:

  1. You can use the fastq-dump tool from the SRA Toolkit (https://github.com/ncbi/sra-tools/wiki/01.-Downloading-SRA-Toolkit), as Tom suggested earlier, and use a FASTA or FASTQ file with Magic-BLAST.

  2. Or you can download the SRA file directly from AWS, for example with wget or curl:

    curl -O https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR8732225/SRR8732225

    will download the file SRR8732225 in SRA format and you can use Magic-BLAST's -sra option to read it:

    magicblast -sra ./SRR8732225 -db reference
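
For many accessions, the second option could be sketched as a loop that downloads one run, aligns it, and deletes the file before moving on, so that only one SRA file is on disk at a time. This is an illustration under stated assumptions, not a tested pipeline: the function names `run` and `align_one`, the `DRY_RUN` switch, and the list file `accessions.txt` are hypothetical, and the URL pattern is assumed to hold for every run (srapath can confirm it).

```shell
#!/bin/sh
# Hypothetical wrapper around the commands shown above.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

align_one() {
    acc=$1
    # Assumed URL pattern; confirm with srapath for runs stored elsewhere.
    # curl -f makes HTTP errors count as failures.
    run curl -f -O "https://sra-pub-run-odp.s3.amazonaws.com/sra/$acc/$acc" &&
    run magicblast -sra "./$acc" -db reference -out "${acc}_aligned.sam" \
        -num_threads 2 -outfmt sam -no_unaligned
    status=$?
    # Remove the downloaded run so only one is on disk at a time.
    run rm -f "./$acc"
    return "$status"
}

# Example: one accession per line in accessions.txt (hypothetical file).
#   while read -r acc; do
#       align_one "$acc" || echo "failed: $acc" >&2
#   done < accessions.txt
```

Setting DRY_RUN=1 first prints the three commands per accession, which is a cheap way to check the file names before starting a long batch.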

sabina-llr commented 1 year ago

Hi @boratyng, I don't get error messages with the new binary, but the SAM file has no reads and looks just like this:

@HD VN:1.0 GO:query
@SQ SN:Bacteroides_salyersiae_XGOsPUL LN:31240
@SQ SN:NZ_KB905466_GH3_through_GH5 LN:21725
@PG ID:magicblast PN:magicblast CL:magicblast -sra SRR5763445 -db reference -out SRR5763445_aligned.sam -num_threads 2 -outfmt sam -no_unaligned

Unfortunately, downloading the reads before running Magic-BLAST is not a feasible solution, as I am looking for a specific locus in about 3000 metagenome samples (and I have successfully run this type of analysis before). This would require a massive amount of space.

Any help is highly appreciated. Cheers, Sabina

sabina-llr commented 1 year ago

Hi @boratyng, as an update, I have downloaded one SRA file and tried to run magicblast with the attached reference file (reference.zip). Still no results.

The commands I am using are:

curl -O https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR15595244/SRR15595244

then

makeblastdb -in reference.fa -out reference -parse_seqids -dbtype nucl

then

magicblast -sra $path0/SRR15595244 -db reference -out SRR15595244_aligned.sam -num_threads 2 -outfmt sam -no_unaligned

Could you please try to replicate the run and let me know if that works for you? That would be highly appreciated! Thanks again

boratyng commented 1 year ago

Hi Sabina,

I re-ran your search and am also not getting any alignments. It looks like none of the reads in SRR15595244 align to your reference. In your earlier post you said that you had not seen any error messages but had been getting empty results. It looks like Magic-BLAST may have been working for you and downloading SRA sequences, but was not finding any matches. You can verify this by replacing the -no_unaligned option with -out_unaligned unaligned.sam. The unaligned.sam file will have all reads that did not align to your reference. If you see sequences there, it means that Magic-BLAST downloaded them.
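
One way to run that check, as a rough sketch: count the non-header lines in each SAM file. The helper name `count_sam_records` is hypothetical, and the magicblast command in the comment is the one from this thread with -no_unaligned swapped for -out_unaligned.

```shell
#!/bin/sh
# Hypothetical helper: count the records in a SAM file, i.e. non-empty
# lines that are not header lines (header lines start with '@').
count_sam_records() {
    awk '!/^@/ && NF { n++ } END { print n + 0 }' "$1"
}

# Example:
#   magicblast -sra SRR15595244 -db reference -out SRR15595244_aligned.sam \
#       -num_threads 2 -outfmt sam -out_unaligned unaligned.sam
#   count_sam_records SRR15595244_aligned.sam   # alignment records
#   count_sam_records unaligned.sam             # reads read but not aligned
```

A header-only file like the ones posted above gives a count of 0, while a non-zero count for unaligned.sam indicates that the reads were retrieved.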

Did you expect to find more alignments?

Thanks, Greg