ncbi / magicblast


Use NCBI SRA repository ( NCBI Magic-BLAST RNA-seq mapping tool) #49

Open joseailtoncruz opened 2 years ago

joseailtoncruz commented 2 years ago

Hello how are you!

I'm trying to use the NCBI Magic-BLAST RNA-seq mapping tool, but I can't get it to work.

I followed the tutorial available on NCBI (https://ncbi.github.io/magicblast/cook/sra.html). First I downloaded the package and installed it on my machine, then I created a database from my reference sequences (my refseq set, several sequences totaling 9,500).

magicblast -query SRR1237994.fas.gz -db my_reference

Soon after, I tried to use the NCBI SRA repository, and the following error appeared:

@PG ID:magicblast PN:magicblast CL:
BLAST query/options error: FASTA parse error: defline expected
Please refer to the BLAST+ user manual

I also tried running it locally: I downloaded SRR..... and ran the following command in a terminal: magicblast -query reads.fa -db my_reference. Again I got the same error.

I've tried several commands and so far I haven't been able to get any result. Could someone help me?

Thank you in advance, and I apologize for my English.

boratyng commented 2 years ago

Hi @joseailtoncruz,

Thank you for trying out Magic-BLAST, and I am sorry you ran into problems. How did you download the SRR1237994.fas.gz file? I suspect that it is in FASTQ format, and by default Magic-BLAST expects FASTA. You can change the input format to FASTQ with the -infmt fastq option, so your command line should look like this:

magicblast -query SRR1237994.fas.gz -db my_reference -infmt fastq
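One quick way to tell which format a downloaded file is in is to look at the first character of the decompressed data: FASTA records begin with `>` and FASTQ records begin with `@`. The sketch below is only an illustration. `sniff_format` is a hypothetical helper name, not part of Magic-BLAST, and it assumes a gzip implementation (such as GNU gzip) whose `-cdf` mode passes uncompressed input through unchanged.

```shell
#!/bin/sh
# Hypothetical helper (not part of Magic-BLAST): guess the read format of a
# plain or gzip-compressed file from the first character of its first line.
sniff_format() {
    file=$1
    # Assumption: "gzip -cdf" decompresses gzip data and copies plain text as-is.
    first=$(gzip -cdf "$file" 2>/dev/null | head -n 1 | cut -c 1)
    case "$first" in
        '>') echo fasta ;;    # FASTA defline
        '@') echo fastq ;;    # FASTQ defline
        *)   echo unknown ;;  # empty, unreadable, or some other format
    esac
}

# Usage with the file name from this thread:
#   fmt=$(sniff_format SRR1237994.fas.gz)
#   [ "$fmt" != unknown ] && magicblast -query SRR1237994.fas.gz -db my_reference -infmt "$fmt"
```

The check only inspects the first record, so it says nothing about whether the rest of the file is intact.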

You can also use an SRA accession, and Magic-BLAST will download the reads from SRA and align them:

magicblast -sra SRR1237994 -db my_reference

So you do not have to pre-download SRA runs.
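If several runs need to be mapped, the command above can simply be repeated for each accession. The following is a minimal sketch, assuming one SAM output file per run; `map_sra_runs` is a hypothetical wrapper name, and `-out` is the output option that appears elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: run magicblast once per SRA accession and write
# one SAM file per run. The first argument is the BLAST database name.
map_sra_runs() {
    db=$1
    shift
    status=0
    for acc in "$@"; do
        echo "Mapping $acc" >&2
        if ! magicblast -sra "$acc" -db "$db" -out "$acc.sam"; then
            # Keep going with the remaining runs, but remember the failure.
            echo "failed: $acc" >&2
            status=1
        fi
    done
    return $status
}

# Usage with the accessions mentioned in this thread:
#   map_sra_runs my_reference SRR1237994 SRR10394962
```

Running the accessions one at a time makes it easy to see which run fails if a particular accession cannot be retrieved.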

joseailtoncruz commented 2 years ago


Good evening! I tried running magicblast -sra SRR1237994 -db my_reference with this and other SRRs available at NCBI, but unfortunately I always get the same error: it reports that the SRR accessions do not exist at NCBI.

Sincerely,

José Ailton Cruz Macêdo dos Santos, graduate student in plant pathology, Universidade Federal Rural de Pernambuco (UFRPE), Department of Agronomy, Pernambuco-PE. Phone: (81) 986837383 OI. Currículo Lattes: http://lattes.cnpq.br/1553284695502900

boratyng commented 2 years ago

@joseailtoncruz, could you post the exact error message for the run?:

magicblast -sra SRR1237994 -db my_reference

Please, also make sure you are using the latest version of Magic-BLAST: 1.6.0. You can find this out with the -version command line option:

magicblast -version
magicblast: 1.6.0
 Package: magicblast 1.6.0, build May  4 2021 17:13:27

The older versions may not work with some SRA accessions.

Did adding -infmt fastq option work for your downloaded file?

joseailtoncruz commented 2 years ago

Hello! Here are the command lines I've used so far and the output they returned.

$ magicblast -version
magicblast: 1.6.0
 Package: magicblast 1.6.0, build Feb 23 2022 11:38:02

$ makeblastdb -in reference.fasta -dbtype nucl -parse_seqids -out my_reference -title "Database title"
Building a new DB, current time: 08/30/2022 16:10:15
New DB name: /home/ailton_ubuntu/miniconda3/bin/my_reference
New DB title: Database title
Sequence type: Nucleotide
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 9843 sequences in 0.767447 seconds.

$ magicblast -sra SRR10394962 -db my_reference
@PG ID:magicblast PN:magicblast CL:magicblast -sra SRR10394962 -db my_reference
BLAST query/options error: The provided SRA accession 'SRR10394962' does not exist
Please refer to the BLAST+ user manual.

Here is the link to the NCBI website where I can find the SRA: https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR10394962&display=download

$ magicblast -query SRR1237994.fas.gz -db my_reference -infmt fastq

The command line you recommended (magicblast -query SRR1237994.fas.gz -db my_reference -infmt fastq) works: the program runs, but it is taking a while to return a result.

Still, I believe it would be faster and more convenient to run Magic-BLAST without having to download all the SRA runs I need.


joseailtoncruz commented 2 years ago

So I'm not able to run magicblast with either command: $ magicblast -sra SRR10394962 -db my_reference and $ magicblast -query SRR1237994.fas.gz -db my_reference -infmt fastq both give errors.

Once again I apologize for my English, and I hope you can help me resolve these errors, because Magic-BLAST would be very useful in my analysis.


boratyng commented 2 years ago

@joseailtoncruz , thank you for all the information. I am looking into possible fixes, but need to ask you a few more questions:

  1. What is the error message for this command?:

    magicblast -query SRR1237994.fas.gz -db my_reference -infmt fastq

  2. On which operating system are you running Magic-BLAST (Linux/macOS/Windows)?

  3. How did you download the file SRR1237994.fas.gz? On the web page https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR10394962&display=download you can either download a FASTA file SRR1237994.fasta.gz or FASTQ file SRR1237994.fastq.gz. Did you click on "FASTA" or "FASTQ" buttons/tabs or somewhere else?

joseailtoncruz commented 2 years ago

Good evening! Sorry for not answering sooner. Here are the command and the error that appears:

$ magicblast -query SRR1237994.fas.gz -db my_reference -infmt fastq

Running the above command produced the following error:

Error: (CInputException::eInvalidInput) FASTQ parse error: defline expected at line: 112050225
Error: (117.2) CThread::Wrapper: CThread::Main() failed (CInputException::eInvalidInput) FASTQ parse error: defline expected at line: 112050225


boratyng commented 2 years ago

Thank you for the update. I suspect that the SRR1237994.fas.gz file was either not downloaded completely or was corrupted during the download. Downloads of SRA runs from NCBI may be very slow at times. These runs are also available from cloud providers.

On the page that you used, https://trace.ncbi.nlm.nih.gov/Traces/index.html?view=run_browser&acc=SRR10394962&display=download, click on the "Data access" tab and then use the link for AWS. This download should be faster and more reliable. It will download the file SRR10394962.man, and this is how you can search/map it with Magic-BLAST:

magicblast -sra SRR10394962.man -db my_reference -out my_output.sam

Please, let me know if you need more help with this.
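A truncated or corrupted download can often be detected before mapping. The sketch below is one possible check under stated assumptions, not something Magic-BLAST provides: `check_fastq_gz` is a hypothetical helper that first runs `gzip -t` to verify the compressed stream and then confirms that the line count is a multiple of four, which assumes unwrapped four-line FASTQ records.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check a gzip-compressed FASTQ file.
# Prints "ok", "corrupt-gzip", or "incomplete-record".
check_fastq_gz() {
    file=$1
    # gzip -t tests the integrity of the compressed stream; a download that
    # was cut short fails at this step.
    if ! gzip -t "$file" 2>/dev/null; then
        echo "corrupt-gzip"
        return 1
    fi
    # Assumption: every FASTQ record occupies exactly four lines.
    lines=$(gzip -cd "$file" | wc -l | tr -d ' ')
    if [ $((lines % 4)) -ne 0 ]; then
        echo "incomplete-record"
        return 1
    fi
    echo "ok"
}

# Usage with the file name from this thread:
#   check_fastq_gz SRR1237994.fas.gz
```

Both steps read the whole file, so this can take a while on a large run. A file that passes can still contain malformed records, since only the compression layer and the line count are checked.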

boratyng commented 2 years ago

Hi @joseailtoncruz,

Your problems with Magic-BLAST not downloading SRA runs may be due to a firewall. Are you running Magic-BLAST on an institutional (university or research center) host or an HPC cluster? If yes, you can check with your systems administrators to see if these URLs are blocked: https://locate.ncbi.nlm.nih.gov/sdl/2/retrieve and https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR1237994/SRR1237994.
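Before contacting an administrator, it may help to test from the same host whether a connection to those URLs can be opened at all. The following is a rough sketch, assuming `curl` is installed; `check_url` is a hypothetical helper name. It only checks that the server answers, so a firewall that responds with its own block page would still be reported as reachable.

```shell
#!/bin/sh
# Hypothetical helper: report whether this host can get any HTTP response
# from a URL. Without --fail, curl exits 0 for any HTTP status code and
# non-zero for DNS, connection, TLS, or timeout failures.
check_url() {
    if curl --silent --head --max-time 15 --output /dev/null "$1"; then
        echo "reachable $1"
    else
        echo "blocked-or-unreachable $1"
    fi
}

# Usage with the two URLs from this thread:
#   check_url https://locate.ncbi.nlm.nih.gov/sdl/2/retrieve
#   check_url https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR1237994/SRR1237994
```

A "blocked-or-unreachable" result on the institutional host combined with a "reachable" result from another network would be consistent with the firewall explanation, though it does not prove it.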