ncbi / sra-tools

SRA Tools

Fasterq-dump pathing issues #810

Closed: merrick893 closed this issue 1 year ago

merrick893 commented 1 year ago

I have been working with the SRA Toolkit for the past couple of days. My lab and I are hoping to download some 10x Genomics single-cell RNA-seq data to analyze, but we ran into issues downloading the FASTQ files directly using the SRR accession number. Instead, we were able to download the SRA file and are now trying to run the fasterq-dump command directly on the local SRA file. Every time I run the following command, I get the following error:

image

In this image, I am in the bin directory of the SRA Toolkit download and am trying to run the fasterq-dump command on a file in a different directory. I verified that this is the correct path to the SRA file. The same error occurs if I use the explicit SRA file path, change the current directory, or use fastq-dump instead with the proper options. For this example, I turned off remote access to ensure the program used the local file. I am working on Microsoft Windows.

wraetz commented 1 year ago

We will investigate this issue.

In the meantime, install WSL on Windows and pick an Ubuntu or Debian distro. In there, install the Linux version of our SRA Toolkit. Then try again, this time on "Linux".

Another problem I can see is this: you think that, for instance, SRR1234567.sra is the SRA accession. That is wrong! An accession is a directory with files in it. Do not take files out of this directory; all files in there are needed. Give the path of the directory (not the file) to the tool. If you give just the path to the .sra file, it might work, but only for some accessions.

wrong: "fasterq-dump ....\NCBI\sra\SRR12024267.sra"
right: "fasterq-dump ....\NCBI\SRR12024267"

(But you already took the .sra file out of its directory, so you have to 'prefetch' the accession again.)

--- this is not the reason why you are getting the error, but it will be your next problem ---
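To make the directory-versus-file distinction concrete, here is a minimal Python sketch. It assumes the layout described above, in which the accession is a directory named after the accession that contains a same-named .sra file. The helper names `looks_like_accession_dir` and `path_to_pass` are hypothetical and are not part of the SRA Toolkit; the demo uses empty placeholder files, not real SRA data.

```python
from pathlib import Path
import tempfile


def looks_like_accession_dir(path: Path) -> bool:
    """True if `path` is a directory containing `<directory name>.sra`."""
    return path.is_dir() and (path / f"{path.name}.sra").is_file()


def path_to_pass(path: Path) -> Path:
    """Map `<acc>/<acc>.sra` to `<acc>`; return anything else unchanged."""
    if path.suffix == ".sra" and looks_like_accession_dir(path.parent):
        return path.parent
    return path


# Demo on a throwaway layout (assumed structure, placeholder file only).
with tempfile.TemporaryDirectory() as tmp:
    acc_dir = Path(tmp) / "SRR12024267"
    acc_dir.mkdir()
    sra_file = acc_dir / "SRR12024267.sra"
    sra_file.touch()
    print(looks_like_accession_dir(acc_dir))    # True
    print(path_to_pass(sra_file) == acc_dir)    # True
```

A .sra file that has been moved out of its accession directory is returned unchanged by `path_to_pass`, which matches the situation described above where the accession has to be prefetched again.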

merrick893 commented 1 year ago

Thank you for this insight! I will try prefetch on the accession and then fasterq-dump. If that doesn't work, I'll try the Linux VM. I'll be back with results.

wraetz commented 1 year ago

Another problem here: the object you are trying to convert to FASTQ is in "..\..\NCBI_1\sra". The tool tries to 'cd' into this directory and fails, because it does not exist. Are you sure that the object is in 'C:\Users\schol\Documents\BrownLab\SRAToolkit\NCBI_1\sra'? You are going two directories back in the path. Maybe you need more '..' in your relative path.
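As a rough illustration of how the '..' components are counted, the sketch below resolves a Windows-style relative path against a starting directory using Python's standard `ntpath` module, which applies Windows path rules on any operating system. The starting directory `C:\tools\sratoolkit\bin` is hypothetical, since the real location of the bin directory is not stated in this thread, and `resolve_windows_relative` is an illustrative helper name.

```python
import ntpath


def resolve_windows_relative(cwd: str, relative: str) -> str:
    """Join `relative` onto `cwd` and collapse '.' and '..' components.

    This is purely textual: it does not check that the result exists on disk.
    """
    return ntpath.normpath(ntpath.join(cwd, relative))


# Hypothetical starting directory; substitute the real bin directory.
bin_dir = r"C:\tools\sratoolkit\bin"
print(resolve_windows_relative(bin_dir, r"..\..\NCBI_1\sra"))
# C:\tools\NCBI_1\sra
```

Each '..' removes one directory from the end of the starting path, so comparing the printed result with the intended location shows whether more or fewer '..' components are needed.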

merrick893 commented 1 year ago

Thanks for the response; however, I am certain this is the correct directory. I couldn't find a convenient way to take a screenshot of this, but this is the best I could do:

image

However, if, as you said before, this command is supposed to work on a directory of files instead of a single file, then it may be trying to cd into SRR12024267.sra, which is a file and not a directory.

wraetz commented 1 year ago

No, the error you are seeing comes from the path you are giving not being valid. Try giving an absolute path instead of the relative one. Before doing this, try "dir C:\the\path\to\the\file" in CMD.exe and check whether dir can resolve it. The fact that it has to be a directory containing files comes later; we have to get past the error you are seeing first! For your accession, the other files are important because your accession is compressed against references. The files next to the .sra file are the references, and they are needed.
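For anyone who prefers a scripted version of the "dir" check, here is a minimal Python sketch using only the standard library. It reports whether a path is absolute, whether it exists, and what a directory contains. `check_path` is a hypothetical helper, not something shipped with the SRA Toolkit, and passing this check says nothing about whether fasterq-dump itself will accept the path.

```python
from pathlib import Path
import tempfile


def check_path(raw: str) -> tuple[bool, bool, list[str]]:
    """Return (is_absolute, exists, sorted directory entries).

    The entry list is empty when the path is missing or is not a directory.
    """
    path = Path(raw)
    resolved = path.resolve()  # relative paths resolve against the current directory
    entries = sorted(child.name for child in resolved.iterdir()) if resolved.is_dir() else []
    return path.is_absolute(), resolved.exists(), entries


# Demo on a throwaway directory holding one placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "SRR12024267.sra").touch()
    print(check_path(tmp))
    print(check_path(str(Path(tmp) / "missing")))
```

Listing the entries also makes it easy to see whether the reference files sit next to the .sra file.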

merrick893 commented 1 year ago

Here is the result of the dir resolution and the follow-up fasterq-dump:

image

It seems that dir was able to find the directory, but fasterq-dump had issues with it. I copied and pasted the directory path into the fasterq-dump command to make sure I wasn't misspelling anything.

wraetz commented 1 year ago

Yes, I see that there is a problem. We have to investigate that. Thank you for helping with the investigation. Here is something you can try before going the WSL/Linux route: in the bin directory there is a file called fasterq-dump-orig.exe. Use that one instead of fasterq-dump.exe.

merrick893 commented 1 year ago

This solution looks like it is working!

The command still resulted in an error, but judging by the error description, the required files are missing from the directory, as you mentioned before. I will run prefetch and make sure the correct dependencies are in the same directory as the .sra file.

Thank you for your help in solving this issue!