ncbi / sra-tools

SRA Tools

Configuration instructions are out of date #680

Closed AngCamp closed 2 years ago

AngCamp commented 2 years ago

The section of the installation instructions on how to run the configuration is possibly out of date and confusing. I saw that it was last updated in October 2021, but version 3.0.0 came out in Feb. 2022, so perhaps this is the reason. In any case, there are ambiguities in the instructions.

https://github.com/ncbi/sra-tools/wiki/03.-Quick-Toolkit-Configuration

https://github.com/ncbi/sra-tools/wiki/05.-Toolkit-Configuration

This is what I see when I run vdb-config -i as per the instructions here: https://github.com/ncbi/sra-tools/wiki/03.-Quick-Toolkit-Configuration

[image]

AngCamp commented 2 years ago

When I run the command on step 6 from here https://github.com/ncbi/sra-tools/wiki/02.-Installing-SRA-Toolkit I get an error saying this command is not recognized. I have set the default path (in section 5 of the above option menu) for downloads and when I check the folder there is nothing there:

 (base) [acampbell@nelson ~]$ fastq-dump --stdout SRR390728 | head -n 8
@SRR390728.1 1 length=72
CATTCTTCACGTAGTTCTCGAGCCTTGGTTTTCAGCGATGGAGAATGACTTTGACAAGCTGAGAGAAGNTNC
+SRR390728.1 1 length=72
;;;;;;;;;;;;;;;;;;;;;;;;;;;9;;665142;;;;;;;;;;;;;;;;;;;;;;;;;;;;;96&&&&(
@SRR390728.2 2 length=72
AAGTAGGTCTCGTCTGTGTTTTCTACGAGCTTGTGTTCCAGCTGACCCACTCCCTGGGTGGGGGGACTGGGT
+SRR390728.2 2 length=72
;;;;;;;;;;;;;;;;;4;;;;3;393.1+4&&5&&;;;;;;;;;;;;;;;;;;;;;<9;<;;;;;464262
2022-07-12T23:50:24 fastq-dump.2.8.2 err: unknown while writing file within file system module - unknown system error 'Broken pipe(32)'
2022-07-12T23:50:24 fastq-dump.2.8.2 err: unknown while writing file within file system module - failed SRR390728
2022-07-12T23:50:24 fastq-dump.2.8.2 err: param invalid while writing file within file system module - Bad position for STDIO write 0 instead of 73728
AngCamp commented 2 years ago

When I run the test on step 4 of the installation guide I get a problem as well.

In my .bashrc file I have set the following as my path to the sra-toolkit: export PATH=$PATH:/home/acampbell/apps/sratoolkit.3.0.0-ubuntu64/bin

But when I run the test in part 4 I do not get that path. This may not be an issue, but it is nonetheless confusing. And as noted above, testing the pull does not work.

(base) [acampbell@nelson ~]$ which fastq-dump
/space/bin/fastq-dump

Lastly, I would like to complain that there is no instruction for how to set the binaries (I hope I am using that term correctly). In step 3 of the installation guide you provide this line of code: export PATH=$PATH:$PWD/sratoolkit.2.4.0-1.mac64/bin but, first of all, this exact line will not work, and you have not explained what it even does. You could simply state to run nano .bashrc or vim .bashrc and paste the path to the sra-toolkit folder there. Again, a picture of this having been done, with a red arrow pointing to the line, would give readers more confidence that they are following along correctly. It would not add much length to the installation instructions and would clear up a lot of confusion. I figured out how to do this properly, but again your instructions are confusing. My installation is still not working properly, though, and fails to recognize the config command.

AngCamp commented 2 years ago

When I go to run prefetch and move into the directory I set in the configuration step the config command is not recognized. This makes little sense since vdb-config -i works as intended.

(base) [acampbell@nelson jeager_fastq_bam]$ $vdb-config --prefetch-to-cwd
bash: -config: command not found
klymenko commented 2 years ago

Remove the dollar sign from your command. Run vdb-config --prefetch-to-cwd. You executed $vdb-config --prefetch-to-cwd.
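
Why the shell reports `-config`: a leading `$` makes the shell read `vdb` as a variable name. That variable is normally unset, so it expands to nothing and only `-config` is left as the command name. A minimal sketch reproducing this, assuming no variable called `vdb` exists in the session:

```shell
# Make sure no variable named "vdb" is set, as in a typical session.
unset vdb

# "$vdb" is parsed as a variable reference; being unset, it expands to an
# empty string, leaving "-config" as the word the shell will try to run.
command_name="$vdb-config"
echo "shell will try to run: $command_name"
```

This matches the `bash: -config: command not found` message in the report.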

AngCamp commented 2 years ago

I am using Ubuntu. It is part of the Linux interface; one cannot remove the dollar sign... it's not something I typed.

AngCamp commented 2 years ago

None of this addresses the fact that your tutorial was clearly built for another version and is not even referring to stuff that exists in the current version.

klymenko commented 2 years ago
  1. Download the latest toolkit.

  2. Update PATH as: export PATH=/home/acampbell/apps/sratoolkit.3.0.0-ubuntu64/bin:$PATH

  3. bash: -config: command not found was printed in response to $vdb-config.

(base) [acampbell@nelson jeager_fastq_bam]$ $vdb-config --prefetch-to-cwd
bash: -config: command not found

Note the 2 dollar signs. Does your prompt end with $ $?
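
The difference between appending to and prepending to PATH can be sketched as follows. The shell searches PATH directories from left to right and runs the first match, so a directory added at the end loses to an older copy that appears earlier. The tool name `demo-tool` and the temporary directories are made up for illustration; they stand in for an old system-wide copy and a freshly unpacked toolkit.

```shell
# Two throwaway directories, each holding a script named "demo-tool" (hypothetical name).
old_dir=$(mktemp -d)
new_dir=$(mktemp -d)
printf '#!/bin/sh\necho "old copy"\n' > "$old_dir/demo-tool"
printf '#!/bin/sh\necho "new copy"\n' > "$new_dir/demo-tool"
chmod +x "$old_dir/demo-tool" "$new_dir/demo-tool"

# Stands in for a server where an old copy is already on PATH.
base_path="$old_dir:$PATH"

# Appending: the old copy is still found first.
appended=$( PATH="$base_path:$new_dir"; demo-tool )

# Prepending: the new copy is found first.
prepended=$( PATH="$new_dir:$base_path"; demo-tool )

echo "append:  $appended"
echo "prepend: $prepended"

rm -rf "$old_dir" "$new_dir"
```

The subshells keep the PATH changes from leaking into the current session; putting the `export PATH=...` line in `~/.bashrc` is what makes the change persist across logins.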

AngCamp commented 2 years ago

Ah yes, thanks, I did not realize I had the second $.

Also thanks for this: export PATH=$PATH:$PWD/sratoolkit.2.4.0-1.mac64/bin now I am getting the correct config menu. Why was I getting that other one (please refer to the picture I posted earlier)?

Also the test you provided on step 6 of the second section (https://github.com/ncbi/sra-tools/wiki/02.-Installing-SRA-Toolkit) fastq-dump --stdout SRR390728 | head -n 8 is not working, but fastq-dump SRR390728 does work.

klymenko commented 2 years ago

Could you run fastq-dump --stdout SRR390728 -X 3 ?
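
The `Broken pipe(32)` lines in the earlier output are consistent with `head` exiting after eight lines while the producer is still writing; limiting the output with `-X 3` avoids the pipe altogether. A small bash sketch of the same effect, using `yes` as a stand-in producer (an assumption for illustration, not the toolkit itself):

```shell
# "yes" writes forever; "head" exits after one line and closes the pipe.
yes | head -n 1 > /dev/null

# bash-only: PIPESTATUS holds the exit status of each stage of the last pipeline.
statuses=("${PIPESTATUS[@]}")

echo "producer (yes) status:  ${statuses[0]}"
echo "consumer (head) status: ${statuses[1]}"
```

A non-zero producer status next to a zero `head` status suggests the producer was cut off by the closed pipe rather than failing on its own.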

AngCamp commented 2 years ago
(base) [acampbell@nelson jeager_fastq_bam]$ fastq-dump --stdout SRR390728 -X 3
2022-07-13T18:57:03 fastq-dump.3.0.0 err: empty while validating file within network system module - error with https open 'https://storage.googleapis.com/sra-pub-zq-5/SRR390728/SRR390728.zq.vdbcache.1'
2022-07-13T18:57:07 fastq-dump.3.0.0 err: empty while validating file within network system module - error with https open 'https://storage.googleapis.com/sra-pub-zq-5/SRR390728/SRR390728.zq.vdbcache.1'
Read 3 spots for SRR390728
Written 3 spots for SRR390728
@SRR390728.1 1 length=72
CATTCTTCACGTAGTTCTCGAGCCTTGGTTTTCAGCGATGGAGAATGACTTTGACAAGCTGAGAGAAGNTNC
+SRR390728.1 1 length=72
;;;;;;;;;;;;;;;;;;;;;;;;;;;9;;665142;;;;;;;;;;;;;;;;;;;;;;;;;;;;;96&&&&(
@SRR390728.2 2 length=72
AAGTAGGTCTCGTCTGTGTTTTCTACGAGCTTGTGTTCCAGCTGACCCACTCCCTGGGTGGGGGGACTGGGT
+SRR390728.2 2 length=72
;;;;;;;;;;;;;;;;;4;;;;3;393.1+4&&5&&;;;;;;;;;;;;;;;;;;;;;<9;<;;;;;464262
@SRR390728.3 3 length=72
CCAGCCTGGCCAACAGAGTGTTACCCCGTTTTTACTTATTTATTATTATTATTTTGAGACAGAGCATTGGTC
+SRR390728.3 3 length=72
-;;;8;;;;;;;,*;;';-4,44;,:&,1,4'./&19;;;;;;669;;99;;;;;-;3;2;0;+;7442&2/
(base) [acampbell@nelson jeager_fastq_bam]$ 
AngCamp commented 2 years ago

Not sure if the errors are all that meaningful, they may just be confusing me as fastq-dump is actually working properly.

AngCamp commented 2 years ago

Another complication is that I am running this on my lab server, I think an older version of sra tools is installed for some of our pipelines and there may be some settings in the environment that are interfering with certain options. In any case I am able to pull fastq from sra which is what matters. So I appreciate your help, I think the main issue was the PATH in my .bashrc file.

Note I had to run nano ~/.bashrc, not nano .bashrc. I still think adding a small section to your tutorial explaining what you are doing here would help. Not all users are going to be experienced with Linux and bash. I am a biologist in my first year of a bioinformatics master's program; all this is new to me.

klymenko commented 2 years ago

What is the output of curl https://locate.ncbi.nlm.nih.gov/sdl/2/retrieve?acc=SRR390728 ?

AngCamp commented 2 years ago
(base) [acampbell@nelson ~]$ curl https://locate.ncbi.nlm.nih.gov/sdl/2/retrieve?acc=SRR390728
{
    "version": "2",
    "result": [
        {
            "bundle": "SRR390728",
            "status": 200,
            "msg": "ok",
            "files": [
                {
                    "object": "srapub|SRR390728",
                    "accession": "SRR390728",
                    "type": "sra",
                    "name": "SRR390728",
                    "size": 195044834,
                    "md5": "2112d9b68adc1190147cfbe3cd05f0e4",
                    "modificationDate": "2019-06-29T06:03:26Z",
                    "locations": [
                        {
                            "service": "s3",
                            "region": "us-east-1",
                            "link": "https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR390728/SRR390728"
                        }
                    ]
                }
            ]
        }
    ]
}
(base) [acampbell@nelson ~]$
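
For a quick look at the two fields that matter most in this response, the status and the download link, plain text matching is enough. The sketch below works on an abbreviated copy of the response shown above rather than calling the service, and the `sed` patterns are an illustrative shortcut; a real JSON parser would be more robust.

```shell
# Abbreviated copy of the response shown above.
response='{
  "result": [ { "bundle": "SRR390728", "status": 200, "msg": "ok",
    "files": [ { "locations": [ {
      "service": "s3",
      "link": "https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR390728/SRR390728"
    } ] } ] } ]
}'

# Crude text matching: print only the captured value from the matching line.
status=$(printf '%s\n' "$response" | sed -n 's/.*"status": *\([0-9][0-9]*\).*/\1/p')
link=$(printf '%s\n' "$response" | sed -n 's/.*"link": *"\([^"]*\)".*/\1/p')

echo "status: $status"
echo "link:   $link"
```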
AngCamp commented 2 years ago

From the above it looks like I am using the fastq-dump installed on our server from a previous version. Correct? Should it be version 3? Or does this look fine?

klymenko commented 2 years ago

What's the output of which fastq-dump ?

AngCamp commented 2 years ago
(base) [acampbell@nelson ~]$ which fastq-dump
~/apps/sratoolkit.3.0.0-ubuntu64/bin/fastq-dump
klymenko commented 2 years ago

Where/how did you get files in ~/apps/sratoolkit.3.0.0-ubuntu64/bin ? Exact commands?

AngCamp commented 2 years ago

I think the main issue I was having was setting the path. I originally put: export PATH=$PATH:/home/acampbell/apps/sratoolkit.3.0.0-ubuntu64/bin and I added that to the wrong .bashrc file.

When I ran nano ~/.bashrc and added PATH=/home/acampbell/apps/sratoolkit.3.0.0-ubuntu64/bin:$PATH that fixed the issue. Prior to doing this, which fastq-dump showed the incorrect path.

Example of before:

(base) [acampbell@nelson ~]$ which fastq-dump
/space/bin/fastq-dump 

Current result:

(base) [acampbell@nelson ~]$ which fastq-dump
~/apps/sratoolkit.3.0.0-ubuntu64/bin/fastq-dump
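
When several copies of a tool may be installed, it can help to list every copy on PATH in lookup order instead of only the first one that `which` reports. A minimal sketch; the helper name `list_on_path` is made up for this example:

```shell
# Print every directory on PATH that contains an executable with the given
# name, in lookup order; the first line is the copy the shell will run.
list_on_path() {
  name=$1
  old_ifs=$IFS
  IFS=:
  for dir in $PATH; do
    # An empty PATH entry means the current directory.
    [ -n "$dir" ] || dir=.
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
  IFS=$old_ifs
}

list_on_path fastq-dump
```

In bash, `type -a fastq-dump` gives a similar listing. If nothing is printed, no copy is reachable through PATH.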
klymenko commented 2 years ago

Check the exit code of fastq-dump. If it's 0, ignore the "error with https open" messages.

AngCamp commented 2 years ago
(base) [acampbell@nelson ~]$ fastq-dump $?
2022-07-13T20:55:40 fastq-dump.3.0.0 err: name not found while resolving query within virtual file system module - failed to resolve accession '0' - Cannot resolve accession ( 404 )
2022-07-13T20:55:41 fastq-dump.3.0.0 err: name not found while resolving query within virtual file system module - failed to resolve accession '0' - Cannot resolve accession ( 404 )
2022-07-13T20:55:41 fastq-dump.3.0.0 err: item not found while constructing within virtual database module - the path '0' cannot be opened as database or table
fastq-dump quit with error code 3
klymenko commented 2 years ago

~/apps/sratoolkit.3.0.0-ubuntu64/bin/fastq-dump --stdout SRR390728 -X 3; echo $?
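
`$?` holds the exit status of the most recently executed command, so it has to be read right after that command finishes rather than passed to it as an argument. A sketch with a stand-in function (`noisy_but_ok` is hypothetical, mimicking a tool that prints error-looking messages yet succeeds):

```shell
# Hypothetical stand-in: prints a warning on stderr but exits successfully.
noisy_but_ok() {
  echo "err: example warning" >&2
  echo "Written 3 spots"
  return 0
}

noisy_but_ok 2>/dev/null
status=$?   # capture immediately; any later command overwrites $?

if [ "$status" -eq 0 ]; then
  echo "exit code 0: the run succeeded despite the messages"
else
  echo "exit code $status: the run failed"
fi
```

With the real tool, the equivalent is to run the fastq-dump command and then `echo $?` as the very next command.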

AngCamp commented 2 years ago

Hey, sorry, running fastq-dump $? interrupted a loop I had running that was calling runs from an accession list. I need to restart it and wait for that to finish. I will run the above tomorrow and get back to you.

tbrunetti commented 2 years ago

I am having similar problems and I get the same configuration screen that the user who reported the issue has illustrated. Every time I try to configure and save it and then run the fastq-dump --stdout SRR390728 | head -n 8 command, I get the following error:

This sra toolkit installation has not been configured.
Before continuing, please run: vdb-config --interactive
For more information, see https://www.ncbi.nlm.nih.gov/sra/docs/sra-cloud/

I have already configured it several times and it just keeps producing the same error. I am using the latest SRA version (sratoolkit.3.0.0-ubuntu64)

klymenko commented 2 years ago

To verify the version you use, run vdb-config -V

tbrunetti commented 2 years ago

> To verify the version you use, run vdb-config -V

Thank you! I didn't realize there was a vdb-config binary bundled into sratools -- I was using the ubuntu repo which is out of date. Thanks!
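
A quick way to combine both checks from this thread, where the tool resolves from and which version it reports, might look like the sketch below. The helper name `describe_tool` is made up; `-V` is the flag mentioned above.

```shell
# Report where a tool resolves from, or say that it is not on PATH.
describe_tool() {
  if tool_path=$(command -v "$1"); then
    echo "$1 found at: $tool_path"
  else
    echo "$1 is not on PATH"
    return 1
  fi
}

# Only ask for the version when the tool was actually found.
if describe_tool vdb-config; then
  vdb-config -V
fi
```

If the reported path points at a system package directory rather than the unpacked toolkit's bin folder, an older packaged copy is probably the one being run.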