kundajelab / atac_dnase_pipelines

ATAC-seq and DNase-seq processing pipeline
BSD 3-Clause "New" or "Revised" License

"Filename too long" error at install_genome_data.sh for hg38 and mm10 (Linux) #52

Closed ghost closed 7 years ago

ghost commented 7 years ago

Hello, I really like the ATAC-seq pipeline. It's very simple to use and powerful.

I tried installing the genome data in my Linux environment using the command below:

bash install_genome_data.sh hg38 genome_data

I got this error: Downloading files... --2017-05-25 14:53:04-- https://www.encodeproject.org/files/GRCh38_no_alt_analysis_set_GCA_000001405.15/@@download/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz Resolving www.encodeproject.org... 171.67.205.70 Connecting to www.encodeproject.org|171.67.205.70|:443... connected. HTTP request sent, awaiting response... 307 Temporary Redirect Location: https://download.encodeproject.org/http://encode-files.s3.amazonaws.com/2015/12/03/a7fea375-057d-4cdc-8ccd-0b0f930823df/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz?Signature=E02noW181svRunJoZZcHmcEQPjA%3D&Expires=1495821185&AWSAccessKeyId=ASIAJLMR3YFBMR3W4R3A&response-content-disposition=attachment%3B%20filename%3DGRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz&x-amz-security-token=FQoDYXdzEOb//////////wEaDHROWrlCYmBhOEWJGCK3A8UWmqeOL6P7SOnqqhJpuaSIcuUcIkAuiK%2B3vJlCKQcWIUPnSMgxDWH%2B99i7uXzmUiniNucZHEdIoFQmbL/aqV%2B2j7fgW3Kb9iC0XoeTHt/vOXgjg36FTrWYW6UBKWTzAffK%2B49adT6TG68Ur7z2w17RdBnOpU2Soi4UV%2BlFvZvCKpY%2BcA5yepMFgtV7dcypM3Ncb3A52NXJtADK8n3WuPha8f4RiafqT0C1vKs9oyjvLhRUXNUEiccUGwS5BCNQW5NhkULMm2hW9mH8Yckm2JjYNtjNDmxSMzllqAR7tk0VQHWqwWLYZ3HgyjzGnrlHu0M/pLMY9XTnigzZbT5qtfofxyIwsixA7ahU6HkjKN/Slzs8O059fm8KaRCt4WRHVgCINzijjdqcOtLCrydsvsgwGbjiJNKjiuaZjFtlgJxI1vyvDW0BgIR6A3/ii8nfvT9CN2shy929HzkYPgUorkRuO8pAxdyZavyDmFUZotJ/TVgsvBCN2V5S/B/YaZwif3osTMVQpNcm8QzY8gOz4qkbaitx14EPCL3KICtCwX6yA77xWmEMrjKV7l7/rklWnQxYDPE1GO4ohL%2BZyQU%3D [following] --2017-05-25 14:53:05-- 
https://download.encodeproject.org/http://encode-files.s3.amazonaws.com/2015/12/03/a7fea375-057d-4cdc-8ccd-0b0f930823df/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz?Signature=E02noW181svRunJoZZcHmcEQPjA%3D&Expires=1495821185&AWSAccessKeyId=ASIAJLMR3YFBMR3W4R3A&response-content-disposition=attachment%3B%20filename%3DGRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz&x-amz-security-token=FQoDYXdzEOb//////////wEaDHROWrlCYmBhOEWJGCK3A8UWmqeOL6P7SOnqqhJpuaSIcuUcIkAuiK%2B3vJlCKQcWIUPnSMgxDWH%2B99i7uXzmUiniNucZHEdIoFQmbL/aqV%2B2j7fgW3Kb9iC0XoeTHt/vOXgjg36FTrWYW6UBKWTzAffK%2B49adT6TG68Ur7z2w17RdBnOpU2Soi4UV%2BlFvZvCKpY%2BcA5yepMFgtV7dcypM3Ncb3A52NXJtADK8n3WuPha8f4RiafqT0C1vKs9oyjvLhRUXNUEiccUGwS5BCNQW5NhkULMm2hW9mH8Yckm2JjYNtjNDmxSMzllqAR7tk0VQHWqwWLYZ3HgyjzGnrlHu0M/pLMY9XTnigzZbT5qtfofxyIwsixA7ahU6HkjKN/Slzs8O059fm8KaRCt4WRHVgCINzijjdqcOtLCrydsvsgwGbjiJNKjiuaZjFtlgJxI1vyvDW0BgIR6A3/ii8nfvT9CN2shy929HzkYPgUorkRuO8pAxdyZavyDmFUZotJ/TVgsvBCN2V5S/B/YaZwif3osTMVQpNcm8QzY8gOz4qkbaitx14EPCL3KICtCwX6yA77xWmEMrjKV7l7/rklWnQxYDPE1GO4ohL%2BZyQU%3D Resolving download.encodeproject.org... 171.67.205.70 Connecting to download.encodeproject.org|171.67.205.70|:443... connected. HTTP request sent, awaiting response... 
200 OK Length: 872949833 (833M) [binary/octet-stream] GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz?Signature=E02noW181svRunJoZZcHmcEQPjA=&Expires=1495821185&AWSAccessKeyId=ASIAJLMR3YFBMR3W4R3A&response-content-disposition=attachment; filename=GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz&x-amz-security-token=FQoDYXdzEOb%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaDHROWrlCYmBhOEWJGCK3A8UWmqeOL6P7SOnqqhJpuaSIcuUcIkAuiK+3vJlCKQcWIUPnSMgxDWH+99i7uXzmUiniNucZHEdIoFQmbL%2FaqV+2j7fgW3Kb9iC0XoeTHt%2FvOXgjg36FTrWYW6UBKWTzAffK+49adT6TG68Ur7z2w17RdBnOpU2Soi4UV+lFvZvCKpY+cA5yepMFgtV7dcypM3Ncb3A52NXJtADK8n3WuPha8f4RiafqT0C1vKs9oyjvLhRUXNUEiccUGwS5BCNQW5NhkULMm2hW9mH8Yckm2JjYNtjNDmxSMzllqAR7tk0VQHWqwWLYZ3HgyjzGnrlHu0M%2FpLMY9XTnigzZbT5qtfofxyIwsixA7ahU6HkjKN%2FSlzs8O059fm8KaRCt4WRHVgCINzijjdqcOtLCrydsvsgwGbjiJNKjiuaZjFtlgJxI1vyvDW0BgIR6A3%2Fii8nfvT9CN2shy929HzkYPgUorkRuO8pAxdyZavyDmFUZotJ%2FTVgsvBCN2V5S%2FB%2FYaZwif3osTMVQpNcm8QzY8gOz4qkbaitx14EPCL3KICtCwX6yA77xWmEMrjKV7l7%2FrklWnQxYDPE1GO4ohL+ZyQU=: File name too long

Cannot write to “GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz?Signature=E02noW181svRunJoZZcHmcEQPjA=&Expires=1495821185&AWSAccessKeyId=ASIAJLMR3YFBMR3W4R3A&response-content-disposition=attachment; filename=GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz&x-amz-security-token=FQoDYXdzEOb%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaDHROWrlCYmBhOEWJGCK3A8UWmqeOL6P7SOnqqhJpuaSIcuUcIkAuiK+3vJlCKQcWIUPnSMgxDWH+99i7uXzmUiniNucZHEdIoFQmbL%2FaqV+2j7fgW3Kb9iC0XoeTHt%2FvOXgjg36FTrWYW6UBKWTzAffK+49adT6TG68Ur7z2w17RdBnOpU2Soi4UV+lFvZvCKpY+cA5yepMFgtV7dcypM3Ncb3A52NXJtADK8n3WuPha8f4RiafqT0C1vKs9oyjvLhRUXNUEiccUGwS5BCNQW5NhkULMm2hW9mH8Yckm2JjYNtjNDmxSMzllqAR7tk0VQHWqwWLYZ3HgyjzGnrlHu0M%2FpLMY9XTnigzZbT5qtfofxyIwsixA7ahU6HkjKN%2FSlzs8O059fm8KaRCt4WRHVgCINzijjdqcOtLCrydsvsgwGbjiJNKjiuaZjFtlgJxI1vyvDW0BgIR6A3%2Fii8nfvT9CN2shy929HzkYPgUorkRuO8pAxdyZavyDmFUZotJ%2FTVgsvBCN2V5S%2FB%2FYaZwif3osTMVQpNcm8QzY8gOz4qkbaitx14EPCL3KICtCwX6yA77xWmEMrjKV7l7%2FrklWnQxYDPE1GO4ohL+ZyQU=” (Success).

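The script works fine for hg19 and mm9, but not for the latest genomes (hg38 and mm10). This is because the download is redirected to a signed URL and wget names the local file after that URL, so the filename becomes too long with the extra text after "?".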
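To illustrate the length problem, here is a minimal sketch that compares a candidate filename against the per-component limit of the target filesystem. The helper name `name_exceeds_limit` is hypothetical and not part of install_genome_data.sh; it assumes `getconf NAME_MAX` is available, which reports 255 on most Linux filesystems.

```shell
#!/bin/sh
# Hypothetical helper (not part of the pipeline): exit 0 if NAME is longer,
# in bytes, than the filename limit of the filesystem that holds DIR.
# The function body runs in a subshell so its variables stay local.
name_exceeds_limit() (
  name=$1
  dir=${2:-.}
  limit=$(getconf NAME_MAX "$dir")                        # per-component limit
  bytes=$(printf '%s' "$name" | wc -c | tr -d ' ')        # length in bytes
  [ "$bytes" -gt "$limit" ]
)
```

For example, `name_exceeds_limit "$some_name" genome_data && echo "too long"` would flag a name like the one in the error above before wget tries to create the file.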

wget -N -c https://www.encodeproject.org/files/GRCh38_no_alt_analysis_set_GCA_000001405.15/@@download/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz

GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz?Signature=fm1c+eBHeNHujUpHMgBLPsh2SRs=&Expires=1495821279&AWSAccessKeyId=ASIAJLMR3YFBMR3W4R3A&response-content-disposition=attachment; filename=GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz&x-amz-security-token=FQoDYXdzEOb%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaDHROWrlCYmBhOEWJGCK3A8UWmqeOL6P7SOnqqhJpuaSIcuUcIkAuiK+3vJlCKQcWIUPnSMgxDWH+99i7uXzmUiniNucZHEdIoFQmbL%2FaqV+2j7fgW3Kb9iC0XoeTHt%2FvOXgjg36FTrWYW6UBKWTzAffK+49adT6TG68Ur7z2w17RdBnOpU2Soi4UV+lFvZvCKpY+cA5yepMFgtV7dcypM3Ncb3A52NXJtADK8n3WuPha8f4RiafqT0C1vKs9oyjvLhRUXNUEiccUGwS5BCNQW5NhkULMm2hW9mH8Yckm2JjYNtjNDmxSMzllqAR7tk0VQHWqwWLYZ3HgyjzGnrlHu0M%2FpLMY9XTnigzZbT5qtfofxyIwsixA7ahU6HkjKN%2FSlzs8O059fm8KaRCt4WRHVgCINzijjdqcOtLCrydsvsgwGbjiJNKjiuaZjFtlgJxI1vyvDW0BgIR6A3%2Fii8nfvT9CN2shy929HzkYPgUorkRuO8pAxdyZavyDmFUZotJ%2FTVgsvBCN2V5S%2FB%2FYaZwif3osTMVQpNcm8QzY8gOz4qkbaitx14EPCL3KICtCwX6yA77xWmEMrjKV7l7%2FrklWnQxYDPE1GO4ohL+ZyQU=: File name too long

This error can be avoided by specifying the output filename with the -O option:

name=$(basename "${REF_FA}")
wget -N -c -O "$name" "${REF_FA}"

leepc12 commented 7 years ago

Fixed, thanks for reporting this. BTW, -N and -O are mutually exclusive, so I removed the timestamping flag -N.
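For reference, the approach discussed in this thread could be sketched roughly as follows. This is not the actual code in install_genome_data.sh; the function names `local_name_for_url` and `fetch_to_basename` are hypothetical. The sketch also strips any query string before taking the basename, in case the URL passed in already carries one.

```shell
#!/bin/sh
# Hypothetical helper: derive a short local filename from a download URL.
local_name_for_url() (
  url=$1
  url=${url%%\?*}    # drop the query string, if any (from the first "?")
  url=${url%%\#*}    # drop a fragment, if any
  basename "$url"
)

# Hypothetical wrapper: download to that explicit name, resuming if possible.
# -N is left out because it cannot be combined with -O.
fetch_to_basename() {
  wget -c -O "$(local_name_for_url "$1")" "$1"
}
```

With this, `fetch_to_basename "${REF_FA}"` would write to GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta.gz for the hg38 URL above, regardless of where the server redirects.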