ian-bda closed this issue 8 months ago.
@ian-bda
The reference-building command is IRFinder -m BuildRef, but your command missed -m. So I have to assume you are using an adapted version.
Please note: a) I can only provide advice based on the official version, as I don't know what else has been changed in a customized version; and b) I am more than happy to look into anything in the official version that does not work as expected. While I may provide suggestions for customization case by case, I won't guarantee they will always work, and I won't dig into why they don't.
IRFinder -m BuildRef expects a valid FTP address pointing to an existing GTF file on the ENSEMBL server. Your input, ftp://ftp.ensembl.org/pub/release-111/fasta/danio_rerio/dna/Danio_rerio.GRCz11.111.gtf.gz, does NOT exist. The existing one is: ftp://ftp.ensembl.org/pub/release-111/gtf/danio_rerio/Danio_rerio.GRCz11.111.gtf.gz. I think you mixed up the gtf and fasta folders on the FTP site.
Hi @dg520
Thanks for your quick response. No idea how I ended up with a custom version of IRFinder. Just re-downloaded it to get the correct version and changed the URL to the correct one. Here is my new script:
#!/bin/bash
/home5/ibirchl/IRFinder-1.3.0/bin/IRFinder -m BuildRef -r REF/Zebrafish-GRCz11-release111 \
ftp://ftp.ensembl.org/pub/release-111/gtf/danio_rerio/Danio_rerio.GRCz11.111.gtf.gz
Unfortunately I am still getting the following error:
Launching reference build process. The full build might take hours.
Trying to fetch dna.primary_assembly and GTF based on:
ftp://ftp.ensembl.org/pub/release-111/gtf/danio_rerio/Danio_rerio.GRCz11.111.gtf.gz
Warning: wildcards not supported in HTTP.
--2024-03-08 14:00:32-- ftp://ftp.ensembl.org/pub/release-111/fasta/danio_rerio/dna/*.dna.primary_assembly.fa.gz
Connecting to 192.168.1.20:3128... connected.
Proxy request sent, awaiting response... 404 Not Found
2024-03-08 14:00:36 ERROR 404: Not Found.
Warning: wildcards not supported in HTTP.
--2024-03-08 14:00:36-- ftp://ftp.ensembl.org/pub/release-111/fasta/danio_rerio/dna/*.dna.toplevel.fa.gz
Connecting to 192.168.1.20:3128... connected.
Proxy request sent, awaiting response... 404 Not Found
2024-03-08 14:00:38 ERROR 404: Not Found.
Failed to download fa.gz file.
@ian-bda It works on my end. See the command and messages below:
(base) TESTMACHINE:~$ IRFinder -m BuildRef -r test_ref ftp://ftp.ensembl.org/pub/release-111/gtf/danio_rerio/Danio_rerio.GRCz11.111.gtf.gz
Launching reference build process. The full build might take hours.
Trying to fetch dna.primary_assembly and GTF based on:
ftp://ftp.ensembl.org/pub/release-111/gtf/danio_rerio/Danio_rerio.GRCz11.111.gtf.gz
--2024-03-08 13:27:12-- ftp://ftp.ensembl.org/pub/release-111/fasta/danio_rerio/dna/*.dna.primary_assembly.fa.gz
=> '.listing'
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.193.169
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.169|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/release-111/fasta/danio_rerio/dna ... done.
==> PASV ... done. ==> LIST ... done.
.listing [ <=> ] 9.16K --.-KB/s in 0.009s
2024-03-08 13:27:13 (1014 KB/s) - '.listing' saved [9379]
Removed '.listing'.
--2024-03-08 13:27:13-- ftp://ftp.ensembl.org/pub/release-111/fasta/danio_rerio/dna/Danio_rerio.GRCz11.dna.primary_assembly.fa.gz
=> 'Danio_rerio.GRCz11.dna.primary_assembly.fa.gz'
==> CWD not required.
==> PASV ... done. ==> RETR Danio_rerio.GRCz11.dna.primary_assembly.fa.gz ... done.
Length: 410230731 (391M)
Danio_rerio.GRCz11.dna.primar 100%[=================================================>] 391.23M 13.0MB/s in 32s
2024-03-08 13:27:45 (12.2 MB/s) - 'Danio_rerio.GRCz11.dna.primary_assembly.fa.gz' saved [410230731]
--2024-03-08 13:27:45-- ftp://ftp.ensembl.org/pub/release-111/gtf/danio_rerio/Danio_rerio.GRCz11.111.gtf.gz
=> 'Danio_rerio.GRCz11.111.gtf.gz'
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.193.169
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.169|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/release-111/gtf/danio_rerio ... done.
==> SIZE Danio_rerio.GRCz11.111.gtf.gz ... 18347398
==> PASV ... done. ==> RETR Danio_rerio.GRCz11.111.gtf.gz ... done.
Length: 18347398 (17M) (unauthoritative)
Danio_rerio.GRCz11.111.gtf.gz 100%[=================================================>] 17.50M 10.5MB/s in 1.7s
2024-03-08 13:27:48 (10.5 MB/s) - 'Danio_rerio.GRCz11.111.gtf.gz' saved [18347398]
<Phase 1: STAR Reference Preparation>
Mar 08 13:27:59 ..... started STAR run
Mar 08 13:27:59 ... starting to generate Genome files
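Judging from the log above, BuildRef derives the genome location from the GTF URL: it replaces gtf/<species>/ with fasta/<species>/dna/ and requests *.dna.primary_assembly.fa.gz, and the failing logs show it then falls back to *.dna.toplevel.fa.gz. The sketch below reproduces only that path mapping, assuming the Ensembl layout seen in these logs; gtf_to_fasta_dir is a hypothetical helper, not IRFinder's own code. It also rejects a URL that is not under a gtf/ folder, which is the mix-up discussed at the top of this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not IRFinder's code): map an Ensembl GTF URL to the FASTA
# directory that, judging from the BuildRef logs, is searched for the genome.
set -euo pipefail

gtf_to_fasta_dir() {
  local gtf_url="$1" base rest species
  # Reject URLs that are not under .../gtf/<species>/ (e.g. a GTF name placed in the fasta folder)
  if [[ "$gtf_url" != */gtf/* ]]; then
    echo "error: not an Ensembl .../gtf/<species>/ URL: $gtf_url" >&2
    return 1
  fi
  base="${gtf_url%%/gtf/*}"   # ftp://ftp.ensembl.org/pub/release-111
  rest="${gtf_url#*/gtf/}"    # danio_rerio/Danio_rerio.GRCz11.111.gtf.gz
  species="${rest%%/*}"       # danio_rerio
  printf '%s/fasta/%s/dna/\n' "$base" "$species"
}

gtf_to_fasta_dir "ftp://ftp.ensembl.org/pub/release-111/gtf/danio_rerio/Danio_rerio.GRCz11.111.gtf.gz"
# prints ftp://ftp.ensembl.org/pub/release-111/fasta/danio_rerio/dna/
```

Passing the original (incorrect) fasta-folder URL makes the helper print an error and return 1 instead of guessing a path.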
One possible issue is that your machine does not fully support FTP or HTTP. To rule this out, could you please run the following wget command and see if you can download the GTF file successfully?
wget ftp://ftp.ensembl.org/pub/release-111/gtf/danio_rerio/Danio_rerio.GRCz11.111.gtf.gz
Let me know.
@dg520 I ran wget ftp://ftp.ensembl.org/pub/release-111/gtf/danio_rerio/Danio_rerio.GRCz11.111.gtf.gz and it worked:
--2024-03-08 14:45:48-- ftp://ftp.ensembl.org/pub/release-111/gtf/danio_rerio/Danio_rerio.GRCz11.111.gtf.gz
=> 'Danio_rerio.GRCz11.111.gtf.gz.'
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.193.169
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.169|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/release-111/gtf/danio_rerio ... done.
==> SIZE Danio_rerio.GRCz11.111.gtf.gz ... 18347398
==> PASV ... done. ==> RETR Danio_rerio.GRCz11.111.gtf.gz ... done.
Length: 18347398 (17M) (unauthoritative)
Danio_rerio.GRCz11.111.gtf. 100%[==========================================>] 17.50M 112KB/s in 3m 59s
2024-03-08 14:49:49 (74.9 KB/s) - 'Danio_rerio.GRCz11.111.gtf.gz.' saved [18347398]
Also tried rerunning the command exactly as you wrote it and am still getting the same error.
#!/bin/bash
/home5/ibirchl/IRFinder-1.3.0/bin/IRFinder -m BuildRef -r test_ref ftp://ftp.ensembl.org/pub/release-111/gtf/danio_rerio/Danio_rerio.GRCz11.111.gtf.gz
Launching reference build process. The full build might take hours.
Trying to fetch dna.primary_assembly and GTF based on:
ftp://ftp.ensembl.org/pub/release-111/gtf/danio_rerio/Danio_rerio.GRCz11.111.gtf.gz
Warning: wildcards not supported in HTTP.
--2024-03-08 14:53:08-- ftp://ftp.ensembl.org/pub/release-111/fasta/danio_rerio/dna/*.dna.primary_assembly.fa.gz
Connecting to 192.168.1.20:3128... connected.
Proxy request sent, awaiting response... 404 Not Found
2024-03-08 14:53:10 ERROR 404: Not Found.
Warning: wildcards not supported in HTTP.
--2024-03-08 14:53:10-- ftp://ftp.ensembl.org/pub/release-111/fasta/danio_rerio/dna/*.dna.toplevel.fa.gz
Connecting to 192.168.1.20:3128... connected.
Proxy request sent, awaiting response... 404 Not Found
2024-03-08 14:53:12 ERROR 404: Not Found.
Failed to download fa.gz file.
@ian-bda
This shows that FTP is supported, which is good, but wildcards in the address (e.g., ftp://test/*.fa) are not. To get wildcard support, you will have to consult and work with the IT admins who configure the machine you're working on.
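One possible explanation, not confirmed in this thread: the failing IRFinder runs show wget "Connecting to 192.168.1.20:3128" and sending a "Proxy request", while the manual wget test connected to ftp.ensembl.org:21 directly. When wget sends an FTP URL through an HTTP proxy, the request travels as HTTP, where it cannot expand wildcards, which matches the "wildcards not supported in HTTP" warning. The sketch below lists the proxy settings wget may pick up, assuming the proxy is configured through the ftp_proxy/http_proxy/https_proxy/no_proxy environment variables or a wgetrc file; your site may configure it elsewhere. show_proxy_settings is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Diagnostic sketch: list the proxy settings wget may pick up in this shell.
# Assumption: the proxy seen in the log (192.168.1.20:3128) comes from *_proxy
# environment variables or from a wgetrc file.

show_proxy_settings() {
  echo "== proxy-related environment variables =="
  # wget reads ftp_proxy, http_proxy, https_proxy and no_proxy from the environment
  env | grep -i '_proxy=' || echo "(none set)"

  echo "== proxy lines in wget config files =="
  local rc
  for rc in /etc/wgetrc "$HOME/.wgetrc"; do
    if [ -r "$rc" ]; then
      # Show proxy-related wgetrc commands, prefixed with the file they come from
      grep -iE '^[[:space:]]*(ftp_proxy|http_proxy|https_proxy|use_proxy|no_proxy)' "$rc" \
        | sed "s|^|$rc: |"
    fi
  done
  return 0
}

show_proxy_settings
```

Run it in the same environment IRFinder runs in (for example inside the same job script), since a batch job can inherit different variables than an interactive shell. If a proxy turns out to be the cause, a one-off check is wget's --no-proxy option with the URL quoted so the shell leaves the asterisk alone: wget --no-proxy 'ftp://ftp.ensembl.org/pub/release-111/fasta/danio_rerio/dna/*.dna.primary_assembly.fa.gz'. Whether your machine is allowed to bypass the proxy is a question for your IT admins.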
Meanwhile, here is a workaround; see whether you want to adapt it for your workflow. Basically, you need to run the following:
mkdir -p REF/Zebrafish-GRCz11-release111  # This is the IRFinder reference folder you will stick to. Feel free to change it to another location.
cd REF/Zebrafish-GRCz11-release111
wget ftp://ftp.ensembl.org/pub/release-111/gtf/danio_rerio/Danio_rerio.GRCz11.111.gtf.gz
wget ftp://ftp.ensembl.org/pub/release-111/fasta/danio_rerio/dna/Danio_rerio.GRCz11.dna.primary_assembly.fa.gz
gunzip Danio_rerio.GRCz11.111.gtf.gz
gunzip Danio_rerio.GRCz11.dna.primary_assembly.fa.gz
mv Danio_rerio.GRCz11.111.gtf transcripts.gtf
mv Danio_rerio.GRCz11.dna.primary_assembly.fa genome.fa
cd ../../
/home5/ibirchl/IRFinder-1.3.0/bin/IRFinder -m BuildRefProcess -r REF/Zebrafish-GRCz11-release111
Once the building process is completed, you can remove transcripts.gtf and genome.fa under REF/Zebrafish-GRCz11-release111 to save disk space.
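Before running the BuildRefProcess step of this workaround, it may help to confirm that the reference folder really contains uncompressed genome.fa and transcripts.gtf, the two names the mv commands create. A minimal sketch, assuming BuildRefProcess reads exactly these two files from the -r folder (as the workaround implies); check_ref_dir is a hypothetical helper, not part of IRFinder, and it only checks that the files exist, are non-empty, and that genome.fa starts with a FASTA ">" header.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight check for a manually prepared IRFinder reference folder.
# check_ref_dir is hypothetical; it assumes BuildRefProcess expects genome.fa
# and transcripts.gtf inside the folder passed with -r.
set -uo pipefail

check_ref_dir() {
  local ref_dir="$1" ok=0 f
  for f in genome.fa transcripts.gtf; do
    if [ ! -s "$ref_dir/$f" ]; then
      echo "missing or empty: $ref_dir/$f" >&2
      ok=1
    fi
  done
  # An uncompressed FASTA file starts with a '>' header line
  if [ -s "$ref_dir/genome.fa" ] && [ "$(head -c 1 "$ref_dir/genome.fa")" != ">" ]; then
    echo "genome.fa does not look like FASTA (still gzipped?)" >&2
    ok=1
  fi
  return "$ok"
}

REF_DIR="${1:-REF/Zebrafish-GRCz11-release111}"
if check_ref_dir "$REF_DIR"; then
  echo "OK: $REF_DIR is ready for: IRFinder -m BuildRefProcess -r $REF_DIR"
fi
```

Usage: save it as, say, check_ref.sh and run bash check_ref.sh REF/Zebrafish-GRCz11-release111 after the gunzip and mv steps; it prints the OK line only when both files look usable.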
Hi, I am trying to run the following command:
but it keeps giving me the error:
It's probably just a simple formatting issue I'm missing, but any help is greatly appreciated! Thanks