williamritchie / IRFinder

Detecting intron retention from RNA-Seq experiments

Failure to download gtf from newest Ensembl release #5

Closed slebedeva closed 7 years ago

slebedeva commented 7 years ago

Hi! I have an issue building a reference with the newest Ensembl release. IRFinder-BuildRefFromEnsembl fails to download the Ensembl gtf file (the genome fasta file downloads fine).

Command:

path/to/my/bin/IRFinder -m BuildRefDownload -r REF/Human-hg38-release87 ftp://ftp.ensembl.org/pub/release-87/gtf/homo_sapiens/Homo_sapiens.GRCh38.87.chr.gtf.gz

Output:

Usage : /path/to/my/bin/IRFinder/bin/util/IRFinder-BuildRefFromEnsembl mode threads STAR-executable base_ftp_url_of_ensembl_genome+gtf output_directory(must not exist) additional_genome_reference(eg: ERCC) non_polyA_genes-as-bed region_blacklist-as-bed
Usage example: /path/to/my/bin/IRFinder/bin/util/IRFinder-BuildRefFromEnsembl BuildRef 12 STAR "ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/" "IRFinder/REF/Human" "Refernce-ERCC.fa.gz" [non_polyA_genes.bed] [blacklist.bed]
Trying to fetch dna.primary_assembly and GTF based on:
ftp://ftp.ensembl.org/pub/release-87/gtf/homo_sapiens/Homo_sapiens.GRCh38.87.chr.gtf.gz

Failed to download gtf.gz file.

After it fails, the temporary directory inside REF/Human-hg38-release87 is still there, and it contains four .gtf.gz files from Ensembl release 87.

noahpieta commented 7 years ago

I have the same problem. Has anyone solved it? Thanks!

slebedeva commented 7 years ago

I had to edit the original Perl script (IRFinder-BuildRefFromEnsembl) because it expects only one .gtf file. On line 112, replace

system('wget',$base.'/gtf/'.$species.'/*.gtf.gz');

with

system('wget',$base.'/gtf/'.$species.'/*'.$release.'.gtf.gz');

or with any other wildcard pattern that matches only your gtf file.

darogan commented 7 years ago

$release isn't defined anywhere else in the file, so system('wget',$base.'/gtf/'.$species.'/*[0-9].gtf.gz'); worked for me.

It only matches a file name with a digit directly before .gtf.gz, so it doesn't make any assumptions about the specific release number.
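
To see why the original wildcard pulls in several files while the narrower ones pull in one, here is a minimal sketch in Python (not the script's Perl). It uses `fnmatch` as a stand-in for wget's shell-style FTP wildcards. The directory listing is an assumption for illustration: only the `.chr.gtf.gz` name appears in this thread, and the other three names are a guess at what the four downloaded files were.

```python
from fnmatch import fnmatchcase

# Illustrative listing (an assumption): four .gtf.gz files as they might
# appear in the release-87 gtf/homo_sapiens directory.
LISTING = [
    "Homo_sapiens.GRCh38.87.abinitio.gtf.gz",
    "Homo_sapiens.GRCh38.87.chr.gtf.gz",
    "Homo_sapiens.GRCh38.87.chr_patch_hapl_scaff.gtf.gz",
    "Homo_sapiens.GRCh38.87.gtf.gz",
]


def matches(pattern, names=LISTING):
    """Return the names that a shell-style wildcard pattern would select."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Pattern from the original line 112: every gtf file matches.
original = matches("*.gtf.gz")

# Pattern with the release number spliced in (here 87, set by hand).
with_release = matches("*87.gtf.gz")

# Pattern requiring any digit directly before ".gtf.gz".
digit_class = matches("*[0-9].gtf.gz")

for label, found in [("original", original),
                     ("with release", with_release),
                     ("digit class", digit_class)]:
    print(f"{label}: {len(found)} file(s)")
    for name in found:
        print("   ", name)
```

Under that assumed listing, both narrower patterns select the same single file, and neither selects the `.chr.gtf.gz` file named in the original command.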

dg520 commented 7 years ago

Hi all,

Sorry for the late reply. We just released a new version, IRFinder 1.2.0. It includes several major fixes and upgrades, including a fix for this infamous automatic download problem.

Best, Dadi

dg520 commented 7 years ago

BTW, I recommend changing line 112 to system('wget',$hint); instead of using a wildcard pattern, in case Ensembl adds other similarly named files in the future. This looks for and downloads the exact file named in the corresponding IRFinder argument. It is also the solution used in version 1.2.0.
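
The exact-URL idea can be sketched in Python (not the script's Perl). This is not the code from version 1.2.0. The helper name `exact_gtf_target` and its checks are hypothetical. It only shows that a full URL names exactly one file, so no wildcard matching is needed.

```python
from posixpath import basename
from urllib.parse import urlparse

# Characters that would turn the URL into a wildcard pattern.
GLOB_CHARS = set("*?[]")


def exact_gtf_target(hint):
    """Return the single file name that a full gtf URL points at.

    Hypothetical helper: rejects anything that is a wildcard pattern or
    that does not end in .gtf.gz, so exactly one file can be fetched.
    """
    if GLOB_CHARS & set(hint):
        raise ValueError("expected an exact URL, not a wildcard pattern")
    name = basename(urlparse(hint).path)
    if not name.endswith(".gtf.gz"):
        raise ValueError("URL does not point at a .gtf.gz file")
    return name


# The URL from the command at the top of this issue.
hint = ("ftp://ftp.ensembl.org/pub/release-87/gtf/homo_sapiens/"
        "Homo_sapiens.GRCh38.87.chr.gtf.gz")
target = exact_gtf_target(hint)
print(target)
```

With the exact URL, the file fetched is the one the user asked for, whatever else Ensembl places in the same directory.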