williamritchie / IRFinder

Detecting intron retention from RNA-Seq experiments

BuildRefFromEnsembl: No such file or directory #32

Closed Alan1389 closed 6 years ago

Alan1389 commented 6 years ago

Hi Dadi,

I'm running GCC 5.5 on macOS Sierra 10.12, and I'm having trouble building the IRFinder human genome reference. It can't seem to find the BuildRefFromEnsembl executable even though the file is there...

Below is the command and the error messages:

Alans-MacBook-Pro:IRFinder-1.2.3 alanjiao1$ bin/IRFinder -m BuildRef -r REF/Human-hg19-release75 -e REF/extra-input-files/RNA.SpikeIn.ERCC.fasta.gz -b REF/extra-input-files/Human_hg19_wgEncodeDacMapabilityConsensusExcludable.bed.gz -R REF/extra-input-files/Human_hg19_nonPolyA_ROI.bed ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz

readlink: illegal option -- f
usage: readlink [-n] [file ...]
bin/IRFinder: line 192: /proc/meminfo: No such file or directory
bin/IRFinder: line 193: /1000: syntax error: operand expected (error token is "/1000")
bin/IRFinder: line 197: [: -lt: unary operator expected
bin/IRFinder: line 212: /proc/cpuinfo: No such file or directory
grep: /proc/cpuinfo: No such file or directory
Launching reference build process. The full build should take at least one hour.
bin/IRFinder: line 234: ./util/IRFinder-BuildRefFromEnsembl: No such file or directory

Any help would be greatly appreciated. Thank you so much!

Alan

dg520 commented 6 years ago

Hi Alan,

This is weird. It seems something is wrong with the symlink settings, but I cannot tell exactly what. Could you please cd into the bin directory of IRFinder and try calling ./IRFinder -m BuildRef -r REF/Human-hg19-release75 -e REF/extra-input-files/RNA.SpikeIn.ERCC.fasta.gz -b REF/extra-input-files/Human_hg19_wgEncodeDacMapabilityConsensusExcludable.bed.gz -R REF/extra-input-files/Human_hg19_nonPolyA_ROI.bed ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz ?
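For context on the first error line: the BSD readlink shipped with macOS 10.12 has no -f option, which is what "readlink: illegal option -- f" reports. If the IRFinder wrapper uses readlink -f to locate its own install directory (an assumption, the script itself is not shown in this thread), a failed lookup would leave the relative path ./util/... to be resolved against the current directory, which would explain why running from inside bin helps. Below is a minimal sketch of a lookup that avoids -f. The helper name resolve_dir is hypothetical and not part of IRFinder.

```shell
# Hypothetical helper (not part of IRFinder): print the physical directory
# that contains a file, following symlinks without `readlink -f`.
resolve_dir() {
  target=$1
  # Follow symlinks one hop at a time with plain `readlink` (no flags),
  # which both GNU and BSD readlink support.
  while [ -L "$target" ]; do
    link=$(readlink "$target")
    case $link in
      /*) target=$link ;;                        # absolute link target
      *)  target=$(dirname "$target")/$link ;;   # relative to the link's directory
    esac
  done
  # `pwd -P` prints the directory with any symlinked parents resolved.
  (cd "$(dirname "$target")" && pwd -P)
}
```

A wrapper script could then call something like SCRIPT_DIR=$(resolve_dir "$0") and refer to "$SCRIPT_DIR/util/..." instead of a path relative to the current directory.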

It also seems that you either don't have the regular files /proc/cpuinfo and /proc/meminfo, or you don't have permission to read them.

I have noticed that IRFinder can run into various problems on some Mac systems. At the moment, I strongly recommend using a Linux platform instead.

Best wishes and happy new year, Dadi

Alan1389 commented 6 years ago

Hi Dadi,

Thank you so much for your help! I don't have access to a Linux platform, but after changing into the bin directory and specifying -t 4 (to get around the failure to find cpuinfo), we successfully generated the IRFinder reference.
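macOS has no /proc filesystem, so a thread count read from /proc/cpuinfo will always fail there, and passing -t by hand is a reasonable workaround. As a rough sketch of how the count could be detected on either platform, the snippet below tries getconf first and falls back to sysctl. The helper name detect_threads is hypothetical and not part of IRFinder.

```shell
# Hypothetical helper (not part of IRFinder): print the number of online CPUs.
detect_threads() {
  # getconf _NPROCESSORS_ONLN is available on both Linux (glibc) and macOS;
  # sysctl hw.ncpu is the macOS/BSD fallback; 1 is the last resort.
  n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) ||
    n=$(sysctl -n hw.ncpu 2>/dev/null) ||
    n=1
  echo "$n"
}
```

The result could be passed straight to the -t option used above, for example ./IRFinder -t "$(detect_threads)" followed by the remaining arguments.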

However, we encountered the following error when trying to run IRFinder on a single BAM file (it was sorted, but we reverted it using RevertSam).

Alans-MacBook-Pro:bin alanjiao1$ ./IRFinder -t 4 -m BAM -r ../REF/Human-hg19-release75 -d /Volumes/Seagate_Backup_Plus_Drive/AJ1 /Users/alanjiao1/Google_Drive/Slack_Lab/RNA-SEQ/sorted_bams/AJ1_reverted.bam

readlink: illegal option -- f
usage: readlink [-n] [file ...]
./IRFinder: line 192: /proc/meminfo: No such file or directory
./IRFinder: line 193: /1000: syntax error: operand expected (error token is "/1000")
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.

Do you think this has to do with our input bam file?

Thank you so much, and Happy New Year!

Alan

dg520 commented 6 years ago

Hi Alan,

The error message ./IRFinder: line 192: /proc/meminfo: No such file or directory concerns me. I think you should have encountered the same error during genome preparation. However, it seems you got around it; I'm not sure how.

Could you please make sure the STAR genome was built successfully by checking the log file under ../REF/Human-hg19-release75/logSTARBuild, and check that all files under ../REF/Human-hg19-release75/Mapability and ../REF/Human-hg19-release75/IRFinder are non-empty? Please also attach the standard output from your screen.
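The non-empty check can be scripted rather than done by eye. The following is only a sketch: check_nonempty is a hypothetical helper name, and it relies on find -type f -size 0, which both GNU and BSD find accept. Note that a directory containing no files at all passes this check, so it is worth listing the directories as well.

```shell
# Hypothetical helper (not part of IRFinder): report missing directories and
# zero-byte files under the directories given as arguments.
# Returns 0 if every directory exists and holds no empty files, 1 otherwise.
check_nonempty() {
  rc=0
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "MISSING DIRECTORY: $dir"
      rc=1
      continue
    fi
    # -size 0 matches regular files of zero length.
    empties=$(find "$dir" -type f -size 0)
    if [ -n "$empties" ]; then
      echo "EMPTY FILES under $dir:"
      echo "$empties"
      rc=1
    fi
  done
  return "$rc"
}
```

From the bin directory, this could be run as check_nonempty ../REF/Human-hg19-release75/Mapability ../REF/Human-hg19-release75/IRFinder.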

If you believe everything is OK, you might want to open the executable file IRFinder and modify line 192 with the following (and then save the file):

MEMK=32000000

The number 32000000 is the minimum required to run IRFinder (with less, you cannot use IRFinder). You should replace it with a value suited to your own computer or cluster setup (and to the amount you are allowed to use if you are not the admin of the system).
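Instead of hard-coding the value, the total memory could be looked up on the machine. The sketch below assumes that MEMK is expressed in kB, as the variable name and the /proc/meminfo source suggest; that unit is not confirmed anywhere in this thread. On macOS, sysctl -n hw.memsize reports physical memory in bytes. The helper name total_mem_kb is hypothetical.

```shell
# Hypothetical helper (not part of IRFinder): print total physical memory in kB.
# Assumption: IRFinder's MEMK variable is in kB, like MemTotal in /proc/meminfo.
total_mem_kb() {
  if [ -r /proc/meminfo ]; then
    # Linux: MemTotal is already reported in kB.
    awk '/^MemTotal:/ {print $2}' /proc/meminfo
  else
    # macOS: hw.memsize is in bytes, so convert to kB.
    bytes=$(sysctl -n hw.memsize 2>/dev/null) || return 1
    echo $(( bytes / 1024 ))
  fi
}
```

Running total_mem_kb in a terminal prints a number that can be compared against the 32000000 minimum before editing line 192.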

Best, Dadi

Alan1389 commented 6 years ago

Hi Dadi,

Thank you very much for your responses. I'm really sorry - I made a mistake in my earlier message and there actually was a problem with the STAR genome build. It seems the genomeParameters.txt file is missing. Here is the command and the whole output:

./IRFinder -t 4 -m BuildRef -r ../REF/Human-hg19-release75 -e ../REF/extra-input-files/RNA.SpikeIn.ERCC.fasta.gz -b ../REF/extra-input-files/Human_hg19_wgEncodeDacMapabilityConsensusExcludable.bed.gz -R ../REF/extra-input-files/Human_hg19_nonPolyA_ROI.bed ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz
readlink: illegal option -- f
usage: readlink [-n] [file ...]
./IRFinder: line 192: /proc/meminfo: No such file or directory
./IRFinder: line 193: /1000: syntax error: operand expected (error token is "/1000")
./IRFinder: line 197: [: -lt: unary operator expected
Launching reference build process. The full build should take at least one hour.
Usage : ./util/IRFinder-BuildRefFromEnsembl mode threads STAR-executable base_ftp_url_of_ensembl_genome+gtf output_directory(must not exist) additional_genome_reference(eg: ERCC) non_polyA_genes-as-bed region_blacklist-as-bed
Usage example: ./util/IRFinder-BuildRefFromEnsembl BuildRef 12 STAR "ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/" "IRFinder/REF/Human" "Refernce-ERCC.fa.gz" [non_polyA_genes.bed] [blacklist.bed]
Trying to fetch dna.primary_assembly and GTF based on: ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz

--2017-12-31 16:11:31-- ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/*.dna.primary_assembly.fa.gz
=> '.listing'
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.193.8
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.8|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/release-75/fasta/homo_sapiens/dna ... done.
==> PASV ... done. ==> LIST ... done.

.listing [ <=> ] 81.94K 221KB/s in 0.4s

2017-12-31 16:11:33 (221 KB/s) - '.listing' saved [83903]

Removed '.listing'.
--2017-12-31 16:11:33-- ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
=> 'Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz'
==> CWD not required.
==> PASV ... done. ==> RETR Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz ... done.
Length: 869930767 (830M)

Homo_sapiens.GRCh37.75.dna.pri 100%[=================================================>] 829.63M 9.37MB/s in 1m 40s

2017-12-31 16:13:13 (8.31 MB/s) - 'Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz' saved [869930767]

--2017-12-31 16:13:13-- ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz
=> 'Homo_sapiens.GRCh37.75.gtf.gz'
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.193.8
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.8|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/release-75/gtf/homo_sapiens ... done.
==> SIZE Homo_sapiens.GRCh37.75.gtf.gz ... 39344043
==> PASV ... done. ==> RETR Homo_sapiens.GRCh37.75.gtf.gz ... done.
Length: 39344043 (38M) (unauthoritative)

Homo_sapiens.GRCh37.75.gtf.gz 100%[=================================================>] 37.52M 7.38MB/s in 15s

2017-12-31 16:13:30 (2.46 MB/s) - 'Homo_sapiens.GRCh37.75.gtf.gz' saved [39344043]

Dec 31 16:17:26 ..... started STAR run
Dec 31 16:17:26 ... starting to generate Genome files
Dec 31 16:18:23 ... starting to sort Suffix Array. This may take a long time...
Dec 31 16:18:41 ... sorting Suffix Array chunks and saving them to disk...
Dec 31 16:57:50 ... loading chunks from disk, packing SA...
Dec 31 17:01:32 ... finished generating suffix array
Dec 31 17:01:32 ... generating Suffix Array index
Dec 31 17:11:21 ... completed Suffix Array index
Dec 31 17:11:21 ..... processing annotations GTF
Dec 31 17:11:39 ..... inserting junctions into the genome indices
Star genome build result: 9
/Users/alanjiao1/IRFinder-1.2.3/bin/util/Mapability: line 3: ulimit: max user processes: cannot modify limit: Invalid argument
Commence STAR mapping run for mapability.
Sun Dec 31 17:26:16 EST 2017

EXITING because of FATAL ERROR: could not open genome file /Users/alanjiao1/IRFinder-1.2.3/bin/../REF/Human-hg19-release75/STAR/genomeParameters.txt
SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsissions

Dec 31 17:26:16 ...... FATAL ERROR, exiting

real 0m0.041s
user 0m0.004s
sys 0m0.007s
Completed STAR run.
Sun Dec 31 17:26:16 EST 2017
Commence Coverage calculation.
ls: tmp_by_chr_40987/*.bed.gz: No such file or directory
xargs: illegal option -- -
usage: xargs [-0opt] [-E eofstr] [-I replstr [-R replacements]] [-J replstr] [-L number] [-n number [-x]] [-P maxprocs] [-s size] [utility [argument ...]]

real 0m0.005s
user 0m0.002s
sys 0m0.002s
cat: tmp_by_chr_40987/*.exclusion: No such file or directory

real 0m0.006s
user 0m0.004s
sys 0m0.003s
rm: tmp_by_chr_40987/bed.gz.exclusion: No such file or directory
rm: tmp_by_chr_40987/bed.gz: No such file or directory
Completed coverage exclusion calculation.
Sun Dec 31 17:26:16 EST 2017
Mapability result: 0
Build Ref 1
Build Ref 2
Build Ref 3
Build Ref 4
Build Ref 5
Build Ref 6
Build Ref 7
Build Ref 8
Build Ref 9
Build Ref 10
Build Ref 11
Build Ref 12
Build Ref 13b
Build Ref 14b
Build Ref 15b
Build Ref 16 - COMPLETE
Ref build result: 0
ALL DONE
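One thing the log makes easy to miss: it reports "Star genome build result: 9", a non-zero status, yet the later steps still run and the output ends with "ALL DONE". A wrapper that stops at the first failing step would surface this kind of problem sooner. The sketch below shows the general pattern only. The helper name run_step is hypothetical, and this is not how IRFinder's own scripts are written.

```shell
# Hypothetical helper (not part of IRFinder): run a command, print its exit
# status in the same "<name> result: <status>" style as the log above, and
# return that status so the caller can stop on failure.
run_step() {
  name=$1
  shift
  rc=0
  "$@" || rc=$?
  echo "$name result: $rc"
  if [ "$rc" -ne 0 ]; then
    echo "ERROR: $name failed with exit status $rc; stopping." >&2
  fi
  return "$rc"
}
```

A build script could chain steps as run_step "Star genome build" some_command && run_step "Mapability" another_command, where both command names are placeholders, so that a failed STAR build prevents the mapability step from starting against a missing genomeParameters.txt.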