williamritchie / IRFinder

Detecting intron retention from RNA-Seq experiments

Quant step error. Seems that looks for compressed bam #61

Open jmelero611 opened 5 years ago

jmelero611 commented 5 years ago

Hi,

I am running IRFinder with the following command:

IRFinder -m BAM -r REF/Human-hg38-v27 -d output/SRR1293044_irfinder PATH_TO_BAM/SRR1293044Aligned.out.bam

I receive the following error message:

gzip: /bicoh/LEUKEMIAS/TARGET_ALL_phaseII/star/SRR1293044Aligned.out.bam.gz: No such file or directory
/soft/EB_repo/bio/sequence/programs/noarch/IRFinder/1.2.3/bin/IRFinder: line 561: 5100 Exit 1 gzip -cd "$1"
5101 Aborted | "$LIBEXEC/irfinder" "$OUTPUTDIR" "$REF/IRFinder/ref-cover.bed" "$REF/IRFinder/ref-sj.ref" "$REF/IRFinder/ref-read-continues.ref" "$REF/IRFinder/ref-ROI.bed" "$OUTPUTDIR/unsorted.frag.bam" >> "$OUTPUTDIR/irfinder.stdout" 2>> "$OUTPUTDIR/irfinder.stderr"
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.

It looks as though IRFinder expects a bam.gz file, even though I supplied a plain BAM file.

The reference build completed with some warnings:

Launching reference build process. The full build should take at least one hour.
Usage : /soft/EB_repo/bio/sequence/programs/noarch/IRFinder/1.2.3/bin/util/IRFinder-BuildRefFromEnsembl mode threads STAR-executable base_ftp_url_of_ensembl_genome+gtf output_directory(must not exist) additional_genome_reference(eg: ERCC) non_polyA_genes-as-bed region_blacklist-as-bed
Usage example: /soft/EB_repo/bio/sequence/programs/noarch/IRFinder/1.2.3/bin/util/IRFinder-BuildRefFromEnsembl BuildRef 12 STAR "ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/" "IRFinder/REF/Human" "Refernce-ERCC.fa.gz" [non_polyA_genes.bed] [blacklist.bed]
May 28 18:53:25 ..... started STAR run
May 28 18:53:26 ... starting to generate Genome files
May 28 18:55:32 ... starting to sort Suffix Array. This may take a long time...
May 28 18:56:28 ... sorting Suffix Array chunks and saving them to disk...
May 29 03:02:16 ... loading chunks from disk, packing SA...
May 29 03:13:20 ... finished generating suffix array
May 29 03:13:20 ... generating Suffix Array index
May 29 03:20:46 ... completed Suffix Array index
May 29 03:20:46 ..... processing annotations GTF
May 29 03:21:28 ..... inserting junctions into the genome indices
May 29 03:47:18 ... writing Genome to disk ...
May 29 03:47:47 ... writing Suffix Array to disk ...
May 29 03:51:34 ... writing SAindex to disk
May 29 03:51:53 ..... finished successfully
Star genome build result: 0
Commence STAR mapping run for mapability. Wed May 29 03:51:59 CEST 2019

real 538m48.869s
user 808m45.528s
sys 201m1.437s
Completed STAR run. Wed May 29 12:50:48 CEST 2019
Commence Coverage calculation.

real 316m41.359s
user 314m45.915s
sys 308m35.405s

real 0m10.477s
user 0m7.513s
sys 0m0.972s
Completed coverage exclusion calculation. Wed May 29 18:08:05 CEST 2019
Mapability result: 0
Build Ref 1
Build Ref 2
Build Ref 3
Build Ref 4
***** WARNING: File /dev/fd/63 has inconsistent naming convention for record:
GL000008.2 0 40 X 0 +

***** WARNING: File /dev/fd/63 has inconsistent naming convention for record:
GL000008.2 0 40 X 0 +

Build Ref 5
***** WARNING: File /dev/fd/63 has inconsistent naming convention for record:
GL000008.2 0 40 X 0 +

***** WARNING: File /dev/fd/63 has inconsistent naming convention for record:
GL000008.2 0 40 X 0 +

Build Ref 6
Build Ref 7
Build Ref 8
Build Ref 9
Build Ref 10
Build Ref 11
Build Ref 12
Build Ref 13c
Build Ref 14c
Build Ref 16 - COMPLETE
Ref build result: 0
ALL DONE

However, some of the IR quantification runs completed successfully (I ran several in parallel), while most of them failed with this error.

Thank you very much for your help!

Best regards, Juan Luis

dg520 commented 5 years ago

Hi Juan,

This is a C error. What version of GCC are you using? IRFinder requires C++11 features that are NOT included in GCC older than 4.9.0. Even if your GCC is newer than that, some Linux distros, such as Red Hat, lack some C libraries by default. First make sure your GCC is >= 4.9.0; the workaround is then to recompile the IRFinder core from source in src/irfinder against your own system environment. Finally, replace the irfinder binary under bin/util with your newly compiled one.

If your GCC is compatible and you still encounter errors during compilation, that indicates you're missing some C libraries. If you are the system admin, installing those missing libraries should get things rolling. Otherwise you will have to ask your system admin for the required libraries.
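For anyone who wants to script the version check described in this reply, here is a minimal sketch. It only parses the banner text printed by `gcc --version` and compares it with the 4.9.0 threshold quoted in this thread; the function names `parse_gcc_version` and `meets_minimum` are made up for illustration and are not part of IRFinder. The sample banner is the one posted later in this thread.

```python
import re

# Threshold quoted in this thread; adjust if the maintainers state otherwise.
MIN_GCC = (4, 9, 0)


def parse_gcc_version(banner):
    """Return (major, minor, patch) from the first line of `gcc --version`."""
    lines = banner.strip().splitlines()
    if not lines:
        raise ValueError("empty gcc banner")
    # Drop parenthesised vendor strings such as "(GCC)" or "(Red Hat 4.8.5-44)"
    # so that the first x.y.z left on the line is the compiler version itself.
    cleaned = re.sub(r"\([^)]*\)", "", lines[0])
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", cleaned)
    if match is None:
        raise ValueError("no version number found in: " + lines[0])
    return tuple(int(part) for part in match.groups())


def meets_minimum(banner, minimum=MIN_GCC):
    """True when the parsed version is at least `minimum`."""
    return parse_gcc_version(banner) >= minimum


sample = "gcc (GCC) 5.4.0\nCopyright (C) 2015 Free Software Foundation, Inc."
print(parse_gcc_version(sample), meets_minimum(sample))
```

Note that a passing version check says nothing about which libraries are installed, which is the second point made in this reply.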

Let me know. Thanks.

P.S. Did you mention bam.gz because you saw gzip in the implementation? That's not the case. A BAM file is itself a compressed format: it contains multiple gzipped blocks. That's why we need something like gzip to read it here.
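To illustrate the point about gzipped blocks, here is a small sketch using only the Python standard library. It does not read a real BAM file; it builds a byte string out of two independently compressed gzip members (the helper names are invented for this example) and shows that a gzip reader decompresses the concatenation as one stream, which is the same principle that lets `gzip -cd` read a BAM file without a `.gz` extension.

```python
import gzip
import io


def concat_gzip_members(chunks):
    """Compress each chunk as its own gzip member and join the members."""
    return b"".join(gzip.compress(chunk) for chunk in chunks)


def looks_gzipped(data):
    """Check for the two-byte gzip magic number at the start of the data."""
    return data[:2] == b"\x1f\x8b"


def read_all(data):
    """Decompress every gzip member in `data` and return the joined payload."""
    with gzip.GzipFile(fileobj=io.BytesIO(data)) as handle:
        return handle.read()


blob = concat_gzip_members([b"first block\n", b"second block\n"])
print(looks_gzipped(blob))
print(read_all(blob))
```

The same magic-number check can be applied to the first two bytes of a BAM file to confirm it is block-compressed, whatever its file name says.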

Best, Dadi

jmelero611 commented 5 years ago

Hi Dadi,

Thank you for your answer. I asked the system admins to follow your instructions. When they tried to compile the program, this error appeared:

[manager@easybuild 1.2.3]$ ls
bin LICENSE README.md REF src
[manager@easybuild 1.2.3]$ gcc --version
gcc (GCC) 5.4.0
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[manager@easybuild 1.2.3]$ cd src/irfinder/
[manager@easybuild irfinder]$ make clean
rm -f *.o irfinder Depend.list
[manager@easybuild irfinder]$ make
Makefile:32: Depend.list: No such file or directory
/bin/rm -f ./Depend.list
g++ -pipe -std=c++0x -O3 -Wall -Wextra -fopenmp -D'COMPILATION_TIME_PLACE="Wed Jun 5 11:34:30 CEST 2019 easybuild.prib.upf.edu:/soft/EB_repo/bio/sequence/programs/noarch/IRFinder/1.2.3/src/irfinder"' -MM ReadBlockProcessor.cpp FragmentBlocks.cpp IRFinder.cpp crc32.cpp ReadBlockProcessor_OutputBAM.cpp CoverageBlock.cpp ReadBlockProcessor_CoverageBlocks.cpp BAM2blocks.cpp >> Depend.list
g++ -c -pipe -std=c++0x -O3 -Wall -Wextra -fopenmp -D'COMPILATION_TIME_PLACE="Wed Jun 5 11:34:31 CEST 2019 easybuild.prib.upf.edu:/soft/EB_repo/bio/sequence/programs/noarch/IRFinder/1.2.3/src/irfinder"' FragmentBlocks.cpp
g++ -c -pipe -std=c++0x -O3 -Wall -Wextra -fopenmp -D'COMPILATION_TIME_PLACE="Wed Jun 5 11:34:31 CEST 2019 easybuild.prib.upf.edu:/soft/EB_repo/bio/sequence/programs/noarch/IRFinder/1.2.3/src/irfinder"' ReadBlockProcessor.cpp
g++ -c -pipe -std=c++0x -O3 -Wall -Wextra -fopenmp -D'COMPILATION_TIME_PLACE="Wed Jun 5 11:34:31 CEST 2019 easybuild.prib.upf.edu:/soft/EB_repo/bio/sequence/programs/noarch/IRFinder/1.2.3/src/irfinder"' CoverageBlock.cpp
g++ -c -pipe -std=c++0x -O3 -Wall -Wextra -fopenmp -D'COMPILATION_TIME_PLACE="Wed Jun 5 11:34:31 CEST 2019 easybuild.prib.upf.edu:/soft/EB_repo/bio/sequence/programs/noarch/IRFinder/1.2.3/src/irfinder"' ReadBlockProcessor_CoverageBlocks.cpp
ReadBlockProcessor_CoverageBlocks.cpp: In member function ‘double CoverageBlocks::percentileFromHist(const std::map<unsigned int, unsigned int>&, uint) const’:
ReadBlockProcessor_CoverageBlocks.cpp:234:9: error: ‘NAN’ was not declared in this scope
 return NAN;
 ^
make: *** [ReadBlockProcessor_CoverageBlocks.o] Error 1
[manager@easybuild irfinder]$

The error appears both in version 1.2.3 (the one we used) and in the new version 1.2.5. There is no information about how to compile the program.

Apart from that error, we tried loading GCC-5.4.0 together with IRFinder, so the C libraries should be loaded as well. Curiously, in some cases the program runs perfectly, and in other cases the error from my first post appears (I submit the jobs to a cluster).

Is something wrong with the compilation process? Is this error expected with GCC-5.4.0, or is the cause something else?

Thank you for your support.

Best, Juan Luis

dg520 commented 5 years ago

Hi Juan,

Sorry, I only just saw this feedback.

You mentioned that some IRFinder runs finish successfully with GCC-5.4.0 while others fail. This makes me reconsider whether the failure is a C error, or a simple I/O error that makes the C program stop executing. Things to check:
1) Were the successful runs and the failed runs supposed to be saved to exactly the same directory?
2) Do you have sufficient permissions to write files to the target folders of the failed runs?
3) Are you limited by memory usage or disk space? When you submit jobs to a cluster, your admin usually presets the maximum computational resources each job can use. Make sure both your requested allocation and the maximum limit meet the needs of your jobs.
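The file-system checks suggested here can be run on the cluster node before IRFinder is launched. The sketch below is only an illustration: `preflight` and `nearest_existing_dir` are hypothetical helper names, memory limits are not covered, and the paths in the demo are temporary stand-ins for the real BAM file and output directory. One possible reading of the first-post message, offered as an assumption rather than a confirmed diagnosis: GNU gzip retries with a `.gz` suffix when it cannot open the path it was given and then reports the suffixed name, so a missing or unreachable BAM path would produce exactly that kind of message.

```python
import os
import shutil
import tempfile


def nearest_existing_dir(path):
    """Walk upwards from `path` until an existing filesystem entry is found."""
    probe = os.path.abspath(path)
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:  # reached the filesystem root
            break
        probe = parent
    return probe


def preflight(bam_path, output_dir, min_free_bytes=0):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not os.path.isfile(bam_path):
        problems.append(f"input not found: {bam_path}")
    elif not os.access(bam_path, os.R_OK):
        problems.append(f"input not readable: {bam_path}")

    # The output directory may not exist yet, so test its closest ancestor.
    base = nearest_existing_dir(output_dir)
    if not os.path.isdir(base) or not os.access(base, os.W_OK | os.X_OK):
        problems.append(f"cannot write under: {base}")
    else:
        free = shutil.disk_usage(base).free
        if free < min_free_bytes:
            problems.append(f"low free space under {base}: {free} bytes")
    return problems


# Demo with temporary paths standing in for the real input and output.
with tempfile.TemporaryDirectory() as workdir:
    bam = os.path.join(workdir, "sample.bam")
    out = os.path.join(workdir, "output", "sample_irfinder")
    print(preflight(bam, out))   # reports the missing input
    open(bam, "wb").close()
    print(preflight(bam, out))   # no problems reported
```

Running the same function once per sample inside the submitted job makes it easier to see whether the failing runs share a path, a node, or a full disk.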

I suggest you check the above first. If that doesn't help, keep reading. The compile error your admin encountered means your compiler doesn't recognise NAN: it requires NAN to be defined somewhere in the source code. This is a strong indication that the compiler you're using doesn't fully support C++11 features, because the definition of NAN is one of the features new in C++11.

As I mentioned in my first reply, the GCC version number alone (e.g. 5.4.0) doesn't guarantee that ALL the associated C++11 libraries are present. In fact, it's more complicated than that. There are quite a few incompatibilities between glibc 2.23 and its previous versions, and matching glibc versions to GCC versions correctly can be a nightmare until glibc >= 2.23 and GCC >= 6.

Long story short, it is worth trying whether adding #include <cmath> at the top of the file gives you the definition of NAN. If it does, put #include <cmath> in includedefine.h of the IRFinder source code and recompile. I'm not sure whether it will work.

Let me know.

Best, Dadi