williamritchie / IRFinder

Detecting intron retention from RNA-Seq experiments

ERROR: IRFinder appears not to have completed. It appears an unknown component crashed. #70

Closed: EmmaMcHugh closed this issue 4 years ago

EmmaMcHugh commented 4 years ago

Hi Dadi,

I'm having a problem with IRFinder. I'm interested in using it for detecting intron retention in the malaria parasite Plasmodium falciparum. I have completed the reference building step - the IRFinder directory looks like this:

-rw-r--r-- 1 mchughe punim0875  555367 Jan  3 15:26 exclude.directional.bed
-rw-r--r-- 1 mchughe punim0875   99886 Jan  3 15:26 exclude.omnidirectional.bed
-rw-r--r-- 1 mchughe punim0875    3811 Jan  3 15:26 intergenic.ROI.bed
-rw-r--r-- 1 mchughe punim0875  327433 Jan  3 15:26 introns.unique.bed
-rw-r--r-- 1 mchughe punim0875 2413788 Jan  3 15:26 ref-cover.bed
-rw-r--r-- 1 mchughe punim0875  196683 Jan  3 15:26 ref-read-continues.ref
-rw-r--r-- 1 mchughe punim0875    3811 Jan  3 15:26 ref-ROI.bed
-rw-r--r-- 1 mchughe punim0875  165020 Jan  3 15:26 ref-sj.ref
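As a quick sanity check on a listing like this, the sketch below tests that the four reference files the quantification step is called with (ref-cover.bed, ref-sj.ref, ref-read-continues.ref and ref-ROI.bed, as they appear in the wrapper's command line in the error message later in this post) exist and are non-empty. The function name check_ref is made up for illustration and is not part of IRFinder.

```shell
# Hypothetical helper (not part of IRFinder): verify that the reference
# files passed to the irfinder binary exist and are non-empty.
# $1 is the reference directory that contains the IRFinder/ subfolder.
check_ref() {
    ref_dir=$1
    missing=0
    for f in ref-cover.bed ref-sj.ref ref-read-continues.ref ref-ROI.bed; do
        if [ ! -s "$ref_dir/IRFinder/$f" ]; then
            echo "missing or empty: $ref_dir/IRFinder/$f"
            missing=1
        fi
    done
    return "$missing"
}

# Example call (path taken from this thread):
# check_ref /data/cephfs/punim0875/emma/NMD/genome && echo "reference looks complete"
```

This only confirms that the files are present. It says nothing about whether their contents are valid.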

And the output looks like this:

Launching reference build process. The full build should take at least one hour.
Usage : /usr/local/easybuild/software/IRFinder/1.2.5-intel-2017.u2/bin/util/IRFinder-BuildRefFromEnsembl mode threads STAR-executable base_ftp_url_of_ensembl_genome+gtf output_directory(must not exist) additional_genome_reference(eg: ERCC) non_polyA_genes-as-bed region_blacklist-as-bed
Usage example: /usr/local/easybuild/software/IRFinder/1.2.5-intel-2017.u2/bin/util/IRFinder-BuildRefFromEnsembl BuildRef 12 STAR "ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/" "IRFinder/REF/Human" "Refernce-ERCC.fa.gz" [non_polyA_genes.bed] [blacklist.bed]
Jan 03 15:21:37 ..... started STAR run
Jan 03 15:21:37 ... starting to generate Genome files
Jan 03 15:21:38 ... starting to sort Suffix Array. This may take a long time...
Jan 03 15:21:38 ... sorting Suffix Array chunks and saving them to disk...
Jan 03 15:22:11 ... loading chunks from disk, packing SA...
Jan 03 15:22:12 ... finished generating suffix array
Jan 03 15:22:12 ... generating Suffix Array index
Jan 03 15:22:25 ... completed Suffix Array index
Jan 03 15:22:25 ..... processing annotations GTF
Jan 03 15:22:25 ..... inserting junctions into the genome indices
Jan 03 15:22:52 ... writing Genome to disk ...
Jan 03 15:22:54 ... writing Suffix Array to disk ...
Jan 03 15:22:55 ... writing SAindex to disk
Jan 03 15:23:02 ..... finished successfully
Star genome build result: 0
Commence STAR mapping run for mapability.
Fri Jan  3 15:23:02 AEDT 2020
Completed STAR run.
Fri Jan  3 15:26:11 AEDT 2020
Commence Coverage calculation.
Completed coverage exclusion calculation.
Fri Jan  3 15:26:16 AEDT 2020
Mapability result: 0
Build Ref 1
Build Ref 2
Build Ref 3
Build Ref 4
Build Ref 5
Build Ref 6
Build Ref 7
Build Ref 8
Build Ref 9
Build Ref 10
Build Ref 11
Build Ref 12
Build Ref 13c
Build Ref 14c
Build Ref 16 - COMPLETE
Ref build result: 0
ALL DONE

The directories created look as follows:

drwxr-sr-x 2 mchughe punim0875    3765799 Jan  3 15:26 IRFinder/
drwxr-sr-x 2 mchughe punim0875       7477 Jan  3 15:23 logSTARbuild/
drwxr-sr-x 2 mchughe punim0875      47477 Jan  3 15:26 Mapability/
drwxr-sr-x 2 mchughe punim0875 1810164708 Jan  3 15:22 STAR/

After this I tried to run IRFinder in BAM mode:

IRFinder -m BAM -r /data/cephfs/punim0875/emma/NMD/genome /data/cephfs/punim0875/emma/NMD/STAR_output_test/0022_TL1906884_UPF1-1_MAN-20190724_TSStrmRNA_L000Aligned.toTranscriptome.out.bam

This results in the following error:

/usr/local/easybuild/software/IRFinder/1.2.5-intel-2017.u2/bin/IRFinder: line 562: 29613 Broken pipe             gzip -cd "$1"
     29614 Segmentation fault      | "$LIBEXEC/irfinder" "$OUTPUTDIR" "$REF/IRFinder/ref-cover.bed" "$REF/IRFinder/ref-sj.ref" "$REF/IRFinder/ref-read-continues.ref" "$REF/IRFinder/ref-ROI.bed" "$OUTPUTDIR/unsorted.frag.bam" >> "$OUTPUTDIR/irfinder.stdout" 2>> "$OUTPUTDIR/irfinder.stderr"
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.

If you have any ideas on how I can fix this I would greatly appreciate it! Thanks for making IRFinder :)

dg520 commented 4 years ago

Hi @EmmaMcHugh ,

This looks like a C error. Which version of GCC (>=4.90 required) and which version of GLIBC (>=2.14 required) are you using? You can check the GCC version by typing gcc --version, while the GLIBC version is a bit more complicated to check. If possible, please also let me know which Linux platform you're working on (distro and version). Please note that our support for Mac OS is limited: although Mac OS is Unix-like, its compiler toolchain is not standardized the way it is on Linux.
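For reference, one way to collect both version numbers from a shell is sketched below. It assumes a glibc-based Linux system; getconf GNU_LIBC_VERSION is glibc-specific, and the function names are just illustrative.

```shell
# Print the GCC version, or a note if gcc is not on PATH.
# Depending on how GCC was built, -dumpversion may print only the major version.
gcc_version() {
    if command -v gcc >/dev/null 2>&1; then
        gcc -dumpversion
    else
        echo "gcc not found"
    fi
}

# Print the C library version as reported by getconf (glibc systems only).
glibc_version() {
    getconf GNU_LIBC_VERSION 2>/dev/null || echo "glibc version unavailable"
}

echo "gcc:   $(gcc_version)"
echo "glibc: $(glibc_version)"
```

On a cluster that uses environment modules, the compiler found on PATH may differ from the one the software was built with, so the site administrators remain the more reliable source.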

Anyway, you can try the following two steps.

Step 1: cd into src/irfinder in the IRFinder folder and recompile IRFinder by typing make clean; make. If you encounter any problem, let me know the error message; a failure here would indicate a GCC/GLIBC incompatibility. If it compiles without error, it should generate the irfinder binary in that folder. Then type cp irfinder ../../bin/util/ and go to Step 2.

Step 2: cd into ../../bin/util/. Then make a folder for testing by typing mkdir test. Make sure your BAM file is sorted by name rather than by coordinate, then try running the following:

gzip -cd /data/cephfs/punim0875/emma/NMD/STAR_output_test/0022_TL1906884_UPF1-1_MAN-20190724_TSStrmRNA_L000Aligned.toTranscriptome.out.bam | irfinder test /data/cephfs/punim0875/emma/NMD/genome/IRFinder/ref-cover.bed /data/cephfs/punim0875/emma/NMD/genome/IRFinder/ref-sj.ref /data/cephfs/punim0875/emma/NMD/genome/IRFinder/ref-read-continues.ref /data/cephfs/punim0875/emma/NMD/genome/IRFinder/ref-ROI.bed test/unsorted.frag.bam >> test/irfinder.stdout 2>> test/irfinder.stderr

This command runs the quantification directly with the core binary instead of the IRFinder wrapper, which should expose the true error message instead of a vague segmentation fault. It saves the output under the folder test, which you can remove later. The execution might take a while. If you encounter any problem, please send me the ENTIRE error message; it would be an indication of a GLIBC incompatibility.
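When reading the result of a pipeline like this, it can help to know that shells report a process killed by a signal as exit status 128 plus the signal number, so on Linux 139 corresponds to SIGSEGV (11) and 141 to SIGPIPE (13). The small helper below is only an illustration of that convention (describe_status is a made-up name). A broken pipe on the gzip side is usually just a consequence of the reading process dying first.

```shell
# Hypothetical helper: translate an exit status into a readable message.
# 128 + signal number is the shell convention for "killed by signal".
describe_status() {
    case "$1" in
        0)   echo "ok" ;;
        139) echo "killed by SIGSEGV (segmentation fault)" ;;
        141) echo "killed by SIGPIPE (broken pipe)" ;;
        *)   echo "exit status $1" ;;
    esac
}

# In bash, the status of every stage of the last pipeline is kept in the
# PIPESTATUS array, so right after the gzip | irfinder command one could run:
#   for s in "${PIPESTATUS[@]}"; do describe_status "$s"; done
```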

Best, Dadi

EmmaMcHugh commented 4 years ago

Hi Dadi,

Thanks for your advice. FYI, I asked the people who run the HPC at my university, and they said they compiled IRFinder with the Intel 2017.u2 compiler, which is backed by GCC 6.2.0, with GLIBC 2.17. So this all seems fine.
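A minimal sketch of that comparison, assuming GNU sort (for its -V version ordering) and reading the stated GCC minimum as 4.9.0:

```shell
# Succeeds when version $1 is at least version $2.
# Relies on GNU sort -V, which orders dotted version strings numerically.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Versions reported for this cluster against the stated minimums
# (4.9.0 is an assumed reading of the GCC minimum).
version_ge 6.2.0 4.9.0 && echo "GCC 6.2.0 meets the minimum"
version_ge 2.17 2.14 && echo "GLIBC 2.17 meets the minimum"
```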

However, I tried running IRFinder on a coordinate-sorted BAM (STAR aligner output), Aligned.sortedByCoord.out.bam, instead of the unsorted one (Aligned.toTranscriptome.out.bam, which I had tried previously).

This seems to have worked and gave me the following output:

-rw-r--r-- 1 mchughe punim0875       373 Jan 13 11:57 IRFinder-ChrCoverage.txt
-rw-r--r-- 1 mchughe punim0875    804151 Jan 13 11:57 IRFinder-IR-dir.txt
-rw-r--r-- 1 mchughe punim0875    826087 Jan 13 11:57 IRFinder-IR-nondir.txt
-rw-r--r-- 1 mchughe punim0875    386125 Jan 13 11:57 IRFinder-JuncCount.txt
-rw-r--r-- 1 mchughe punim0875       401 Jan 13 11:57 IRFinder-ROI.txt
-rw-r--r-- 1 mchughe punim0875    267250 Jan 13 11:57 IRFinder-SpansPoint.txt
-rw-r--r-- 1 mchughe punim0875         0 Jan 13 11:44 irfinder.stderr
-rw-r--r-- 1 mchughe punim0875      1393 Jan 13 11:57 irfinder.stdout
-rw-r--r-- 1 mchughe punim0875 589603149 Jan 13 11:57 unsorted.frag.bam
-rw-r--r-- 1 mchughe punim0875         0 Jan 13 11:57 WARNINGS

Thank you very much for your help!

dg520 commented 4 years ago

Hi @EmmaMcHugh ,

Using a sorted BAM might lead to incorrect results for paired-end RNA-Seq libraries. IRFinder needs an unsorted BAM to identify the read pairs correctly.
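To see how a given BAM is sorted, the SO field on the @HD header line can be inspected. The sketch below parses header text from standard input; it assumes samtools is installed to produce that header (samtools is not mentioned elsewhere in this thread), and sort_order is a made-up helper name. Some files carry no SO field at all, in which case it prints "unknown".

```shell
# Hypothetical helper: read SAM header text on stdin and print the value of
# the SO: field from the @HD line, or "unknown" if there is none.
sort_order() {
    awk -F '\t' '
        /^@HD/ {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
        }
        END { if (!found) print "unknown" }
    '
}

# Example (assumes samtools is available):
#   samtools view -H Aligned.sortedByCoord.out.bam | sort_order
# A coordinate-sorted file could be re-sorted by read name with:
#   samtools sort -n -o namesorted.bam Aligned.sortedByCoord.out.bam
```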

Since you believe the GCC and GLIBC versions are compatible, could you please run the command in Step 2 to see if it works on your unsorted BAM? If it does work, you might want to compare its result with the one you generated using the sorted BAM.

Best, Dadi