williamritchie / IRFinder

Detecting intron retention from RNA-Seq experiments

empty file and IR quantification error #41

Closed sliva1 closed 6 years ago

sliva1 commented 6 years ago

Hi, I get the following message when I try to launch IR quantification (see below). I built the reference as shown on the wiki, but some of the files in my IRFinder directory are empty:

-bash-4.2$ wc -l *
       0 exclude.directional.bed
     411 exclude.omnidirectional.bed
       0 intergenic.ROI.bed
  294679 introns.unique.bed
       0 ref-cover.bed
       0 ref-read-continues.ref
     117 ref-ROI.bed
  294679 ref-sj.ref
  589886 total

Lastly, I thought I had the right version of gcc, but the version I have is 4.8.5, so I think I need to change it. Do you think that could explain the error? Best, and thanks! Stef

Message from IR quantification:

gzip: stdout: Broken pipe
tee: standard output: Broken pipe
tee: write error
/data/kdi_prod/project_result/1127/03.00/IRFinder/IRFinder-master/bin/IRFinder: line 562:
31822 Done "$STAREXEC" --genomeLoad $STARMEMORYMODE --runThreadN $THREADS --genomeDir "$REF/STAR" --outFilterMultimapNmax 1 --outSAMstrandField intronMotif --outFileNamePrefix "${OUTPUTDIR}/" --outSAMunmapped None --outSAMmode NoQS --outSAMtype BAM Unsorted --outStd BAM_Unsorted --readFilesIn "$FIFO1" "$FIFO2"
31823 Exit 1 | tee "$OUTPUTDIR/Unsorted.bam"
31824 Exit 1 | gzip -cd
31825 Aborted (core dumped) | "$LIBEXEC/irfinder" "$OUTPUTDIR" "$REF/IRFinder/ref-cover.bed" "$REF/IRFinder/ref-sj.ref" "$REF/IRFinder/ref-read-continues.ref" "$REF/IRFinder/ref-ROI.bed" "$OUTPUTDIR/unsorted.frag.bam" >> "$OUTPUTDIR/irfinder.stdout" 2>> "$OUTPUTDIR/irfinder.stderr"
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.

dg520 commented 6 years ago

Hi @sliva1,

IRFinder is built with GCC 4.9.0, but it can work with 4.8.5 if C++11 features are supported. The empty files are due to the failure of IRFinder genome preparation, which can be caused by either GCC or other compatibility problems.

Could you please send me the standard error output from the genome preparation stage, so that I can figure out what the real problem is?

Best, Dadi
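Since empty reference files point to a failed genome preparation, it can help to check for them before launching quantification. The following is a minimal sketch, not part of IRFinder; the function name `find_empty_files` is hypothetical, and it only assumes that the reference files sit directly inside the directory passed in.

```python
from pathlib import Path


def find_empty_files(directory):
    """Return the sorted names of zero-byte regular files directly inside `directory`."""
    return sorted(
        path.name
        for path in Path(directory).iterdir()
        if path.is_file() and path.stat().st_size == 0
    )


# Example call (the path is an assumption based on the build command in this thread):
# find_empty_files("REF/Mouse-mm9-release67/IRFinder")
```

On the directory listed in the first comment, such a check would be expected to report files like ref-cover.bed and ref-read-continues.ref, which are the inputs the quantification step reads.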

sliva1 commented 6 years ago

Hi, thanks for the reply! Below is the standard output when I launch IR quantification, but I think the problem occurs when I try to build the reference: I have some empty files in the IRFinder directory. Do you want the log files?

gzip: stdout: Broken pipe
tee: standard output: Broken pipe
tee: write error
/data/kdi_prod/project_result/1127/03.00/IRFinder/IRFinder-master/bin/IRFinder: line 562:
31822 Done "$STAREXEC" --genomeLoad $STARMEMORYMODE --runThreadN $THREADS --genomeDir "$REF/STAR" --outFilterMultimapNmax 1 --outSAMstrandField intronMotif --outFileNamePrefix "${OUTPUTDIR}/" --outSAMunmapped None --outSAMmode NoQS --outSAMtype BAM Unsorted --outStd BAM_Unsorted --readFilesIn "$FIFO1" "$FIFO2"
31823 Exit 1 | tee "$OUTPUTDIR/Unsorted.bam"
31824 Exit 1 | gzip -cd
31825 Aborted (core dumped) | "$LIBEXEC/irfinder" "$OUTPUTDIR" "$REF/IRFinder/ref-cover.bed" "$REF/IRFinder/ref-sj.ref" "$REF/IRFinder/ref-read-continues.ref" "$REF/IRFinder/ref-ROI.bed" "$OUTPUTDIR/unsorted.frag.bam" >> "$OUTPUTDIR/irfinder.stdout" 2>> "$OUTPUTDIR/irfinder.stderr"
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.
ERROR: IRFinder appears not to have completed. It appears an unknown component crashed.



dg520 commented 6 years ago

Hi @sliva1,

The quantification failed because of the empty reference files. We have to check what happened during the reference preparation stage to figure out the solution. Unfortunately, the current IRFinder doesn't save a log file during reference preparation. Please re-run the reference build mode and send me everything that is printed to the screen. Thank you!

Best, Dadi
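Because the reference build does not write its own log, one way to keep the screen output is to run the command through a small wrapper that echoes it and saves a copy. This is a hedged sketch rather than an IRFinder feature; `run_and_log` and the log file name are hypothetical, and the wrapper simply merges standard error into standard output.

```python
import subprocess
import sys


def run_and_log(command, log_path):
    """Run `command`, echo its combined stdout/stderr, and save a copy to `log_path`.

    Returns the exit code of the command.
    """
    with open(log_path, "w") as log:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same stream
            text=True,
        )
        for line in process.stdout:
            sys.stdout.write(line)  # still visible on screen
            log.write(line)         # and kept for later inspection
        return process.wait()


# Example call (arguments taken from the build command used in this thread;
# "buildref.log" is an arbitrary name):
# run_and_log(["bin/IRFinder", "-m", "BuildRef", "-r", "REF/Mouse-mm9-release67", ...], "buildref.log")
```

The resulting file can then be attached to the issue in full instead of copying text from the terminal.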

sliva1 commented 6 years ago

Hi Dadi,

OK, thanks. I am re-running the reference build and will have the information tomorrow (I'm in Paris and it is 5pm!).

I will send you all the information when the run finishes!

Thanks again and have a good day!

Best

Stef



sliva1 commented 6 years ago

Hi Dadi, OK, everything works! I have integrated IRFinder into our in-house pipeline. I have a question about adapter detection. Can the adapter be detected only in a paired-end experiment (and not in a single-end one)? I see that adapters can also be trimmed in a single-end experiment when launching IR quantification, but how do you know which adapter to remove? I hope that is clear!

dg520 commented 6 years ago

Hi @sliva1 ,

For paired-end data, IRFinder can automatically determine the most likely adapter sequence to trim, taking advantage of read pairs that overlap each other. For single-end data, this approach cannot be applied and the user has to supply the correct adapter manually (-a option). Otherwise, IRFinder will trim Illumina universal adapters.

Best, Dadi
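To illustrate why overlapping read pairs reveal the adapter, here is a simplified sketch. It is not IRFinder's actual implementation; the function names and the `min_insert` threshold are assumptions, and it requires exact matches, whereas a real trimmer would tolerate sequencing errors and take the consensus over many pairs. The idea: when the sequenced fragment is shorter than the read length, read 1 starts with the fragment and runs on into the adapter, while read 2 starts with the reverse complement of the same fragment.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return sequence.translate(_COMPLEMENT)[::-1]


def infer_adapter_from_pair(read1, read2, min_insert=10):
    """Guess the adapter tail of `read1` from its overlap with `read2`.

    Looks for the longest fragment length n such that the first n bases of
    read 1 equal the reverse complement of the first n bases of read 2.
    Everything in read 1 after position n is then taken to be adapter.
    Returns None when no such overlap is found.
    """
    longest = min(len(read1), len(read2)) - 1
    for n in range(longest, min_insert - 1, -1):
        if read1[:n] == reverse_complement(read2[:n]):
            return read1[n:]
    return None
```

With single-end data there is no second read to compare against, so no overlap can be computed, which is why the adapter has to be given explicitly in that case.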

sliva1 commented 6 years ago

Hi Dadi, Thanks for the answer ! I have questions:

sliva1 commented 6 years ago

Here is the output from building the mm9 reference:

bin/IRFinder -m BuildRef -r REF/Mouse-mm9-release67 -e REF/extra-input-files/RNA.SpikeIn.ERCC.fasta.gz -R REF/extra-input-files/Mouse_mm9_nonPolyA_ROI.bed ftp://ftp.ensembl.org/pub/release-67/gtf/mus_musculus/Mus_musculus.NCBIM37.67.gtf.gz
Launching reference build process. The full build should take at least one hour.
Usage : /data/kdi_prod/.kdi/project_workspace_0/1127/acl/03.00/IRFinder/IRFinder-master/bin/util/IRFinder-BuildRefFromEnsembl mode threads STAR-executable base_ftp_url_of_ensembl_genome+gtf output_directory(must not exist) additional_genome_reference(eg: ERCC) non_polyA_genes-as-bed region_blacklist-as-bed
Usage example: /data/kdi_prod/.kdi/project_workspace_0/1127/acl/03.00/IRFinder/IRFinder-master/bin/util/IRFinder-BuildRefFromEnsembl BuildRef 12 STAR "ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/" "IRFinder/REF/Human" "Refernce-ERCC.fa.gz" [non_polyA_genes.bed] [blacklist.bed]
Trying to fetch dna.primary_assembly and GTF based on: ftp://ftp.ensembl.org/pub/release-67/gtf/mus_musculus/Mus_musculus.NCBIM37.67.gtf.gz

--2018-06-28 11:07:41-- ftp://ftp.ensembl.org/pub/release-67/fasta/mus_musculus/dna/*.dna.primary_assembly.fa.gz
  => '.listing'
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.193.8
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.8|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/release-67/fasta/mus_musculus/dna ... done.
==> PASV ... done. ==> LIST ... done.

[ <=>                                      ] 5,334       --.-K/s   in 0.09s   

2018-06-28 11:07:41 (57.4 KB/s) - '.listing' saved [5334]

Removed '.listing'.
No matches on pattern '*.dna.primary_assembly.fa.gz'.
--2018-06-28 11:07:41-- ftp://ftp.ensembl.org/pub/release-67/fasta/mus_musculus/dna/*.dna.toplevel.fa.gz
  => '.listing'
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.193.8
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.8|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/release-67/fasta/mus_musculus/dna ... done.
==> PASV ... done. ==> LIST ... done.

[ <=>                                      ] 5,334       --.-K/s   in 0s      

2018-06-28 11:07:42 (271 MB/s) - '.listing' saved [5334]

Removed '.listing'.
--2018-06-28 11:07:42-- ftp://ftp.ensembl.org/pub/release-67/fasta/mus_musculus/dna/Mus_musculus.NCBIM37.67.dna.toplevel.fa.gz
  => 'Mus_musculus.NCBIM37.67.dna.toplevel.fa.gz'
==> CWD not required.
==> PASV ... done. ==> RETR Mus_musculus.NCBIM37.67.dna.toplevel.fa.gz ... done.
Length: 764264371 (729M)

100%[=========================================>] 764,264,371 21.2MB/s in 30s

2018-06-28 11:08:14 (24.3 MB/s) - 'Mus_musculus.NCBIM37.67.dna.toplevel.fa.gz' saved [764264371]

--2018-06-28 11:08:14-- ftp://ftp.ensembl.org/pub/release-67/gtf/mus_musculus/Mus_musculus.NCBIM37.67.gtf.gz
  => 'Mus_musculus.NCBIM37.67.gtf.gz'
Resolving ftp.ensembl.org (ftp.ensembl.org)... 193.62.193.8
Connecting to ftp.ensembl.org (ftp.ensembl.org)|193.62.193.8|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done. ==> PWD ... done.
==> TYPE I ... done. ==> CWD (1) /pub/release-67/gtf/mus_musculus ... done.
==> SIZE Mus_musculus.NCBIM37.67.gtf.gz ... 12773886
==> PASV ... done. ==> RETR Mus_musculus.NCBIM37.67.gtf.gz ... done.
Length: 12773886 (12M) (unauthoritative)

100%[=========================================>] 12,773,886 12.6MB/s in 1.0s

2018-06-28 11:08:16 (12.6 MB/s) - 'Mus_musculus.NCBIM37.67.gtf.gz' saved [12773886]

Jun 28 11:08:54 ..... started STAR run
Jun 28 11:08:54 ... starting to generate Genome files
Jun 28 11:08:54 ... starting to sort Suffix Array. This may take a long time...
Jun 28 11:08:54 ... sorting Suffix Array chunks and saving them to disk...
Jun 28 11:08:54 ... loading chunks from disk, packing SA...
Jun 28 11:08:54 ... finished generating suffix array
Jun 28 11:08:54 ... generating Suffix Array index
Jun 28 11:08:57 ... completed Suffix Array index
Jun 28 11:08:57 ..... processing annotations GTF

Fatal INPUT FILE error, no exon lines in the GTF file: /data/kdi_prod/.kdi/project_workspace_0/1127/acl/03.00/IRFinder/IRFinder-master/REF/Mouse-mm9-release67/transcripts.gtf
Solution: check the formatting of the GTF file, it must contain some lines with exon in the 3rd column. Make sure the GTF file is unzipped. If exons are marked with a different word, use --sjdbGTFfeatureExon .

Jun 28 11:08:57 ...... FATAL ERROR, exiting
Star genome build result: 26624
Commence STAR mapping run for mapability.
Thu Jun 28 11:08:57 CEST 2018

EXITING because of FATAL ERROR: could not open genome file /data/kdi_prod/.kdi/project_workspace_0/1127/acl/03.00/IRFinder/IRFinder-master/REF/Mouse-mm9-release67/STAR/genomeParameters.txt
SOLUTION: check that the path to genome files, specified in --genomeDir is correct and the files are present, and have user read permsissions

Jun 28 11:08:57 ...... FATAL ERROR, exiting

real 0m0.026s
user 0m0.002s
sys 0m0.003s
Completed STAR run.
Thu Jun 28 11:08:57 CEST 2018
Commence Coverage calculation.
ls: cannot access tmp_by_chr_11865/*.bed.gz: No such file or directory

real 0m0.007s
user 0m0.001s
sys 0m0.002s
cat: tmp_by_chr_11865/*.exclusion: No such file or directory

real 0m0.006s
user 0m0.001s
sys 0m0.003s
rm: cannot remove 'tmp_by_chr_11865/bed.gz.exclusion': No such file or directory
rm: cannot remove 'tmp_by_chr_11865/bed.gz': No such file or directory
Completed coverage exclusion calculation.
Thu Jun 28 11:08:57 CEST 2018
Mapability result: 0
Build Ref 1
Build Ref 2
Build Ref 3
Build Ref 4
Build Ref 5
Build Ref 6
Build Ref 7
Build Ref 8
Build Ref 9
Build Ref 10
Build Ref 11
Build Ref 12
Build Ref 13b
Build Ref 14b
Build Ref 15b
Build Ref 16 - COMPLETE
Ref build result: 0
ALL DONE
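The STAR message in the log above says the GTF must contain lines with "exon" in the third column. A quick way to test a file against that requirement is sketched below; `count_exon_lines` is a hypothetical helper, not part of STAR or IRFinder, and it assumes a tab-separated GTF that may optionally be gzip-compressed.

```python
import gzip


def count_exon_lines(gtf_path):
    """Count GTF records whose third tab-separated column is exactly 'exon'."""
    opener = gzip.open if str(gtf_path).endswith(".gz") else open
    count = 0
    with opener(gtf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header/comment lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3 and fields[2] == "exon":
                count += 1
    return count


# Example call (path taken from the error message in this thread):
# count_exon_lines("REF/Mouse-mm9-release67/transcripts.gtf")
```

A result of 0 would match the "no exon lines" error and suggest that the transcripts.gtf written during the build is empty or malformed.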

sliva1 commented 6 years ago

Hi Dadi, I have a question about the trim function. The documentation says that FASTQ files have to be unzipped for adapter trimming when running quantification. However, I saw in the main IRFinder script that it unzips the FASTQ files itself before trimming. So do we have to unzip the files ourselves before running quantification with adapter removal? Best, Stef

dg520 commented 6 years ago

Hi Stef,

To run the trim script manually on its own, you have to feed it unzipped FASTQ files. If you mean the trimming that is called during an IRFinder run, IRFinder takes care of unzipping when your input is gzipped FASTQ. Apologies for the late reply.

Best, Dadi
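For the standalone case, where the trim script needs unzipped input, the decompression step could look like the sketch below. The helper name `gunzip_fastq` and the file names in the example are hypothetical; nothing here is specific to IRFinder.

```python
import gzip
import shutil


def gunzip_fastq(gz_path, out_path):
    """Decompress a gzipped FASTQ file to `out_path` without loading it all into memory."""
    with gzip.open(gz_path, "rb") as source, open(out_path, "wb") as target:
        shutil.copyfileobj(source, target)
    return out_path


# Example call with placeholder file names:
# gunzip_fastq("sample_R1.fastq.gz", "sample_R1.fastq")
```

When trimming runs as part of IRFinder quantification, this step is not needed, as explained in the reply above.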