williamritchie / IRFinder

Detecting intron retention from RNA-Seq experiments

No reads mapped error #49

Closed jmelero611 closed 4 years ago

jmelero611 commented 5 years ago

Hi,

I am running IR quantification and the output file contains no reads (IRFinder version 1.2.3).

I built the reference with the latest version of the human genome, GRCh38.p10, and GTF version 27.

The command to build the reference is the following (I used links to the genome and GTF files; there are no other files):

IRFinder -m BuildRefProcess -r REF/Human-genome

The message of the Build Reference process is the following:

Launching reference build process. The full build should take at least one hour.
Usage : /soft/EB_repo/bio/sequence/programs/noarch/IRFinder/1.2.3/bin/util/IRFinder-BuildRefFromEnsembl mode threads STAR-executable base_ftp_url_of_ensembl_genome+gtf output_directory(must not exist) additional_genome_reference(eg: ERCC) non_polyA_genes-as-bed region_blacklist-as-bed
Usage example: /soft/EB_repo/bio/sequence/programs/noarch/IRFinder/1.2.3/bin/util/IRFinder-BuildRefFromEnsembl BuildRef 12 STAR "ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/" "IRFinder/REF/Human" "Refernce-ERCC.fa.gz" [non_polyA_genes.bed] [blacklist.bed]
Nov 30 13:06:20 ..... started STAR run
Nov 30 13:06:21 ... starting to generate Genome files
Nov 30 13:08:29 ... starting to sort Suffix Array. This may take a long time...
Nov 30 13:09:26 ... sorting Suffix Array chunks and saving them to disk...
Nov 30 21:26:03 ... loading chunks from disk, packing SA...
Nov 30 21:41:41 ... finished generating suffix array
Nov 30 21:41:41 ... generating Suffix Array index
Nov 30 21:48:52 ... completed Suffix Array index
Nov 30 21:48:52 ..... processing annotations GTF
Nov 30 21:49:26 ..... inserting junctions into the genome indices
Nov 30 22:15:21 ... writing Genome to disk ...
Nov 30 22:15:55 ... writing Suffix Array to disk ...
Nov 30 22:20:26 ... writing SAindex to disk
Nov 30 22:20:47 ..... finished successfully
Star genome build result: 0
Commence STAR mapping run for mapability.
Fri Nov 30 22:20:52 CET 2018

real 532m37.773s
user 805m1.140s
sys 186m22.157s
Completed STAR run.
Sat Dec 1 07:13:30 CET 2018
Commence Coverage calculation.

real 304m26.188s
user 313m57.818s
sys 287m38.660s

real 0m9.532s
user 0m7.483s
sys 0m0.577s
Completed coverage exclusion calculation.
Sat Dec 1 12:18:09 CET 2018
Mapability result: 0
Build Ref 1
Build Ref 2
Build Ref 3
Build Ref 4
***** WARNING: File /dev/fd/63 has inconsistent naming convention for record: GL000008.2 0 40 X 0 +

***** WARNING: File /dev/fd/63 has inconsistent naming convention for record: GL000008.2 0 40 X 0 +

Build Ref 5
***** WARNING: File /dev/fd/63 has inconsistent naming convention for record: GL000008.2 0 40 X 0 +

***** WARNING: File /dev/fd/63 has inconsistent naming convention for record: GL000008.2 0 40 X 0 +

Build Ref 6
Build Ref 7
Build Ref 8
Build Ref 9
Build Ref 10
Build Ref 11
Build Ref 12
Build Ref 13c
Build Ref 14c
Build Ref 16 - COMPLETE
Ref build result: 0
ALL DONE

After building the reference, I ran IR quantification with the following command:

IRFinder -m BAM -r REF/Human-genome -d output/C5RR0ACXX_3_20_irfinder /path_to_Bam/C5RR0ACXX_3_20.bam

It runs, and I receive the following warning file:

WARN: This sample has excessive splice junctions at unannotated locations. This may indicate the experiment is not actually RNA-Seq. Or it indicates the genome fasta and annotation gtf were not compatible.

This is RNA-seq and the genome and the GTF are compatible.

The output file is produced, but with 0 reads. The first rows are:

chr1 924948 925921 SAMD11/ENSG00000187634.11/clean 0 + 133 0 0 0 0 0 0 0 0 0 0 0 0 0 LowCover
chr1 925189 925921 SAMD11/ENSG00000187634.11/clean 0 + 83 0 0 0 0 0 0 0 0 0 0 0 0 0 LowCover
chr1 925800 925921 SAMD11/ENSG00000187634.11/clean 0 + 10 0 0 0 0 0 0 0 0 0 0 0 0 0 LowCover

Is there an error in the process (in building the reference or in the IR quantification)?

Thank you very much for your answer.

Best regards, Juan Luis Melero

dg520 commented 5 years ago

Hi @jmelero611 ,

Sorry, I missed your post and only just found it. I think the problem is that your FASTA file and GTF file use different naming systems for chromosomes. One might use 1, 2, 3 ... X, Y while the other uses chr1, chr2, chr3 ... chrX, chrY. If that is the case, you have to make the naming consistent and rebuild the IRFinder reference. Let me know.
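A quick way to check for this kind of mismatch is to compare the sequence names in the FASTA headers with the first column of the GTF. The following is only a minimal sketch, not part of IRFinder: the function names are made up for illustration, and the file names `genome.fa` and `transcripts.gtf` mentioned in the comments are assumptions about what the reference directory contains. The demo at the bottom uses small in-memory examples so it runs without any files.

```python
import gzip
import io


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fasta_names(handle):
    """Collect sequence names: the first word after '>' on each header line."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_names(handle):
    """Collect sequence names: column 1 of every non-comment GTF line."""
    names = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(fasta, gtf):
    """Report which names are shared and which appear in only one file."""
    return {
        "shared": sorted(fasta & gtf),
        "fasta_only": sorted(fasta - gtf),
        "gtf_only": sorted(gtf - fasta),
    }


if __name__ == "__main__":
    # In-memory stand-ins; for real files (assumed names) one could use e.g.
    #   with open_text("REF/Human-genome/genome.fa") as fh: fasta_names(fh)
    #   with open_text("REF/Human-genome/transcripts.gtf") as fh: gtf_names(fh)
    demo_fasta = io.StringIO(">1 dna:chromosome\nACGT\n>X\nACGT\n")
    demo_gtf = io.StringIO(
        "##description: demo\n"
        'chr1\tHAVANA\tgene\t1\t4\t.\t+\t.\tgene_id "g1";\n'
        'chrX\tHAVANA\tgene\t1\t4\t.\t+\t.\tgene_id "g2";\n'
    )
    report = compare_names(fasta_names(demo_fasta), gtf_names(demo_gtf))
    for key, values in report.items():
        print(key, values)
```

If the "shared" list is empty or contains only a few names, the two files use different naming and one of them needs to be renamed before the reference is rebuilt.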

Best, Dadi