Hi @jmelero611,
Sorry that I missed your post; I only just found it.
I think the problem is that your FASTA file and GTF file use different naming systems for chromosomes: one might use 1, 2, 3 ... X, Y while the other uses chr1, chr2, chr3 ... chrX, chrY. If that is the case, you have to use consistent naming in both files and rebuild the IRFinder reference. Let me know.
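One quick way to test this hypothesis is to list the sequence names in both files and check whether the GTF's names appear in the FASTA. The sketch below is a minimal Python example, not part of IRFinder. It assumes uncompressed text files and takes a FASTA sequence name to be the text after ">" up to the first whitespace.

```python
from typing import Iterable, Set


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """Sequence names from FASTA headers: text after '>' up to the first whitespace."""
    names: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                names.add(parts[0])
    return names


def gtf_names(lines: Iterable[str]) -> Set[str]:
    """Seqnames (first tab-separated column) of non-comment, non-empty GTF lines."""
    names: Set[str] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def report(fasta: Set[str], gtf: Set[str]) -> Set[str]:
    """Print a short summary and return the GTF seqnames that are absent from the FASTA."""
    missing = gtf - fasta
    print(f"{len(gtf & fasta)} shared names; {len(missing)} GTF names not in FASTA")
    for name in sorted(missing)[:10]:  # show at most ten examples
        print("  not in FASTA:", name)
    return missing
```

For example, with open("genome.fa") as fa, open("transcripts.gtf") as gtf: report(fasta_names(fa), gtf_names(gtf)). The file names are placeholders for whatever the links in the reference directory point to. If most GTF names are reported as missing and differ only by a "chr" prefix, that would match the mismatch described above.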
Best, Dadi
Hi,
I am running IR quantification and the output file has no reads (IRFinder version 1.2.3).
I built the reference with the latest version of the human genome, GRCh38.p10, and GTF version 27.
The command I used to build the reference is the following (I used links to the genome and GTF files; I do not have any other files):
IRFinder -m BuildRefProcess -r REF/Human-genome
The message of the Build Reference process is the following:
Launching reference build process. The full build should take at least one hour.
Usage : /soft/EB_repo/bio/sequence/programs/noarch/IRFinder/1.2.3/bin/util/IRFinder-BuildRefFromEnsembl mode threads STAR-executable base_ftp_url_of_ensembl_genome+gtf output_directory(must not exist) additional_genome_reference(eg: ERCC) non_polyA_genes-as-bed region_blacklist-as-bed
Usage example: /soft/EB_repo/bio/sequence/programs/noarch/IRFinder/1.2.3/bin/util/IRFinder-BuildRefFromEnsembl BuildRef 12 STAR "ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/" "IRFinder/REF/Human" "Refernce-ERCC.fa.gz" [non_polyA_genes.bed] [blacklist.bed]
Nov 30 13:06:20 ..... started STAR run
Nov 30 13:06:21 ... starting to generate Genome files
Nov 30 13:08:29 ... starting to sort Suffix Array. This may take a long time...
Nov 30 13:09:26 ... sorting Suffix Array chunks and saving them to disk...
Nov 30 21:26:03 ... loading chunks from disk, packing SA...
Nov 30 21:41:41 ... finished generating suffix array
Nov 30 21:41:41 ... generating Suffix Array index
Nov 30 21:48:52 ... completed Suffix Array index
Nov 30 21:48:52 ..... processing annotations GTF
Nov 30 21:49:26 ..... inserting junctions into the genome indices
Nov 30 22:15:21 ... writing Genome to disk ...
Nov 30 22:15:55 ... writing Suffix Array to disk ...
Nov 30 22:20:26 ... writing SAindex to disk
Nov 30 22:20:47 ..... finished successfully
Star genome build result: 0
Commence STAR mapping run for mapability. Fri Nov 30 22:20:52 CET 2018

real 532m37.773s
user 805m1.140s
sys 186m22.157s
Completed STAR run. Sat Dec 1 07:13:30 CET 2018
Commence Coverage calculation.

real 304m26.188s
user 313m57.818s
sys 287m38.660s

real 0m9.532s
user 0m7.483s
sys 0m0.577s
Completed coverage exclusion calculation. Sat Dec 1 12:18:09 CET 2018
Mapability result: 0
Build Ref 1
Build Ref 2
Build Ref 3
Build Ref 4
***** WARNING: File /dev/fd/63 has inconsistent naming convention for record: GL000008.2 0 40 X 0 +
***** WARNING: File /dev/fd/63 has inconsistent naming convention for record: GL000008.2 0 40 X 0 +
Build Ref 5
***** WARNING: File /dev/fd/63 has inconsistent naming convention for record: GL000008.2 0 40 X 0 +
***** WARNING: File /dev/fd/63 has inconsistent naming convention for record: GL000008.2 0 40 X 0 +
Build Ref 6
Build Ref 7
Build Ref 8
Build Ref 9
Build Ref 10
Build Ref 11
Build Ref 12
Build Ref 13c
Build Ref 14c
Build Ref 16 - COMPLETE
Ref build result: 0
ALL DONE
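The repeated "inconsistent naming convention" lines look like a bedtools warning, which (as I understand it) fires when records in one file mix names that start with "chr" and names that do not. A record named GL000008.2 in a file otherwise using chr1, chr2, ... would trip it; that could happen, for example, if primary chromosomes carry a "chr" prefix while unplaced scaffolds keep accession-style names. The sketch below assumes the rule is simply "does the name start with chr" and groups a list of already-extracted sequence names by style.

```python
from typing import Dict, Iterable, List


def naming_styles(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group sequence names by whether they start with 'chr' (assumed to be the rule checked)."""
    styles: Dict[str, List[str]] = {"chr-prefixed": [], "unprefixed": []}
    for name in names:
        key = "chr-prefixed" if name.startswith("chr") else "unprefixed"
        styles[key].append(name)
    return styles


def is_mixed(names: Iterable[str]) -> bool:
    """True if the names use both conventions at once."""
    styles = naming_styles(names)
    return bool(styles["chr-prefixed"]) and bool(styles["unprefixed"])
```

A mixed result on its own does not prove the build is broken, but it is worth checking against how the BAM names its chromosomes.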
After building the reference, I ran IR quantification with the following command:
IRFinder -m BAM -r REF/Human-genome -d output/C5RR0ACXX_3_20_irfinder /path_to_Bam/C5RR0ACXX_3_20.bam
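The BAM given to IRFinder also has to use the same chromosome names as the reference. If the BAM says 1 where the reference says chr1 (or the reverse), I would expect no read to be assigned to any intron, which would be consistent with an all-zero output. A minimal sketch that reads the @SQ lines of the BAM header, assuming samtools is installed and on PATH:

```python
import subprocess
from typing import List


def bam_header_names(header_text: str) -> List[str]:
    """Reference sequence names (SN: tags) from the @SQ lines of a SAM/BAM header."""
    names: List[str] = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
                break
    return names


def read_bam_header(path: str) -> str:
    """Return the BAM header as text by running 'samtools view -H' (samtools required)."""
    result = subprocess.run(
        ["samtools", "view", "-H", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

For example, bam_header_names(read_bam_header("/path_to_Bam/C5RR0ACXX_3_20.bam"))[:5] shows the first few names, which can be compared by eye with the chromosome column of the IRFinder output.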
It runs, and I receive the following WARNING file:
WARN: This sample has excessive splice junctions at unannotated locations. This may indicate the experiment is not actually RNA-Seq. Or it indicates the genome fasta and annotation gtf were not compatible.
This is RNA-Seq data, and the genome and the GTF are compatible.
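IRFinder's exact rule for this warning is not shown in the thread. As an assumption, a check of this kind compares the splice junctions seen in the reads with the annotated ones. The sketch below, keyed on a hypothetical (chromosome, start, end) tuple, illustrates why a chromosome-naming mismatch would make every junction look unannotated even for genuine RNA-Seq data.

```python
from typing import Iterable, Set, Tuple

# A junction is keyed by (chromosome name, intron start, intron end).
Junction = Tuple[str, int, int]


def unannotated_fraction(observed: Iterable[Junction], annotated: Set[Junction]) -> float:
    """Fraction of observed junctions whose exact key is not in the annotated set."""
    obs = list(observed)
    if not obs:
        return 0.0
    novel = sum(1 for junction in obs if junction not in annotated)
    return novel / len(obs)
```

With identical coordinates, a junction on "1" never matches one on "chr1", so the fraction jumps to 1.0 purely because of naming.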
The output file is generated, but it contains 0 reads. The first rows are:
chr1 924948 925921 SAMD11/ENSG00000187634.11/clean 0 + 133 0 0 0 0 0 0 0 0 0 0 0 0 0 LowCover
chr1 925189 925921 SAMD11/ENSG00000187634.11/clean 0 + 83 0 0 0 0 0 0 0 0 0 0 0 0 0 LowCover
chr1 925800 925921 SAMD11/ENSG00000187634.11/clean 0 + 10 0 0 0 0 0 0 0 0 0 0 0 0 0 LowCover
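Checking the whole file rather than just the first rows helps separate a global cause (such as naming) from low expression of a few genes. A minimal sketch, assuming whitespace-separated rows shaped like the ones above, a header line starting with "Chr" (an assumed header name), a non-read value in column 7 (133, 83 and 10 above), and a warning flag in the last column:

```python
from typing import Iterable, Tuple


def count_all_zero_rows(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (rows whose value columns are all zero, total data rows).

    Assumption: columns 1-7 are chromosome, start, end, name, score, strand and a
    per-intron value that is not a read count; the last column is a warning flag;
    every column in between is treated as a numeric, read-derived value.
    """
    zero_rows = total = 0
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "Chr":  # skip blank lines and the assumed header
            continue
        total += 1
        values = [float(x) for x in fields[7:-1]]
        if all(v == 0 for v in values):
            zero_rows += 1
    return zero_rows, total
```

If zero_rows equals total for the entire file, a naming mismatch between the BAM and the reference seems more likely than a biological explanation.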
Is there an error somewhere in the process (in building the reference or in the IR quantification)?
Thank you very much for your answer.
Best regards, Juan Luis Melero