williamritchie / IRFinder

Detecting intron retention from RNA-Seq experiments

Problems when building the Reference(T2T-genome) #189

Open MG-DYM opened 2 months ago

MG-DYM commented 2 months ago

Hi Sir, I want to use the T2T genome and its transcripts.gtf to build the reference. It goes wrong at <Phase 2: Mapability Calculation>:

<Phase 1: STAR Reference Preparation>
Jul 04 21:37:17 ..... started STAR run
Jul 04 21:37:17 ... starting to generate Genome files
Jul 04 21:38:30 ... starting to sort Suffix Array. This may take a long time...
Jul 04 21:38:46 ... sorting Suffix Array chunks and saving them to disk...
Jul 04 21:51:05 ... loading chunks from disk, packing SA...
Jul 04 21:52:30 ... finished generating suffix array
Jul 04 21:52:30 ... generating Suffix Array index
Jul 04 21:56:23 ... completed Suffix Array index
Jul 04 21:56:23 ..... processing annotations GTF
Jul 04 21:56:58 ..... inserting junctions into the genome indices
Jul 04 21:59:54 ... writing Genome to disk ...
Jul 04 22:00:24 ... writing Suffix Array to disk ...
Jul 04 22:04:27 ... writing SAindex to disk
Jul 04 22:04:41 ..... finished successfully
<Phase 2: Mapability Calculation>
Jul 04 22:04:41 ... mapping genome fragments back to genome...
Jul 04 22:21:57 ... sorting aligned genome fragments...
[bam_sort_core] merging from 36 files and 36 in-memory blocks...
Jul 04 22:29:45 ... indexing aligned genome fragments...
Jul 04 22:30:24 ... filtering aligned genome fragments by chromosome/scaffold...
Jul 04 22:31:05 ... merging filtered genome fragments...
lzop: : not a lzop file
Mapability build: Failed!

I checked the software version and the chromosome names in the FASTA and the GTF; both are OK. Is there something wrong with the T2T genome's transcripts.gtf? I first used the original T2T genome FASTA and its transcripts.gtf, whose chromosome names look like 'NC_060925.1', and that also failed:

[Screenshot: 微信图片_20240704232356 (WeChat image)]

Could you help me find the problem? Thanks for your help. Best wishes!

dg520 commented 1 month ago

Hi @MG-DYM
Sorry for the late reply. From the error messages, there are multiple issues.

  1. In the first half, where you showed the failure in Phase 2, the cause is that lzop failed to decompress files. To fix this, I'd recommend commenting out lines 21-24 in the file bin/util/Mapability and re-running. This makes the script use gzip instead of lzop as the compression algorithm.
  2. In the screenshot, the error message indicates that the system couldn't locate the virtual file /dev/fd/62. That virtual file "packages" a standard input (e.g., a string) into a file-like structure so that a program requiring a file as its input knows how to deal with the situation. This is standard on all Linux systems. Figuring out exactly what happens on your system is a bit hard, so the first question is whether you ran the command in root mode, either by starting the command with sudo or by logging in as the root user. Please DO NOT run IRFinder in root mode. Otherwise, "/dev/fd/62 not found" will be the result.
  3. I haven't worked with the T2T genome, but you mentioned that its chromosome names look like 'NC_060925.1'. That is totally fine. You don't need to rename them, as long as the chromosome nomenclature is consistent between the GTF and the FASTA. IRFinder will take care of it.
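
The three points above can be checked before starting a long reference build. The following is only a minimal sketch in Python, not part of IRFinder: the function names (`fasta_names`, `gtf_names`, `preflight`) are hypothetical, and it assumes the FASTA and GTF are plain text or gzip-compressed. It looks for `lzop` on the PATH, warns when running as root or when `/dev/fd` is missing, and lists any sequence names used in the GTF that do not appear in the FASTA.

```python
import gzip
import os
import shutil


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def fasta_names(path):
    """Sequence names in a FASTA file: the first word after '>' on header lines."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:
                    names.add(fields[0])
    return names


def gtf_names(path):
    """Sequence names in a GTF file: column 1 of every non-comment line."""
    names = set()
    with open_text(path) as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def preflight(fasta_path, gtf_path):
    """Return a list of warnings; an empty list means nothing suspicious was spotted."""
    warnings = []
    # Point 1: lzop has to be runnable if the lzop code path stays enabled.
    if shutil.which("lzop") is None:
        warnings.append("lzop not found on PATH")
    # Point 2: avoid root, and make sure the /dev/fd virtual files exist.
    if hasattr(os, "geteuid") and os.geteuid() == 0:
        warnings.append("running as root")
    if not os.path.isdir("/dev/fd"):
        warnings.append("/dev/fd is not available")
    # Point 3: every sequence name used in the GTF must exist in the FASTA.
    missing = gtf_names(gtf_path) - fasta_names(fasta_path)
    if missing:
        warnings.append("GTF names absent from FASTA: " + ", ".join(sorted(missing)))
    return warnings
```

Calling `preflight("genome.fa", "transcripts.gtf")` (placeholder file names) and printing each returned warning gives a quick overview. Finding `lzop` on the PATH does not prove that decompression will succeed, so this complements rather than replaces the fix in point 1.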

Finally, may I ask you to move this discussion to the new IRFinder repository? The owner of this repository has moved on, and I don't have sufficient permissions here for maintenance, so I have had to move the project to a new home. I have recreated your question at this link; please follow up there. Apologies for the inconvenience.