loosolab / UROPA

Universal RObust Peak Annotator
https://uropa-manual.readthedocs.io/
MIT License

Indexing failed for sorted gtf file #23

Open yiling-li-0 opened 1 year ago

yiling-li-0 commented 1 year ago

Hi,

I ran the following command:

uropa -b masterPeak_ir.bed -g gencode.vM33.basic.annotation.gtf --internals 1.0 -t 1 --show-attributes gene_id

and received the following error:

2023-08-07 15:27:25 [WARNING] - Indexing failed - the GTF is probably unsorted
2023-08-07 15:27:25 [WARNING] - Attempting to sort with call: grep -v "^#" gencode.vM33.basic.annotation.gtf | sort -k1,1 -k4,4n > ./masterPeak_ir_sorted.gtf
2023-08-07 15:28:38 [ERROR] - Could not index .gtf-file - please check whether the file has the correct 9-column format.

The GTF file was downloaded from the basic annotation section of https://www.gencodegenes.org/mouse/, so I expected it to be sorted already.
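The error message asks whether the file has the correct 9-column format. One way to check that by hand is sketched below; it is not part of UROPA. The file name example.gtf and its toy records are made up for illustration, and one record is deliberately truncated to 8 columns so the check has something to report. For a real run, drop the printf lines and point gtf at the actual annotation file.

```shell
# Hypothetical toy file, for illustration only; use the real GTF path instead.
gtf="example.gtf"
printf '##description: toy example\n' > "$gtf"
printf 'chr1\ttoy\tgene\t100\t200\t.\t+\t.\tgene_id "toy_gene_1";\n' >> "$gtf"
printf 'chr1\ttoy\tgene\t300\t400\t.\t+\t.\n' >> "$gtf"   # broken on purpose: 8 columns

# Skip comment lines, then count records whose tab-separated field count is not 9.
bad_lines=$(grep -v "^#" "$gtf" | awk -F'\t' 'NF != 9' | wc -l | tr -d ' ')
echo "records without 9 columns: $bad_lines"

# Show the first few offending records, with their line numbers in the filtered stream.
grep -v "^#" "$gtf" | awk -F'\t' 'NF != 9 { print NR ": " $0 }' | head -n 5
```

A count of 0 on the real file would suggest the column layout is not the problem and that the sort or index step itself is worth a closer look.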

msbentsen commented 1 year ago

Hi @yiling-li-0 ,

I tried to reproduce the error, but I was able to download, sort, and index the file, as shown here:

2023-08-15 11:31:19 (5662) [INFO]       Started UROPA 4.0.2
(...)
2023-08-15 11:31:19 (5662) [INFO]       Preparing .gtf-file for fast access
[E::hts_idx_push] Unsorted positions on sequence #1: 3741569 followed by 3491925
2023-08-15 11:31:43 (5662) [WARNING]    Indexing failed - the GTF is probably unsorted
2023-08-15 11:31:43 (5662) [WARNING]    Attempting to sort with call: grep -v "^#" gencode.vM33.basic.annotation.gtf | sort -k1,1 -k4,4n > ./masterPeak_ir_sorted.gtf
2023-08-15 11:32:03 (5662) [INFO]       Sorting and indexing was successful
2023-08-15 11:32:03 (5662) [INFO]       Started annotation

Can you give me a little more information about the operating system you are on, the UROPA version you are using etc.?

For the sorting step to work, grep and sort must also be available on the command line. Can you check that with grep --help and sort --help?
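As a rough sketch, the information requested above could be gathered in one go like this. It assumes a POSIX-like shell; the variable name missing is only illustrative, and the UROPA version is left out because it is already printed on the "Started UROPA" line of the log.

```shell
# Report whether each tool used by the sort fallback is found on PATH.
missing=0
for tool in grep sort; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool found: $(command -v "$tool")"
    else
        echo "$tool NOT found on PATH"
        missing=$((missing + 1))
    fi
done
echo "missing tools: $missing"

# Print the operating system name and release for the bug report.
uname -sr
```

If missing is greater than 0, the sort fallback shown in the log cannot run in that environment.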