loosolab / UROPA

Universal RObust Peak Annotator
https://uropa-manual.readthedocs.io/
MIT License

UROPA run error: no summary output #22

Open deep-buddingcoder opened 1 year ago

deep-buddingcoder commented 1 year ago

Hi, I have encountered a UROPA run error: "Visualized summary output could not be created ...".

This is what I have done: as input, I used an ATAC-seq peak file generated by MACS2 against the UCSC hg38 genome (without chrM and without contigs such as GL000009.2 and KI270442.1).

I wanted to run UROPA with the Ensembl hg38 GTF file: https://ftp.ensembl.org/pub/release-109/gtf/homo_sapiens/Homo_sapiens.GRCh38.109.gtf.gz

After downloading, I modified the GTF file as follows:

  1. In column 1, the prefix "chr" was added to the chromosome names (1-22, X, Y and MT), and contigs such as GL000009.2 were removed, using AWK.
  2. I then sorted the GTF file using the same sort command that UROPA uses.
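The two modifications in step 1 can be sketched in Python as follows. This is only an illustration of the described edit, not the AWK command actually used in the post; the function name `add_chr_prefix` and the toy GTF lines (including the `GENE_A`-style identifiers) are made up for the example.

```python
# Chromosome names to keep, as listed in the post: 1-22, X, Y and MT.
PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y", "MT"}


def add_chr_prefix(lines):
    """Yield GTF lines with 'chr' prefixed to primary chromosomes; drop contigs.

    Hypothetical helper for illustration only.
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # keep header/comment lines unchanged
            continue
        seqname, sep, rest = line.partition("\t")
        if seqname in PRIMARY:
            yield "chr" + seqname + sep + rest
        # any other seqname (e.g. GL000009.2) is dropped


# Toy input: the attribute values are placeholders, not real annotation records.
example = [
    "#!toy-header\n",
    '1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "GENE_A";\n',
    'GL000009.2\tsrc\tgene\t10\t90\t.\t-\t.\tgene_id "GENE_B";\n',
    'MT\tsrc\tgene\t5\t70\t.\t+\t.\tgene_id "GENE_C";\n',
]
converted = list(add_chr_prefix(example))
for line in converted:
    print(line, end="")
```

Note that prefixing `MT` this way gives `chrMT`, whereas UCSC names the mitochondrial chromosome `chrM`. That makes no difference here because chrM was excluded from the peak file, but it would matter for peak files that include it.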

My config file is:

    {
      "queries": [
        {
          "feature": ["five_prime_utr", "three_prime_utr", "exon", "gene", "transcript", "CDS"],
          "feature.anchor": ["start", "center", "end"],
          "distance": [10000, 10000],
          "strand": "ignore",
          "direction": "",
          "internals": "0.01",
          "show.attributes": ["gene_id", "gene_name", "gene_biotype", "transcript_id", "transcript_name"]
        }
      ],
      "priority": "False",
      "gtf": "Homo_sapiens.GRCh38.109_formatModified_sorted.gtf",
      "bed": "16D_Ctrl_E10d_AccDNA40-100_catenated_peaks_sorted_merged.bed"
    }

My uropa command:

$ uropa --bed 16D_Ctrl_E10d_AccDNA40-100_catenated_peaks_sorted_merged.bed --gtf Homo_sapiens.GRCh38.109_sorted_formatModified.gtf --input uropa_config_EnsemblGTF_deep_v1_1.json --prefix 16D_Ctrl_E10d_AccDNA40-100_peaks_merged --outdir 16D_Ctrl_E10d_AccDNA40-100_catenated_peaks_sorted_merged --output-by-query --summary --threads 33 --log uropa.log --debug

This is the part of the log showing that the visualized summary (graphs) was not created:

    2023-05-16 10:15:27 (91656) [INFO] Processing annotated peaks
    2023-05-16 10:15:27 (91656) [INFO] Creating the Summary graphs of the results...
    2023-05-16 10:15:27 (91656) [DEBUG] Summary output call is uropa_summary.R -f 16D_Ctrl_E10d_AccDNA40-100_catenated_peaks_sorted_merged/16D_Ctrl_E10d_AccDNA40-100_peaks_merged_finalhits.txt -c 16D_Ctrl_E10d_AccDNA40-100_catenated_peaks_sorted_merged/16D_Ctrl_E10d_AccDNA40-100_peaks_merged.json -o 16D_Ctrl_E10d_AccDNA40-100_catenated_peaks_sorted_merged/16D_Ctrl_E10d_AccDNA40-100_peaks_merged_summary.pdf -b 16D_Ctrl_E10d_AccDNA40-100_catenated_peaks_sorted_merged/16D_Ctrl_E10d_AccDNA40-100_peaks_merged_allhits.txt -a ' /home/deep/miniconda3/envs/uropa/bin/uropa --bed 16D_Ctrl_E10d_AccDNA40-100_catenated_peaks_sorted_merged.bed --gtf Homo_sapiens.GRCh38.109_formatModified_sorted.gtf --input uropa_config_EnsemblGTF_deep_v1_1.json --prefix 16D_Ctrl_E10d_AccDNA40-100_peaks_merged --outdir 16D_Ctrl_E10d_AccDNA40-100_catenated_peaks_sorted_merged --output-by-query --summary --threads 33 --log uropa.log --debug '
    Warning message: package ‘ggplot2’ was built under R version 4.2.3
    Error in .basic.summary(opt$finalhits, opt$config, opt$output) : No valid peak annotations with specified query/queries, summary unfeasible!
    Execution halted
    2023-05-16 10:15:33 (91656) [WARNING] Visualized summary output could not be created from: uropa_summary.R -f 16D_Ctrl_E10d_AccDNA40-100_catenated_peaks_sorted_merged/16D_Ctrl_E10d_AccDNA40-100_peaks_merged_finalhits.txt -c 16D_Ctrl_E10d_AccDNA40-100_catenated_peaks_sorted_merged/16D_Ctrl_E10d_AccDNA40-100_peaks_merged.json -o 16D_Ctrl_E10d_AccDNA40-100_catenated_peaks_sorted_merged/16D_Ctrl_E10d_AccDNA40-100_peaks_merged_summary.pdf -b 16D_Ctrl_E10d_AccDNA40-100_catenated_peaks_sorted_merged/16D_Ctrl_E10d_AccDNA40-100_peaks_merged_allhits.txt -a ' /home/deep/miniconda3/envs/uropa/bin/uropa --bed 16D_Ctrl_E10d_AccDNA40-100_catenated_peaks_sorted_merged.bed --gtf Homo_sapiens.GRCh38.109_formatModified_sorted.gtf --input uropa_config_EnsemblGTF_deep_v1_1.json --prefix 16D_Ctrl_E10d_AccDNA40-100_peaks_merged --outdir 16D_Ctrl_E10d_AccDNA40-100_catenated_peaks_sorted_merged --output-by-query --summary --threads 33 --log uropa.log --debug '
    2023-05-16 10:15:33 (91656) [INFO] UROPA run finished in 0:01:53!
    2023-05-16 10:15:33 (91656) [DEBUG] Waiting for listener to finish
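Since the R script reports "No valid peak annotations", one way to cross-check that message is to count how many rows of the `_finalhits.txt` table actually carry an annotation. The sketch below assumes the table is tab-separated with a header row, that it has a column named `feature`, and that unannotated peaks are marked `NA`. These are assumptions about the file layout, so the column name and placeholder are parameters to adjust against the real header. The function name `count_annotated` and the toy table are hypothetical.

```python
import csv
import io


def count_annotated(handle, column="feature", missing=("NA", "")):
    """Return (annotated_rows, total_rows) for a tab-separated table with a header.

    `column` and `missing` are assumptions about the finalhits layout.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} not found in header: {reader.fieldnames}")
    total = annotated = 0
    for row in reader:
        total += 1
        # Short rows give None for missing cells; treat those as unannotated.
        if (row[column] or "") not in missing:
            annotated += 1
    return annotated, total


# Toy table standing in for a finalhits file (not real UROPA output).
toy = (
    "peak_chr\tpeak_start\tpeak_end\tfeature\n"
    "chr1\t100\t200\tgene\n"
    "chr1\t300\t400\tNA\n"
)
result = count_annotated(io.StringIO(toy))
print(result)
```

For a real file, pass an open file handle instead of the `io.StringIO` object. A non-zero annotated count alongside this error would suggest the problem is in the summary step rather than in the annotation itself.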

Sorry for the long description. I would appreciate your help in resolving this error so that the summary visualization is generated.

Deep

deep-buddingcoder commented 1 year ago

Hi, an update: the original Ensembl GTF file also resulted in the same error. After some control UROPA runs, I have concluded that the problem lies in the format of the GTF file, but I am not sure how to fix it. Any suggestion is welcome. Thanks, Deep
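When a GTF format problem is suspected, a quick first check is whether the chromosome names in the BED file and the GTF file overlap at all. The sketch below compares the first column of both files. It is a generic diagnostic, not part of UROPA; the function names and the toy lines are invented for illustration.

```python
def first_column_names(lines):
    """Collect the distinct values of column 1, skipping comments and blank lines."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split(maxsplit=1)[0])
    return names


def compare_chromosomes(bed_lines, gtf_lines):
    """Report which chromosome names are shared or unique to each file."""
    bed = first_column_names(bed_lines)
    gtf = first_column_names(gtf_lines)
    return {"shared": bed & gtf, "bed_only": bed - gtf, "gtf_only": gtf - bed}


# Toy data: UCSC-style names in the BED, Ensembl-style names in the GTF.
bed = ["chr1\t100\t200\n", "chr2\t5\t50\n"]
gtf = [
    "#!toy-header\n",
    '1\tsrc\tgene\t100\t900\t.\t+\t.\tgene_id "GENE_A";\n',
    '2\tsrc\tgene\t10\t90\t.\t-\t.\tgene_id "GENE_B";\n',
]
report = compare_chromosomes(bed, gtf)
print(report)
```

An empty `shared` set means no peak can be matched to any feature, which would point to a naming mismatch rather than to anything else in the GTF.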

deep-buddingcoder commented 1 year ago

Hi, another update: I ran UROPA with the original hg38 NCBI RefSeq GTF file, and UROPA generated the summary. UROPA could not generate the summary when I used the original hg38 Ensembl GTF or the GENCODE GTF. Any idea how to fix this issue? I specifically need to run UROPA with the Ensembl GTF. Thanks, Deep

msbentsen commented 1 year ago

Hi Deep,

Thank you for your issue report. I found a bug in the script creating the summary plots, so it had nothing to do with the GTF format. I made a fix on the uropa dev branch, but we like to collect a few fixes before issuing a final version. In the meantime, please install the new uropa version using:

$ pip install git+https://github.com/loosolab/uropa@dev

The version should be shown as:

$ uropa --version
uropa 4.0.3-beta

Hope this solves the issue!