Closed MWSprivate closed 7 months ago
Hi!
This occurs, because the features in your GTF regions file are overlapping - when you run iCount segment, on the annotation GTF, this produced the segmenation file that you have provided in your PEKA run, but it also produces another file called "regions.gtf.gz". This is the file that should be used for PEKA.
In regions.gtf the overlapping features such that a genomic interval is only assigned to one of these features: ncRNA, intergenic, CDS, UTR3, UTR5, intron. Note that there are no gene and transcript features anymore.
You can see how the file looks like in test data inputs: https://github.com/ulelab/peka/blob/main/TestData/inputs/sorted.regions.gtf.gz
Please let me know if this resolves your issue.
Cheers!
Hi
Thanks for the help. I had a quick look, might be that my iCount version (2.0.0) doesn't match. It only produces one file (help message below). I'll check another version once I have some spare time.
Best regards
usage: iCount segment [-h] [-prog] [-S] [-F] [-P] [-M] annotation segmentation fai
Parse annotation file into internal iCount structure - segmentation.
Currently, only annotations from ENSEMBl and GENCODE are supported. http://www.gencodegenes.org/ http://www.ensembl.org
Segmentation is used in almost all further analyses.
In segmentation, each transcript is partitioned into so called regions/intervals. Such regions must span the whole transcript, but should not intersect with each other. However, higher hierarchy levels: transcripts and genes can of course intersect each other.
Example of possible segmentation::
Genome level: |---------------------------------------------------|
Gene level: |--------------gene1--------------| |-intergenic-|
|---------gene2--------|
Transcript l.: |----------transcript1---------|
|-------transcript2-------|
|------transcript3-----|
Region level: |-CDS-||-intron-||-CDS-||-UTR3-|
For simplicity, only the partition of transcript1 is presented.
positional arguments: annotation Path to input GTF file segmentation Path to output GTF file fai Path to input genome_file (.fai or similar)
options: -h, --help show this help message and exit -prog, --report_progress Switch to show progress (default: False) -S , --stdout_log Threshold value (0-50) for logging to stdout. If 0, logging to stdout if turned OFF. -F , --file_log Threshold value (0-50) for logging to file. If 0, logging to file if turned OFF. -P , --file_logpath Path to log file. -M , --results_file File into which to store Metrics.
Indeed the version discrepancy is likely the reason. I am running iCount-Mini v 3.0.1. It can be installed with pip or with conda. This should fix the issue :)
Hi there
Thanks for the program. I installed it and running the test data works fine. Moving to my own data (the first is what I understood from the documentation but all fail):
peka -i iCount.deDupBCQFdemux_barcode_RT6.peaks.forPeka.bed -x iCount.deDupBCQFdemux_barcode_RT6.cDNA_unique.forPeka.bed -g refGenome.fasta -gi refGenome.fasta.fai -r refGenome.segs.forPeka.gtf peka -i iCount.deDupBCQFdemux_barcode_RT6.clusters.forPeka.bed -x iCount.deDupBCQFdemux_barcode_RT6.cDNA_unique.forPeka.bed -g refGenome.fasta -gi refGenome.fasta.fai -r refGenome.segs.forPeka.gtf peka -i iCount.deDupBCQFdemux_barcode_RT6.clusters.forPeka.bed -x iCount.deDupBCQFdemux_barcode_RT6.peaks.forPeka.bed -g refGenome.fasta -gi refGenome.fasta.fai -r refGenome.segs.forPeka.gtf
gives me an error:
Getting thresholded crosslinks Thresholding intron lenght of df_reg for intron is: 1038414 Traceback (most recent call last): File "/home/name/miniconda3/envs/peka/bin/peka", line 8, in
sys.exit(main())
File "/home/name/miniconda3/envs/peka/bin/peka.py", line 1462, in main
set_seeds
File "/home/name/miniconda3/envs/peka/bin/peka.py", line 1051, in run
df_txn = get_threshold_sites(sites_file, percentile=percentile)
File "/home/name/miniconda3/envs/peka/bin/peka.py", line 497, in get_threshold_sites
df_cut = cut_sites_with_region(df_reg, df_region)
File "/home/name/miniconda3/envs/peka/bin/peka.py", line 422, in cut_sites_with_region
df_temp = cut_per_chrom(chrom, df_p, df_m, df_region_p, df_region_m)
File "/home/name/miniconda3/envs/peka/bin/peka.py", line 409, in cut_per_chrom
df_xl_p["cut"] = pd.cut(df_xl_p["start"], interval_index_p)
File "/home/name/miniconda3/envs/peka/lib/python3.7/site-packages/pandas/core/reshape/tile.py", line 226, in cut
raise ValueError('Overlapping IntervalIndex is not accepted.')
ValueError: Overlapping IntervalIndex is not accepted.
Chromosome names and sorting seem to match and the files are iCount outputs (I removed some chromosomes and resorted while testing, but the original files also did not work). I uploaded my files:
https://e.pcloud.link/publink/show?code=XZCJGeZ4hXmsgv6hmLEf71rt8yzRhSgzqWk
What's wrong? :)