Closed CaiCheng1996 closed 11 months ago
This error should have been fixed in the last commit from Dec 13, 2023.
Please, run git pull
in your GeneExt directory and try again.
Explanation:
I am assuming you are running the tool in the --orphan
mode? Then, this error usually arises due to an error in bedtools and is related to the orders of chromosomes in the genome file being different from the chromosome order in peaks file (e.g. some chromosomes have no called peaks => different chr order which leads to bedtools error.
Please, let me know if the error persists.
Oh, thank you! There were somgthing wrong in my chr name.
Great! Let me know if the tool finishes without errors. Also, if you have any other issues / questions, I'm happy to help.
Grisha
Thank you for your help! I do have one another question about the --peak_perc. As the manual introduce, GeneExt will calculate per-base coverage distribution for genic peaks and then used to filter intergenic peaks, and there is an example density-plot, and I do see the file genic_peaks.bed in the tmp/. However, my question is where is the intergenic peaks information? So far I can't do the same density-plot and set the cutoff of --peak_perc.
Regards.
Thanks for your great question! So far, the tool doesn't output such plots as we wanted to minimize the number of dependencies (e.g. even a minimal base-R plot will require an R to be installed in the environment). However, you can generate such plot using the peak coverage files in the tmp/ directory:
The following code generates a histogram similar to those the Fig.2 of the preprint:
peak_perc = 25
genic = read.table('tmp/genic_peaks.bed')
noov = read.table('tmp/allpeaks_noov.bed')
noov = noov[!noov$V4 %in% intersect(genic$V4,noov$V4),]
genic_cov = genic[,7]
noov_cov = noov[,7]
png('peak_coverage.png')
plot(density(log10(genic_cov)),col = 'red',main = 'Peak coverage',sub = paste0('peak perc: ',peak_perc),xlab = 'log10 Normalized peak coverage')
lines(density(log10(noov_cov)),col = 'blue')
thr = quantile(genic_cov,peak_perc/100)
abline(v = log10(thr), lty = 2)
dev.off()
Please, let me know if this works for you.
I have got the error: Traceback (most recent call last): File "geneext.py", line 721, in
nrow = helper.get_tsv_nrow(genicpeaksfile)
File "geneext/helper.py", line 1172, in get_tsv_nrow
df = pd.read_csv(bedfile,sep = '\t')
File "geneext/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 948, in read_csv
return _read(filepath_or_buffer, kwds)
File "geneext/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 611, in _read
parser = TextFileReader(filepath_or_buffer, kwds)
File "geneext/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1448, in init
self._engine = self._make_engine(f, self.engine)
File "geneext/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1723, in _make_engine
return mapping[engine](f, self.options)
File "geneext/lib/python3.10/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 93, in init
self._reader = parsers.TextReader(src, **kwds)
File "parsers.pyx", line 586, in pandas._libs.parsers.TextReader.cinit
pandas.errors.EmptyDataError: No columns to parse from file
And the log file shows: Running subsampling ... done Running macs2 ... done Filtering macs2 peaks ...
And I can see the files "allpeaks.bed", "allpeaks_coverage.bed" in the tmp/, but the genic_peaks.bed is empty. How could that happen? Is there someting wrong with gtf or bam?