Closed zkcao closed 4 months ago
The 5th column of the narrow peak file produced by macs3 command line tool may contain invalid values, you need to fix that manually before using that for peak merging. Please read the macs3 documentation for more details:
NAME_peaks.narrowPeak is BED6+4 format file which contains the peak locations together with peak summit, p-value, and q-value. You can load it to the UCSC genome browser. Definition of some specific columns are:
5th: integer score for display. It's calculated as int(-10log10pvalue) or int(-10log10qvalue) depending on whether -p (pvalue) or -q (qvalue) is used as score cutoff. Please note that currently this value might be out of the [0-1000] range defined in UCSC ENCODE narrowPeak format. You can let the value saturated at 1000 (i.e. p/q-value = 10^-100) by using the following 1-liner awk: awk -v OFS="\t" '{$5=$5>1000?1000:$5} {print}' NAME_peaks.narrowPeak 7th: fold-change at peak summit 8th: -log10pvalue at peak summit 9th: -log10qvalue at peak summit 10th: relative summit position to peak start
Thanks for the suggestions! After I fixed the 5th column with the awk -v OFS="\t" '{$5=$5>1000?1000:$5} {print}' NAME_peaks.narrowPeak
code, a new error occurred:
peaks = snap.tl.merge_peaks(adata.uns['macs3'], snap.genome.hg38)
---------------------------------------------------------------------------
PanicException Traceback (most recent call last)
Cell In[79], line 1
----> 1 peaks = snap.tl.merge_peaks(adata.uns['macs3'], snap.genome.hg38)
File ~/.conda/envs/scpy/lib/python3.9/site-packages/snapatac2/tools/_call_peaks.py:199, in merge_peaks(peaks, chrom_sizes, half_width)
174 """Merge peaks from different groups.
175
176 Merge peaks from different groups. This function is typically used to merge
(...)
196 macs3
197 """
198 chrom_sizes = chrom_sizes.chrom_sizes if isinstance(chrom_sizes, Genome) else chrom_sizes
--> 199 return _snapatac2.py_merge_peaks(peaks, chrom_sizes, half_width)
PanicException: called `Option::unwrap()` on a `None` value
However, I checked for None value with the following code but could not find any None value:
for celltype, df in peaks_dict.items():
print(f"Checking for None values in {celltype}:")
none_found = False
for col in df.columns:
if df[col].is_null().sum() > 0:
print(f" - None values found in column '{col}'.")
none_found = True
if not none_found:
print(" No None values found.")
print("")
Again, thanks a lot for the help!
Can you share with me the peak files and your code?
sent! thanks a lot for your help!
sent! thanks a lot for your help!
Where did you send? I didn't see it.
zhangkai33@westlake.edu.cn this one
The email address is correct. But I didn't receive it. Please try this one: zk65900931@gmail.com
sent again!
Got it. Thank you!
The issue is that some chromosomes in your peak file are missing from the built-in genome. Note the default genome contains only regular chromosomes. Here is the code that works for you data:
import polars as pl
import snapatac2 as snap
import sys
def read_peaks(file):
schema = {
'chrom': pl.Utf8,
'start': pl.UInt64,
'end': pl.UInt64,
'name': pl.Utf8,
'score': pl.UInt16,
'strand': pl.Utf8,
'signal_value': pl.Float64,
'p_value': pl.Float64,
'q_value': pl.Float64,
'peak': pl.UInt64,
}
df = pl.read_csv(file, has_header=False, separator='\t', schema=schema)
return df
if __name__ == '__main__':
files = sys.argv[1:]
peaks = {}
for file in files:
df = read_peaks(file)
df = df.filter(pl.col('chrom').is_in(snap.genome.hg38.chrom_sizes.keys()))
peaks[file] = df
print(snap.tl.merge_peaks(peaks, snap.genome.hg38))
it worked perfectly now! Really appreciate your help!
Dear Kai,
I tried to import some peak data with the following script, however, as I tried to call
snap.tl.merge_peaks
, it threw some data type conversion error. Can you offer some help on this? Thanks a lot for your help!