jsxlei / SCALE

Single-cell ATAC-seq analysis via Latent feature Extraction
MIT License

peak annotation in the output #16

Open anu-bioinfo opened 4 years ago

anu-bioinfo commented 4 years ago

Hello, I have been using your program to impute some scATAC-seq data, and I have a question about the output. Both the --binary and --impute options alter the peak annotation (in the attached picture, the right side is the peak.txt output from the --binary command and the left side is the original peaks.bed). Instead of chr_start_end character strings, the output contains these long integers (compare the right and left sides of the attached picture). How can I map these integers back to the original peak strings? Are they just the row numbers from the peak file? I would be grateful for any assistance. Thanks in advance.

[Screenshot from 2020-08-06 11-56-13]

Anupam

jsxlei commented 4 years ago

They should not be the row numbers of the peak file. For mtx-format input, SCALE reads the peak file with pd.read_csv(filename, sep='\t', header=None).iloc[:, -1].values, i.e. it keeps only the last column. Your input peak file probably has three tab-separated columns (chr, start, end), so SCALE read the third column, the end coordinate, as the peak name. A valid input peak file should preferably have a single column, like the one on the right. If that does not fix the problem, you can git clone the latest version and reinstall; it should handle the peak index correctly.
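A minimal sketch of the behaviour described in the reply, assuming pandas is installed and using a made-up three-peak BED file held in memory. It first applies the quoted read_csv(...).iloc[:, -1] call to a three-column file, which keeps only the end coordinates, then joins chr, start and end into chr_start_end names (the naming format mentioned in the question) so the same call recovers full peak strings. The helper bed_to_peak_names is hypothetical and not part of SCALE.

```python
import io

import pandas as pd

# Hypothetical 3-column, tab-separated BED-style peak file: chr, start, end.
bed_text = "chr1\t10000\t10500\nchr1\t20000\t20800\nchr2\t5000\t5600\n"

# The call quoted in the reply keeps only the last column,
# so a 3-column file yields just the end coordinates.
peaks_from_bed = pd.read_csv(io.StringIO(bed_text), sep="\t", header=None).iloc[:, -1].values
print(peaks_from_bed)


def bed_to_peak_names(bed: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: join chr, start, end into one 'chr_start_end' string per row."""
    return bed[0].astype(str) + "_" + bed[1].astype(str) + "_" + bed[2].astype(str)


bed = pd.read_csv(io.StringIO(bed_text), sep="\t", header=None)
names = bed_to_peak_names(bed)

# One name per line gives a single-column file; the same read call now keeps the full strings.
single_col_text = "\n".join(names) + "\n"
peaks_from_single = pd.read_csv(io.StringIO(single_col_text), sep="\t", header=None).iloc[:, -1].values
print(peaks_from_single)
```

In a real run the single-column text would be written to the peak file passed to SCALE. For output that has already been produced, the integers could in principle be matched against the end column of the original peaks.bed, but only if row order was preserved and end coordinates are unique; the thread does not confirm either assumption.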