tariks / peakachu

Genome-wide contact analysis using sklearn
MIT License

How to get the .bed files corresponding to the matrix, and how to get a picture similar to the example you list #8

Status: Open. Dweson opened this issue 4 years ago.

Dweson commented 4 years ago

I don't know how to find the .bed files. Also, when we try to visualize the scores as a 2D annotation, the image looks ugly and there are too many black spots. Thank you.

XiaoTaoWang commented 4 years ago

Hi, you can do that with the peakachu pool command, which runs a local clustering algorithm on the original calls and outputs a single representative (the black dots in the example) for each cluster.
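To illustrate the general idea of reducing each cluster of nearby calls to one representative, here is a minimal greedy sketch in the style of non-maximum suppression. It is not peakachu's actual pool algorithm; the (row_bin, col_bin, probability) tuple layout and the `radius` parameter are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical representation of one candidate loop call:
# (row_bin, col_bin, probability).
Call = Tuple[int, int, float]

def pool_calls(calls: List[Call], radius: int = 2) -> List[Call]:
    """Greedy local clustering (illustrative, not peakachu's implementation).

    Repeatedly take the highest-probability remaining call as a cluster
    representative and discard every call within `radius` bins of it
    (Chebyshev distance), so each cluster contributes one representative.
    """
    remaining = sorted(calls, key=lambda c: c[2], reverse=True)
    representatives: List[Call] = []
    while remaining:
        best = remaining.pop(0)
        representatives.append(best)
        # Keep only calls lying outside the neighbourhood of `best`.
        remaining = [
            c for c in remaining
            if max(abs(c[0] - best[0]), abs(c[1] - best[1])) > radius
        ]
    return representatives

if __name__ == "__main__":
    calls = [(100, 150, 0.97), (101, 151, 0.93), (102, 150, 0.91), (300, 420, 0.95)]
    print(pool_calls(calls))  # [(100, 150, 0.97), (300, 420, 0.95)]
```

The three calls around bin (100, 150) collapse to the strongest one, while the distant call at (300, 420) survives as its own representative.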

tariks commented 4 years ago

Adding to Xiaotao's answer, the .bed files should be in the folder specified by the -O option; the folder is created if it doesn't already exist. If there are still too many black spots after running pool, try filtering with a higher threshold (e.g. 0.95 instead of 0.9).
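If you want to see how a stricter cut-off thins out the calls before plotting, a small filter like the one below can help. It is a sketch under stated assumptions: the score file is assumed to be tab-separated with no header line, and the probability is assumed to sit in the seventh column (`prob_col=6`); both are guesses to check against your actual output files, and `filter_by_probability` is a hypothetical helper, not part of peakachu.

```python
import csv
import io
from typing import Iterable, List

def filter_by_probability(lines: Iterable[str], threshold: float = 0.95,
                          prob_col: int = 6) -> List[List[str]]:
    """Return tab-separated records whose probability is >= threshold.

    prob_col is an assumed 0-based column index for the probability;
    verify it against the score files you actually have.
    """
    kept: List[List[str]] = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if float(row[prob_col]) >= threshold:
            kept.append(row)
    return kept

if __name__ == "__main__":
    scores = io.StringIO(
        "chr1\t100000\t110000\tchr1\t200000\t210000\t0.97\n"
        "chr1\t300000\t310000\tchr1\t400000\t410000\t0.92\n"
    )
    # Only the 0.97 record passes a 0.95 threshold.
    for record in filter_by_probability(scores, threshold=0.95):
        print("\t".join(record))
```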

Dweson commented 4 years ago

@tariks @XiaoTaoWang Thank you very much, I solved the second problem following your answers. But by the .bed file I meant the training input text file, and I don't know where to find it. The files I found in the GEO database for Tang et al. and Mumbach et al. are not the same as the files in the /example directory.

tariks commented 4 years ago

That makes sense. Both example files were derived from the Excel spreadsheets in each publication's supplementary files, not from GEO.
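If you need to rebuild a training file from such a supplementary spreadsheet, one approach is to export the sheet to CSV and reshape it. The sketch below assumes a six-column, tab-separated BEDPE-style output (compare it with the files in /example before relying on it), and the input header names in `DEFAULT_COLUMNS` are hypothetical placeholders; real supplementary tables use their own names.

```python
import csv
import io
from typing import Dict, TextIO

# Output fields, in order, for a six-column BEDPE-style training file
# (assumed layout; compare with the files in /example before use).
FIELDS = ["chrom1", "start1", "end1", "chrom2", "start2", "end2"]

# Hypothetical header names in the spreadsheet exported to CSV.
# Edit these to match the actual supplementary table.
DEFAULT_COLUMNS: Dict[str, str] = {
    "chrom1": "chr_anchor1", "start1": "start_anchor1", "end1": "end_anchor1",
    "chrom2": "chr_anchor2", "start2": "start_anchor2", "end2": "end_anchor2",
}

def table_to_bedpe(src: TextIO, columns: Dict[str, str] = DEFAULT_COLUMNS) -> str:
    """Reshape CSV rows into tab-separated, six-column BEDPE-style text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for record in csv.DictReader(src):
        row = []
        for field in FIELDS:
            value = record[columns[field]].strip()
            if field.startswith(("start", "end")):
                # Excel exports may write integers as "10000.0"; normalise them.
                value = str(int(float(value)))
            row.append(value)
        writer.writerow(row)
    return out.getvalue()

if __name__ == "__main__":
    exported = io.StringIO(
        "chr_anchor1,start_anchor1,end_anchor1,chr_anchor2,start_anchor2,end_anchor2\n"
        "chr1,10000,20000,chr1,50000.0,60000\n"
    )
    print(table_to_bedpe(exported), end="")
```

Chromosome names in the resulting file presumably need to match those in the cooler file you train on, so check the naming convention (e.g. with or without a "chr" prefix) as well.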

Dweson commented 4 years ago

@tariks Thank you, I found the .bed files as you described, but when I score another cooler file from 4DN, I get an error. The command I ran was:

for i in models/*pkl; do peakachu score_chromosome -p 4DNFIP3ELSZY.mcool::resolutions/10000 --balance -O scores -m $i; done

and the error was:

ValueError: row, column, and data array must all be the same length

How can I process the cooler file so that it is suitable for peakachu? Thanks!
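The error text matches the message scipy.sparse.coo_matrix raises when the row, column, and data arrays passed to it differ in length. This thread does not establish what triggers it inside peakachu for this 4DN file, so the sketch below is only a generic illustration of one way such a mismatch can arise and be avoided: if non-finite values (for example NaN entries that can appear after balancing) are dropped from the data array but not from the coordinate arrays, the lengths diverge. Applying one boolean mask to all three arrays keeps them aligned. The helper `drop_nonfinite_pixels` is hypothetical and not part of peakachu or cooler.

```python
import numpy as np

def drop_nonfinite_pixels(row, col, data):
    """Filter sparse (row, col, value) triplets jointly so they stay aligned.

    Hypothetical helper: filtering only `data` while leaving `row` and `col`
    untouched yields arrays of different lengths, which a COO constructor
    such as scipy.sparse.coo_matrix rejects.
    """
    row = np.asarray(row)
    col = np.asarray(col)
    data = np.asarray(data, dtype=float)
    if not (row.shape == col.shape == data.shape):
        raise ValueError("row, col and data must have the same length")
    keep = np.isfinite(data)  # one mask (drops NaN and inf) for all three arrays
    return row[keep], col[keep], data[keep]

if __name__ == "__main__":
    rows, cols, vals = drop_nonfinite_pixels([0, 1, 2], [0, 1, 2], [1.0, float("nan"), 3.0])
    print(rows.tolist(), cols.tolist(), vals.tolist())  # [0, 2] [0, 2] [1.0, 3.0]
```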