slowkow / CENTIPEDE.tutorial

:bug: How to use CENTIPEDE to determine if a transcription factor is bound.
https://slowkow.github.io/CENTIPEDE.tutorial

gunzip -c chr*.fa.masked > hg19.fa #13

Open zhenzuo2 opened 6 years ago

zhenzuo2 commented 6 years ago

Hello,

When I run

gunzip -c chr*.fa.masked > hg19.fa

from the Genomic sequence section of CENTIPEDE.tutorial, I get the following error:

gzip: chr10.fa.masked: not in gzip format
gzip: chr11.fa.masked: not in gzip format
gzip: chr11_gl000202_random.fa.masked: not in gzip format
...

Am I doing it wrong? Thanks.

slowkow commented 6 years ago

Yes, gzip will throw that error when you try to decompress (gunzip) a file that is not compressed.

You only need to decompress a file when it is compressed (it will have the .gz file extension).

slowkow commented 6 years ago

I think you caught a typo in my tutorial! Thanks for reporting it.

I haven't tested it, but I think the correct command is:

cat chr*.fa.masked > hg19.fa

Good luck! Let me know how it goes. Thanks again for reporting the problem.
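
If it is unclear whether some of the downloaded files are compressed, a small shell function can test each file and decompress only where needed. This is a minimal sketch and not part of the tutorial: `concat_fasta` is a hypothetical helper name, and it assumes a POSIX shell with `gzip` installed.

```shell
#!/bin/sh
# Hypothetical helper (not from the tutorial): concatenate FASTA files into
# one output file, decompressing only the inputs that really are gzip data.
# Usage: concat_fasta OUTPUT INPUT...
concat_fasta() {
  out="$1"
  shift
  : > "$out"                        # create or truncate the output file
  for f in "$@"; do
    # `gzip -t` exits non-zero when the input is not valid gzip data.
    if gzip -t < "$f" 2>/dev/null; then
      gzip -dc < "$f" >> "$out"     # compressed: decompress to the output
    else
      cat "$f" >> "$out"            # plain text: append as-is
    fi
  done
}

# Example call for the case in this thread:
#   concat_fasta hg19.fa chr*.fa.masked
```

For files that are all uncompressed, as in this issue, the function behaves the same as the plain `cat` command above.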

zhenzuo2 commented 6 years ago

Thank you so much for your prompt reply. Your tutorial is the best tutorial I can find online. It helps a lot for beginners like me.

I also have a question about where to download ENCFF001UUQ.narrowPeak.gz and ENCFF001UUQ_gt8.narrowPeak.gz. You explained it in the DNase-Seq data section, but I am still not sure where to find them. Thanks.

slowkow commented 6 years ago

It looks like the particular files I used in the tutorial have been archived.

You should find a different study that interests you:

https://www.encodeproject.org/search/?type=Experiment&assay_term_name=DNase-seq&replicates.library.biosample.donor.organism.scientific_name=Homo+sapiens

Here is one possible example, showing the links to the BAM file and the narrowPeak bed file:

[Screenshot 2018-09-16_17-54-45: links to the BAM file and the narrowPeak bed file]

zhenzuo2 commented 6 years ago

Thank you a lot. It makes sense now.

Best,

Zhen