slowkow / CENTIPEDE.tutorial

:bug: How to use CENTIPEDE to determine if a transcription factor is bound.
https://slowkow.github.io/CENTIPEDE.tutorial

Adapting FIMO file to the correct format for centipede_data #17

Open Rosmaninho opened 3 years ago

Rosmaninho commented 3 years ago

FIMO from the MEME suite website outputs data in the following format:

motif_id motif_alt_id sequence_name start stop strand score p-value q-value matched_sequence
ZNF528 MA1597.1 Peak_31367#chr12#10213230#10213429 54 70 - 27.9633 1.45e-10 2.88e-05 CCCAGGGAAGCCATCTC
ZNF528 MA1597.1 Peak_31367#chr12#10213177#10213376 107 123 - 27.9633 1.45e-10 2.88e-05 CCCAGGGAAGCCATCTC
SP4 MA0685.1 Peak_73465#chr19#45001886#45002085 50 66 - 25.5488 3.14e-10 3.97e-05 CAGGCCACGCCCCCTTC
SP4 MA0685.1 Peak_73465#chr19#45001835#45002034 101 117 - 25.5488 3.14e-10 3.97e-05 CAGGCCACGCCCCCTTC
SP4 MA0685.1 Peak_73465#chr19#45001828#45002027 108 124 - 25.5488 3.14e-10 3.97e-05 CAGGCCACGCCCCCTTC
THAP11 MA1573.1 Peak_110384#chr3#141370283#141370482 140 158 - 27.4944 3.59e-10 6.36e-05 AGGACTACATTTCCCAGAC
CTCF MA0139.1 Peak_71057#chr19#2474615#2474814 96 114 + 25.2247 4.23e-10 0.000166 CGGCCACCAGATGGCGCCA
ZNF16 MA1654.1 Peak_181996#chr9#129485761#129485960 1 23 + 27.5244 5.42e-10 0.000109 AATGGGGAGCCATCGAAGGCCTT
ZNF16 MA1654.1 Peak_181996#chr9#129485656#129485855 106 128 + 27.5244 5.42e-10 0.000109 AATGGGGAGCCATCGAAGGCCTT

Your tutorial, however, seems to expect the FIMO output in this format:

    sequence.name   start    stop X.pattern.name strand score  p.value
307          chr1  753016  753228              1      + 13.53 1.14e-05
315          chr1  876197  876409              1      - 12.07 3.73e-05
29           chr1 1365483 1365695              1      - 11.88 4.24e-05
30           chr1 1365877 1366089              1      - 12.72 2.24e-05
31           chr1 1406705 1406917              1      - 11.20 6.73e-05
64           chr1 1566358 1566570              1      + 13.99 7.75e-06
    q.value matched.sequence
307      NA    TTTCCCAGAAGGA
315      NA    CTTCCCCGAAGGG
29       NA    TTTCCAAGAAAGT
30       NA    CTTCCCAGGAGAG
31       NA    CTTCACAGAATTA
64       NA    TTTCCAAGAACCG

I am getting the following error:

-- Column specification ------------------------------------------------------------------
cols(
  sequence_name = col_character(),
  chr = col_character(),
  start = col_double(),
  stop = col_double(),
  strand = col_character(),
  score = col_double(),
  `p-value` = col_double(),
  `q-value` = col_double(),
  matched_sequence = col_character(),
  motif_id = col_character(),
  motif_alt_id = col_character()
)

Error in h(simpleError(msg, call)) : error in evaluating the argument 'which' in selecting a method for function 'ScanBamParam': In range 4685: at least two out of 'start', 'end', and 'width', must be supplied.

How do I need to adapt my FIMO output?
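One possible conversion is sketched below in Python (standard library only). It is not the tutorial's method, and it rests on several assumptions: that `sequence_name` always has the form `peak#chrom#start#end`; that the window start is 0-based (BED-style), so a 1-based FIMO position `p` maps to genomic position `window_start + p`; that `motif_id` can stand in for the tutorial's `X.pattern.name` column; and that hits reported from overlapping windows should be collapsed into one site. The function names `fimo_to_genomic` and `write_tsv` are hypothetical.

```python
import csv
import io

# Column order as printed in the tutorial's example table.
OUT_COLS = ["sequence.name", "start", "stop", "X.pattern.name", "strand",
            "score", "p.value", "q.value", "matched.sequence"]


def fimo_to_genomic(fimo_text):
    """Return de-duplicated FIMO hits as dicts with genomic coordinates."""
    reader = csv.DictReader(io.StringIO(fimo_text), delimiter="\t")
    seen = set()
    hits = []
    for rec in reader:
        name = rec.get("sequence_name")
        # Skip "#" comment lines and rows with too few fields.
        if not name or rec["motif_id"].startswith("#"):
            continue
        # Assumption: name looks like "Peak_31367#chr12#10213230#10213429".
        _peak, chrom, window_start, _window_end = name.split("#")
        offset = int(window_start)  # assumed 0-based window start
        hit = {
            "sequence.name": chrom,
            "start": offset + int(rec["start"]),
            "stop": offset + int(rec["stop"]),
            "X.pattern.name": rec["motif_id"],  # assumed mapping
            "strand": rec["strand"],
            "score": rec["score"],
            "p.value": rec["p-value"],
            "q.value": rec["q-value"],
            "matched.sequence": rec["matched_sequence"],
        }
        # Overlapping windows report the same site more than once.
        key = (hit["X.pattern.name"], chrom, hit["start"], hit["stop"],
               hit["strand"])
        if key in seen:
            continue
        seen.add(key)
        hits.append(hit)
    return hits


def write_tsv(hits):
    """Serialize hits as tab-separated text with the tutorial's headers."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=OUT_COLS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(hits)
    return out.getvalue()
```

Under these assumptions, the two ZNF528 rows above both map to chr12:10213284-10213300 and collapse into a single site. If the window coordinates are 1-based instead, the offset would need to be reduced by one.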