greenelab / GCB535

Materials for GCB535 at Penn.
BSD 3-Clause "New" or "Revised" License

Motif-III Homer analysis: different users can get very different answers #218

Open bvoight opened 5 years ago

bvoight commented 5 years ago

I think this is a known issue, but a student expressed some irritation about the 'lack of reproducibility': two students can run the same command line and still get radically different answers from the motif look-up.

I don't remember the exact question, but I think it might be due to the random selection of 200 bp windows across the genome, which are then used as background for the survey. I wonder if there's a way we can make this uniform: perhaps set the random seed so that the same sequences are selected each time, or preselect the sequences.

Or, just explain the differences (e.g., make them re-run it).
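To make the "set the random seed" / "preselect the sequences" idea concrete, here is a minimal sketch of seeded window selection. It is not how Homer picks its background; the chromosome sizes, the function name `sample_windows`, and the window count are all made up for illustration. The point is only that a fixed seed gives every student the same 200 bp windows, which could then be saved once and handed out as a shared background set.

```python
import random

# Hypothetical chromosome lengths in bp (illustration only); real values
# would come from the genome's chromosome-sizes file.
CHROM_SIZES = {"chr1": 1_000_000, "chr2": 800_000, "chr3": 500_000}


def sample_windows(chrom_sizes, n_windows, width=200, seed=None):
    """Pick n_windows random (chrom, start, end) windows of fixed width.

    Chromosomes are chosen with probability proportional to the number of
    valid start positions, so windows are roughly uniform across the genome.
    Passing the same seed always returns the same windows.
    """
    rng = random.Random(seed)  # private generator, unaffected by global state
    chroms = sorted(chrom_sizes)  # fixed order so results do not depend on dict order
    weights = [chrom_sizes[c] - width + 1 for c in chroms]
    windows = []
    for _ in range(n_windows):
        chrom = rng.choices(chroms, weights=weights, k=1)[0]
        start = rng.randrange(chrom_sizes[chrom] - width + 1)
        windows.append((chrom, start, start + width))
    return windows


if __name__ == "__main__":
    # Same seed, same background windows, on every machine and every run.
    for chrom, start, end in sample_windows(CHROM_SIZES, 5, seed=535):
        print(f"{chrom}\t{start}\t{end}")
```

Writing the output of one such run to a file and distributing it would be the "preselect" variant; calling it with an agreed seed would be the "set the seed" variant.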

It does seem a bit strange that running this the same way gives radically different answers. If the randomization process is truly random, shouldn't the variation between runs be slight?
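One way to think about that last question is a toy simulation. The sketch below assumes (purely for illustration) that a motif appears independently in a fraction `p` of background windows and that each run draws `n_windows` of them; it says nothing about what Homer actually does internally. Under that assumption the run-to-run spread of the estimated background rate shrinks as the number of windows grows, so whether the variation is "slight" depends on how many background windows are drawn and how rare the motif is.

```python
import random
import statistics


def background_rate_spread(p, n_windows, n_runs=100, seed=0):
    """Simulate n_runs independent background draws.

    Each run samples n_windows windows, each containing the motif with
    probability p, and records the observed hit rate. Returns the mean and
    standard deviation of that rate across runs.
    """
    rng = random.Random(seed)
    rates = []
    for _ in range(n_runs):
        hits = sum(rng.random() < p for _ in range(n_windows))
        rates.append(hits / n_windows)
    return statistics.mean(rates), statistics.stdev(rates)


if __name__ == "__main__":
    # Hypothetical motif present in 1% of background windows.
    for n in (200, 2_000, 20_000):
        mean, sd = background_rate_spread(p=0.01, n_windows=n)
        print(f"n_windows={n:>6}  mean rate={mean:.4f}  sd across runs={sd:.4f}")
```

If the real runs differ far more than a model like this would suggest for the background size in use, that would point to something other than sampling noise.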