The most important parameter for reproducibility of a technical replicate with Segway is SEGWAY_RAND_SEED. Set it to a value of your choosing; if it is kept the same across replicates, and if all other parameters are indeed exactly the same, it will produce identical results every time. We consistently run regression tests on the same seed to ensure consistency.
Setting the seed effectively gives the model the same initial random parameters and, in the minibatch case, picks the same "randomly" selected regions.
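To illustrate the principle (a minimal sketch of the general idea, not Segway's actual internals): two random number generators initialized with the same seed produce identical draws, so any "random" choice derived from them, such as initial parameters or region selection, is identical too.

```python
import numpy as np

SEED = 1498730685  # arbitrary example value

# Two generators seeded identically make identical "random" choices.
rng_rep1 = np.random.RandomState(SEED)
rng_rep2 = np.random.RandomState(SEED)

# Same initial "random" parameters...
assert np.array_equal(rng_rep1.rand(5), rng_rep2.rand(5))
# ...and the same "randomly" selected regions.
assert np.array_equal(rng_rep1.choice(100, size=10, replace=False),
                      rng_rep2.choice(100, size=10, replace=False))
```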
As you've noted, since the parameters are randomly selected, even if effectively the same Gaussian parameters are learned across replicates, there's no guarantee on label ordering. You might be able to roughly match up equivalent labels if you really needed to, but setting the SEGWAY_RAND_SEED variable is definitely the way to go.
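For example, a minimal sketch of what this looks like in practice (the genomedata archive and train directory names here are hypothetical placeholders, and the invocation should be checked against your Segway version):

```python
import os
import subprocess

# Fix the seed once and reuse it for every replicate; shell equivalent:
#   SEGWAY_RAND_SEED=1498730685 segway train data.genomedata traindir
env = os.environ.copy()
env["SEGWAY_RAND_SEED"] = "1498730685"  # any fixed integer of your choosing

# Two technical replicates: same seed, same data, and identical flags
# should yield identical parameters and labels.
for traindir in ("traindir_rep1", "traindir_rep2"):
    subprocess.run(["segway", "train", "data.genomedata", traindir],
                   env=env, check=True)
```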
If you're still experiencing the issue after setting the seed, however, please let me know!
Hello,
I'm currently running Segway with 4 histone marks (H3K4me1, H3K4me3, H3K27ac, H3K27me3), and I've completed 3 configurations with 2 technical replicates each (6 runs total), each with slightly different parameters as listed below:
run 1 and run 2: minibatch fraction (0.01), gaussian (1), 5 instances
run 3 and run 4: minibatch fraction (0.01), gaussian (3), 5 instances
run 5 and run 6: minibatch fraction (0.05), gaussian (3), 5 instances
The problem I'm encountering is that even between the technical replicates (exact same parameters), I'm having trouble getting any sort of reproducibility in the results.
I know this heatmap is a lot to look at (all runs, showing the signal track distribution for each of the labels), but I expected better clustering of the labels between the runs, especially the technical replicates. So my question is: is this level of irreproducibility just to be expected, or is there a way to better train the model for consistent labelling?