rr1859 / R.4Cker

MIT License

Inconsistencies with Interaction Counts #21

Open aliu90 opened 7 years ago

aliu90 commented 7 years ago

Hi,

I've noticed that the output interaction files (high, low, non) differ between runs, even with the same input BED files and the same analysis commands. The normalized counts stay the same, as they should, but the genomic coordinates called as 'interactions' change each time, which is a bit concerning.

For example, I run nb_results_1=nearBaitAnalysis(my_obj_1,k=10) and get 6 high interactions on the first attempt, then immediately run the same command again and get only 5. Even the genomic coordinates of the interactions shift slightly.

Is there a reason for this? Or is there something I'm missing or doing wrong?

Thanks

rr1859 commented 7 years ago

Hi, thanks for bringing this to my attention. Because we use a synthetic sample generator, the parameter estimation can change a bit on each run. Were your results very different? I added a set.seed command before the synthetic samples are generated, so you should get consistent results now. Let me know if this works.
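
To illustrate why fixing the seed makes repeated runs agree, here is a minimal sketch. The function make_synthetic is a hypothetical toy stand-in that draws Poisson counts; it is not the package's actual generator, and the counts are made up for illustration.

```r
# Hypothetical toy generator (NOT 4C-ker code): draws Poisson-distributed
# synthetic samples around a vector of observed counts.
make_synthetic <- function(counts, n_samples = 3) {
  replicate(n_samples, rpois(length(counts), lambda = counts))
}

counts <- c(12, 40, 7, 95, 3)  # made-up window counts

# Without a fixed seed, two calls will usually produce different samples.
a <- make_synthetic(counts)
b <- make_synthetic(counts)

# Resetting the seed before each call reproduces the same random draws.
set.seed(42)
c1 <- make_synthetic(counts)
set.seed(42)
c2 <- make_synthetic(counts)

print(identical(c1, c2))
```

Anything estimated from the synthetic samples inherits this behaviour: same seed, same estimates.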

daler commented 7 years ago

@rr1859 I've been running into this issue as well: running the same analysis command with the same input gave different results. Setting the seed does make the results identical between subsequent runs.

But choosing a different seed gives different results: regions that were called as interacting or as differentially interacting appear and disappear depending on the seed.

Currently the way to work around this is to re-run 4C-ker multiple times with different seeds and aggregate the results into a consensus, but that seems inelegant. Is there a way to fix this in the algorithm itself?
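
For reference, the multi-seed workaround can be sketched roughly as follows. Everything here is an assumption for illustration: consensus_calls and toy_run are hypothetical helpers, and run_once stands for any user-written wrapper that sets the seed, runs the analysis, and returns the called regions as a character vector. How to pull region IDs out of the real analysis output is left to that wrapper.

```r
# Hypothetical helper: run an analysis under several seeds and report how
# often each region is called. `run_once` must take a seed and return a
# character vector of region IDs.
consensus_calls <- function(run_once, seeds, min_frac = 0.8) {
  calls <- lapply(seeds, run_once)
  # Count each region at most once per run.
  freq <- table(unlist(lapply(calls, unique)))
  frac <- as.numeric(freq) / length(seeds)
  data.frame(region = names(freq),
             frac = frac,
             stable = frac >= min_frac,
             stringsAsFactors = FALSE)
}

# Toy stand-in for a seed-dependent analysis (NOT 4C-ker code): two regions
# are always called, a third only for some seeds.
toy_run <- function(seed) {
  set.seed(seed)
  core <- c("chr1:1000-2000", "chr1:5000-6000")
  borderline <- "chr1:9000-10000"
  if (runif(1) < 0.5) c(core, borderline) else core
}

res <- consensus_calls(toy_run, seeds = 1:20)
print(res[res$stable, ])
```

The min_frac threshold is arbitrary; regions that fall below it are the ones that appear and disappear with the seed.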