Hi @KenjiKamimoto-wustl122,

I have observed that `oracle.get_links` returns different results across runs (celloracle == 0.18.0). The differences are not huge (mean Jaccard index of 0.9 between runs), but results should be fully reproducible when a fixed seed is used.
Although `BaggingRegressor` is correctly given a fixed seed, the problem arises upstream, because the TF gene symbols in `oracle.TFdict` are stored in sets. The iteration order of a set of strings depends on the strings' hash values, which Python randomizes for each interpreter session, so the order of the TFs can differ slightly from run to run. As a result, `BaggingRegressor` samples differently even though it uses the same seed every time. The fix is simple: sort the selected TFs alphabetically so that their order is fixed:
```python
# Sort so the TF order is deterministic across sessions
reg_all = sorted(reg_all)
```
With this simple change, the results are identical across runs.
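To illustrate why a fixed seed alone is not enough when the input order varies, here is a toy sketch using only Python's `random` module. It is not celloracle's or scikit-learn's actual sampling code: `pick_features` and the TF symbols are hypothetical, and the reversed list simply stands in for a set that was iterated in a different order in another session.

```python
import random


def pick_features(features, k, seed):
    """Pick k features by position with a seeded RNG (toy stand-in for a seeded estimator)."""
    rng = random.Random(seed)
    # The seed fixes which *positions* are drawn, not which *names* sit at those positions.
    positions = sorted(rng.sample(range(len(features)), k))
    return [features[i] for i in positions]


# Hypothetical TF symbols: same items, two different orders.
order_run1 = ["GATA1", "SPI1", "CEBPA", "TAL1", "RUNX1", "KLF1"]
order_run2 = list(reversed(order_run1))

# Same seed, different input order: different TFs are selected.
picked_run1 = pick_features(order_run1, k=3, seed=0)
picked_run2 = pick_features(order_run2, k=3, seed=0)
print(set(picked_run1) == set(picked_run2))  # False

# Sorting first removes the dependence on the incoming order.
fixed_run1 = pick_features(sorted(order_run1), k=3, seed=0)
fixed_run2 = pick_features(sorted(order_run2), k=3, seed=0)
print(fixed_run1 == fixed_run2)  # True
```

The seed only controls the random draws; anything that maps those draws back to gene names inherits whatever order the names arrived in.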
Note that to reproduce the differences with the unpatched code, you need to restart the kernel or run the script again so that the hash seed is regenerated. Running the same code repeatedly within one JupyterLab session yields the same results; restarting the notebook kernel does not.
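The session dependence can be checked without celloracle. The sketch below launches fresh Python interpreters with different values of the `PYTHONHASHSEED` environment variable, which emulates restarting the kernel, and compares the iteration order of a small set of strings. The TF symbols and the helper name `set_order_in_fresh_interpreter` are made up for illustration.

```python
import os
import subprocess
import sys

# Code run in each fresh interpreter: print a set of strings in iteration order.
SNIPPET = (
    "print(','.join({'GATA1','SPI1','CEBPA','TAL1','RUNX1','KLF1','MYB','IRF8'}))"
)


def set_order_in_fresh_interpreter(hash_seed):
    """Return the set's iteration order in a new interpreter with the given hash seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip().split(",")


# Each hash seed plays the role of one kernel restart.
orders = [set_order_in_fresh_interpreter(seed) for seed in range(1, 11)]

distinct_raw = {tuple(order) for order in orders}
distinct_sorted = {tuple(sorted(order)) for order in orders}

print("distinct raw orders:", len(distinct_raw))
print("distinct sorted orders:", len(distinct_sorted))
```

Within a single interpreter the hash seed never changes, which is why rerunning a cell in the same session gives identical results. Pinning `PYTHONHASHSEED` would also stabilize the order, but sorting is more robust because it does not depend on how the user launches Python.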
Hope this is helpful!