karel-brinda / NanoSim-H

NanoSim-H: a simulator of Oxford Nanopore reads; a fork of NanoSim.
Other
17 stars 6 forks source link

Remove an unneeded TODO #15

Closed ExplodingCabbage closed 5 years ago

ExplodingCabbage commented 5 years ago

This code is fine; dict iteration order is guaranteed to be stable as long as the dict is not modified, and this was the case even back in Python 2. See https://docs.python.org/2/library/stdtypes.html#dict.items:

If items(), keys(), values(), iteritems(), iterkeys(), and itervalues() are called with no intervening modifications to the dictionary, the lists will directly correspond.

(Also, even if dict iteration order was unstable here, and the first key value you got in the for key in seq_len.keys(): loop was different on each pass through the while True: loop, it seems to me that that would still be fine?)

karel-brinda commented 5 years ago

Thanks for the comment and for the PR. My concern was whether the simulations are reproducible across different platforms and Python versions. In my understanding, this is not guaranteed, therefore dictionary keys would need to be sorted.

ExplodingCabbage commented 5 years ago

Aha! I see the concern. Yes, I think you're right to worry that the order will be different across Python versions because dict order was changed in Python 3.6: https://stackoverflow.com/q/39980323/1709587

Perhaps the right change is to replace seq_len.keys() with sorted(seq_len.keys()), then.

However, even the output of random module functions across Python versions is not consistent unless you explicitly specify version=1 when seeding, so just sorting won't be enough on its own to give you reproducible simulations across Python versions.

ExplodingCabbage commented 5 years ago

In any case, I'll close this PR since I misunderstood the intent of the comment and there's still something there to do.

karel-brinda commented 5 years ago

I didn't know that random is different in different Python versions. Thank you for mentioning that – I'll need to fix that in multiple projects.