terasakisatoshi opened this issue 2 years ago
Below is my output of versioninfo():
julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin19.5.0)
CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
(EDIT): I've tested DataLoaders v0.1.3:
(@v1.7) pkg> st DataLoaders
Status `~/.julia/environments/v1.7/Project.toml`
[2e981812] DataLoaders v0.1.3
DataLoader with multiple threads uses eachobsparallel, which does not guarantee a deterministic ordering.
DataLoaders.jl functionality is currently being added to MLUtils.jl (see https://github.com/JuliaML/MLUtils.jl/pull/33), and I am thinking of adding an optional wrapper that reorders the batches, likely at the cost of some performance.
I won't add this here, though, since MLUtils.jl will supersede DataLoaders.jl. I'll leave this open and update once the functionality exists there 👍
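The reordering idea can be sketched in plain Base Julia, without DataLoaders.jl or MLUtils.jl. This is a minimal sketch under stated assumptions, not the wrapper that MLUtils.jl actually provides: `ordered_parallel` is a hypothetical name, `f(i)` stands in for loading batch `i`, and `f` is assumed not to throw. Worker tasks claim indices from a shared atomic counter and push `(index, value)` pairs as they finish; a single reordering stage buffers early arrivals in a `Dict` and emits a batch only after every earlier index has been emitted.

```julia
# Minimal sketch (hypothetical helper, not the DataLoaders.jl/MLUtils.jl API):
# run f(1), ..., f(n) on several tasks, but hand results back in index order.
function ordered_parallel(f, n::Integer; ntasks::Integer = Threads.nthreads())
    # (index, value) pairs arrive here in *completion* order.
    unordered = Channel{Tuple{Int,Any}}(ntasks)
    counter = Threads.Atomic{Int}(1)

    # Workers: each claims the next unclaimed index until all n are taken.
    workers = map(1:ntasks) do _
        Threads.@spawn begin
            while true
                i = Threads.atomic_add!(counter, 1)  # returns the value before adding
                i > n && break
                put!(unordered, (i, f(i)))           # assumes f does not throw
            end
        end
    end

    # Close the unordered channel once every worker has finished.
    Threads.@spawn begin
        try
            foreach(wait, workers)
        finally
            close(unordered)
        end
    end

    # Reordering stage: buffer early arrivals, emit strictly by index.
    return Channel{Any}(ntasks) do ordered
        pending = Dict{Int,Any}()
        expected = 1
        for (i, value) in unordered
            pending[i] = value
            while haskey(pending, expected)
                put!(ordered, pop!(pending, expected))
                expected += 1
            end
        end
    end
end

# Usage: simulate batches that take a random amount of time to load.
batches = collect(ordered_parallel(i -> (sleep(rand() / 100); i), 100))
@assert batches == collect(1:100)   # index order, whatever the completion order
```

The performance cost shows up in the reordering stage: one slow batch `i` holds back every later batch, and finished-but-early batches accumulate in `pending` until it arrives.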
Thank you for your quick reply!
> DataLoader with multiple threads uses eachobsparallel, which does not guarantee a deterministic ordering.
OK. For me, reproducibility of experiments is important when evaluating performance metrics such as precision or accuracy.
I will also check out MLUtils.jl.
> I am thinking of adding an optional wrapper that reorders the batches, likely at the cost of some performance.
Great! Let me know when you are done.
I made an issue that you can subscribe to :) https://github.com/JuliaML/MLUtils.jl/issues/68
When I used DataLoaders.jl in my project, especially for deep learning, I encountered a reproducibility problem when multi-threading is enabled. Below is an MWE that describes the issue. Here, MyDataset returns idx, which comes from the 2nd argument of the getobs method. From my understanding, for each t in 1:ntrial, @show batch should display an array from 1 to 100, namely:
On the other hand, the actual behavior of the example.jl script above outputs something like:
This phenomenon happens when we specify more than one thread.
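The MWE script (example.jl) and both outputs did not survive in this copy of the thread. As a self-contained illustration of the same effect without DataLoaders.jl, here is a sketch in Base Julia only. `MyDataset`, `getobs_demo`, `nobs_demo` and `unordered_parallel_load` are hypothetical names; the loader deliberately collects observations in completion order, mimicking an unordered parallel iterator, and a random sleep makes out-of-order completion likely. Run it with, e.g., `julia --threads=4 sketch.jl`: the `same elements` flag is always true, while `in order` may differ from trial to trial.

```julia
# Hypothetical stand-in for the MWE's MyDataset: observation idx is just idx.
struct MyDataset
    n::Int
end
getobs_demo(::MyDataset, idx::Integer) = idx
nobs_demo(d::MyDataset) = d.n

# Naive parallel loader: one task per observation, results stored in the
# order the tasks *finish*, not in index order.
function unordered_parallel_load(data::MyDataset)
    out = Channel{Int}(nobs_demo(data))    # large enough to hold every result
    @sync for idx in 1:nobs_demo(data)
        Threads.@spawn begin
            sleep(rand() / 100)            # simulate variable loading time
            put!(out, getobs_demo(data, idx))
        end
    end
    close(out)
    return collect(out)                    # drains the buffered results
end

ntrial = 3
for t in 1:ntrial
    batch = unordered_parallel_load(MyDataset(100))
    # Contents are always 1:100; only the order is not guaranteed.
    println("trial $t: same elements = ", sort(batch) == 1:100,
            ", in order = ", batch == 1:100)
end
```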