sdobber / FluxArchitectures.jl

Complex neural network examples for Flux.jl
MIT License

`get_data` creates NaNs on the GPU #16

Closed sdobber closed 2 years ago

sdobber commented 3 years ago

The call `input, target = get_data(:exchange_rate, poollength, datalength, horizon) |> gpu` works fine on the CPU, but on the GPU it creates some NaNs at the beginning of the dataset (at least on a Jetson Nano).

sdobber commented 3 years ago

The actual raw data loads differently on x64 and arm64, with the latter being wrong.

KingBoomie commented 2 years ago

I had a similar problem. It has nothing to do with GPUs, though it does fail more loudly on them. `prepare_data` creates an uninitialized array with `similar` and then doesn't fill all of it, so leftover memory gets read as floats, and some of those happen to be NaNs.
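A minimal sketch of that failure mode, independent of FluxArchitectures.jl: the helper `partial_fill!` below is hypothetical and only mimics a loop that starts writing too late. It shows that the skipped part of a `similar` buffer holds arbitrary bits, while the same loop on a `zeros` buffer leaves well-defined zeros behind.

```julia
# Hypothetical helper for illustration; not part of FluxArchitectures.jl.
# Writes only columns `first_filled:end`, like a loop that starts too late.
function partial_fill!(buf::AbstractMatrix, first_filled::Integer)
    for j in first_filled:size(buf, 2)
        buf[:, j] .= 1
    end
    return buf
end

# `similar` allocates without initializing, so columns 1:2 hold arbitrary bits.
uninit = partial_fill!(similar(ones(Float32, 3, 5)), 3)
# `zeros` initializes, so the skipped columns 1:2 are guaranteed to be 0.0.
zeroed = partial_fill!(zeros(Float32, 3, 5), 3)

@show all(zeroed[:, 1:2] .== 0)    # always true
@show all(uninit[:, 3:5] .== 1)    # always true: these columns were written
@show any(isnan, uninit[:, 1:2])   # can differ from run to run and platform to platform
```

Whether the skipped region of `uninit` contains NaNs depends on what was in that memory before, which would fit the behaviour differing between machines.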

To make it really obvious, I replaced the `similar` call with `zeros(Float32, ...)`:

function prepare_data(data, poollength, datalength, horizon; normalise=true)
  extendedlength = datalength + poollength
  extendedlength > size(data, 1) && throw(ArgumentError("datalength $(datalength) larger than available data $(size(data, 1) - poollength)"))
  (normalise == true) && (data = Flux.normalise(data, dims=1))
  features = zeros(Float32, size(data, 2), poollength, 1, datalength)  # CHANGED THIS
  for i = 0:poollength - 1
      for j = poollength:datalength
          #                  \/ this j starts at poollength => 1:(poollength-1) will always be uninit data
          features[:,i + 1,1,j] = data[j - i,:]
      end
  end
  labels = circshift(data[1:datalength,1], -horizon)
  return features, labels
end

and then

prepare_data(ones(Float32, 510, 3), 10, 500, 7, normalise=false)
> 3×10×1×500 Array{Float32, 4}:
[:, :, 1, 1] =
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

[:, :, 1, 2] =
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

[:, :, 1, 3] =
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

...

[:, :, 1, 498] =
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

[:, :, 1, 499] =
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

[:, :, 1, 500] =
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

There should be no zeros (i.e. uninitialized memory) in the output, but there are.
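For comparison, here is a rough sketch of a window-filling loop that writes every slice. It is not the fix that was applied in the package. The function name `fill_windows` is hypothetical, and the alignment (slice `j` covering rows `j:(j + poollength - 1)`) is an assumption; labels are left out and would need to be aligned with whichever offset is actually chosen.

```julia
# Hypothetical sketch, not the fix that was applied in FluxArchitectures.jl.
# Builds sliding windows so that every slice 1:datalength is written.
function fill_windows(data::AbstractMatrix, poollength::Integer, datalength::Integer)
    needed = datalength + poollength - 1
    needed <= size(data, 1) ||
        throw(ArgumentError("need at least $(needed) rows, got $(size(data, 1))"))
    # Still uninitialized, but the loop below writes every element.
    features = similar(data, (size(data, 2), poollength, 1, datalength))
    for j in 1:datalength, i in 0:poollength-1
        # Assumed alignment: slice j covers rows j:(j + poollength - 1),
        # with the newest row first along the second dimension.
        features[:, i + 1, 1, j] = data[j + poollength - 1 - i, :]
    end
    return features
end

features = fill_windows(ones(Float32, 510, 3), 10, 500)
@show size(features)         # (3, 10, 1, 500)
@show all(features .== 1)    # true: no slice is left unwritten
```

Because `j` starts at 1 and the row index never drops below 1, a `similar` buffer is safe here: nothing is read before it has been written.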

Edit: added comments to make the bug really obvious.

sdobber commented 2 years ago

@KingBoomie Thanks a lot for spotting this!

KingBoomie commented 2 years ago

Thanks for the quick fix! This is now the most useful time series analysis package for me. <3