Open snicolet opened 5 years ago
The assignment bk[:] = v.squeeze()
is not dimension-consistent, so the try/except
block falls into debug mode.
See https://github.com/pytorch/ELF/blob/master/src_py/elf/utils_elf.py#L211
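The failure can be reproduced outside ELF. A minimal sketch, using NumPy arrays as a stand-in for the torch tensors in utils_elf.py (the sizes 2 and 128 are the ones observed in this thread):

```python
import numpy as np

# bk was allocated for a batch of 2, but v carries 128 values.
bk = np.ones(2, dtype=np.int64)     # stand-in for the ELF-side buffer
v = np.arange(128, dtype=np.int64)  # stand-in for the incoming batch

try:
    bk[:] = v.squeeze()  # shape (2,) cannot hold shape (128,)
except ValueError as e:
    # torch raises a RuntimeError in the same situation;
    # either way, utils_elf.py catches it and drops into debug mode.
    print("copy failed:", e)
```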
Could you print out the size of bk and the size of v here?
Hi Yuandong,
bk has size 2 and is equal to: tensor([1, 1]). v.squeeze() has size 128 and is equal to: tensor([172, 75, 90, 177, 6, 147, 189, 71, 181, 165, 85, 69, 141, 27, 59, 25, 87, 104, 153, 161, 108, 129, 136, 174, 173, 54, 85, 177, 82, 138, 170, 3, 91, 187, 68, 30, 166, 15, 45, 47, 41, 48, 160, 89, 122, 106, 178, 190, 63, 103, 29, 174, 164, 48, 39, 12, 168, 35, 44, 115, 64, 12, 108, 138, 13, 98, 173, 6, 188, 57, 98, 180, 94, 163, 25, 49, 2, 135, 73, 88, 143, 111, 61, 172, 42, 164, 160, 138, 91, 0, 127, 94, 78, 64, 179, 2, 86, 92, 137, 47, 170, 161, 82, 188, 44, 56, 6, 16, 113, 185, 82, 51, 57, 189, 41, 40, 126, 10, 30, 175, 42, 15, 9, 173, 149, 147, 110, 180], device='cuda:0'). In Breakthrough, action values are between 0 and 192.
Thanks.
Different runs give different results, but the size of v.squeeze() is always 64 times the size of bk, and bk is always filled with ones.
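That constant 64x ratio is itself a clue that the buffer extents were declared with too small a batch size. A quick sanity check (a hypothetical helper for this thread, not part of ELF) that compares a declared buffer size against a received tensor size:

```python
def diagnose_size_mismatch(declared: int, received: int) -> str:
    """Report how a received tensor size relates to the declared buffer size."""
    if declared == received:
        return "sizes match"
    if received % declared == 0:
        factor = received // declared
        return (f"received size is {factor}x the declared size; "
                f"the extents were likely declared with batchsize = {declared} "
                f"instead of {received}")
    return "sizes differ by a non-integer factor"

print(diagnose_size_mismatch(2, 128))
```

With the sizes observed here (2 and 128), this reports a 64x factor, consistent with a misconfigured batch size.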
When you call e.addField<int64_t>("a") somewhere in the code, make sure .addExtents has the correct size. E.g., in your case it should be e.addField<int64_t>("a").addExtents(batchsize, {batchsize}) where batchsize = 128. If you called it with batchsize = 2 but sent a vector of dim=128, you will see this error.
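Conceptually, the extents declared on the C++ side fix the shape of the Python-side buffer, and the copy in utils_elf.py only succeeds when the two agree. A minimal Python illustration of that contract (hypothetical names, using NumPy in place of torch; this is not the ELF API):

```python
import numpy as np

def make_field_buffer(batchsize: int) -> np.ndarray:
    # Mirrors addExtents(batchsize, {batchsize}): one int64 slot per batch entry.
    return np.zeros(batchsize, dtype=np.int64)

def fill_field(buf: np.ndarray, values: np.ndarray) -> None:
    # The same dimension check that bk[:] = v.squeeze() performs implicitly.
    if buf.shape != values.shape:
        raise ValueError(f"declared {buf.shape}, received {values.shape}")
    buf[:] = values

buf = make_field_buffer(128)  # batchsize = 128, matching the producer
fill_field(buf, np.arange(128, dtype=np.int64))  # succeeds: shapes agree
```

Declaring the buffer with batchsize = 2 while the producer sends 128 values would raise at the shape check, which is the analogue of the error seen in this issue.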
Hi,
While working on a sub-branch of Olivier Teytaud's branch called "newtasks" (which uses the ELF framework for arbitrary abstract games), we stumbled on a possible GPU configuration error at run time, after a successful compile.
Steps to reproduce:
Note that we forced the number of GPUs to one by changing line 53 of
src_py/rlpytorch/model_loader.py
to "1" rather than the default "-1": this was necessary to avoid a GPU run-time error in df_model3.py. But now we get the following error about copying two tensors of different sizes at line 191 of utils_elf.py:
Would you have any idea what our error may be? Thanks in advance!