libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0

read labels are duplicated and are mismatched from written labels #231

Closed ericjang closed 2 years ago

ericjang commented 2 years ago

This commit adds a test that demonstrates the issue - let me know if I should open a PR to check this in. https://github.com/libffcv/ffcv/commit/655c7f8f7963343a2d6482c371ef78a4c5df58d6

Basically, there are two tests that write images paired with a dummy label (either as an NDArrayField or an IntField) and then read them back. The set of labels read back no longer matches the expected labels that were written.

For example, on the integers:

> np.testing.assert_array_equal(expected, labels)
(Pdb) p expected
array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.,
       13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 25.,
       26., 27., 28., 29., 30., 31., 32., 33., 34., 35., 36., 37., 38.,
       39., 40., 41., 42., 43., 44., 45., 46., 47., 48., 49., 50., 51.,
       52., 53., 54., 55., 56., 57., 58., 59., 60., 61., 62., 63., 64.,
       65., 66., 67., 68., 69., 70., 71., 72., 73., 74., 75., 76., 77.,
       78., 79., 80., 81., 82., 83., 84., 85., 86., 87., 88., 89., 90.,
       91., 92., 93., 94., 95., 96., 97., 98., 99.], dtype=float32)
(Pdb) p labels
array([75., 76., 77., 78., 79., 80., 81., 82., 83., 84., 85., 86., 87.,
       88., 89., 90., 91., 92., 93., 94., 95., 96., 97., 98., 99., 75.,
       76., 77., 78., 79., 80., 81., 82., 83., 84., 85., 86., 87., 88.,
       89., 90., 91., 92., 93., 94., 95., 96., 97., 98., 99., 75., 76.,
       77., 78., 79., 80., 81., 82., 83., 84., 85., 86., 87., 88., 89.,
       90., 91., 92., 93., 94., 95., 96., 97., 98., 99., 75., 76., 77.,
       78., 79., 80., 81., 82., 83., 84., 85., 86., 87., 88., 89., 90.,
       91., 92., 93., 94., 95., 96., 97., 98., 99.], dtype=float32)

GuillaumeLeclerc commented 2 years ago

Hello,

It is not a bug. FFCV produces arrays and gives them to you as you iterate through the iterator. However, they are owned by FFCV, so once an iteration is finished FFCV might reuse a tensor and write something else there (to avoid allocating too much memory). If you need data after an iteration is over, it's your responsibility to allocate memory and copy it there within the loop. As a general rule, if you didn't create an array, you don't own it, and you shouldn't assume the owner (here FFCV) will not use it for other purposes.

TLDR: Adding a copy() on lines 58 and 79 will fix the "issue".
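
To illustrate the buffer reuse described above, here is a minimal NumPy-only sketch. It does not use FFCV at all: `reusing_batches` is a hypothetical stand-in for any iterator that hands out the same preallocated array on every step, and the batch size of 25 is an assumption chosen to mirror the output posted in this issue.

```python
import numpy as np

def reusing_batches(values, batch_size):
    """Hypothetical stand-in for a loader that owns and reuses one buffer."""
    buffer = np.empty(batch_size, dtype=values.dtype)
    for start in range(0, len(values), batch_size):
        # Overwrite the buffer in place, then hand out the same array object.
        buffer[:] = values[start:start + batch_size]
        yield buffer

expected = np.arange(100, dtype=np.float32)

# Storing the yielded arrays without copying: every list entry aliases the
# same buffer, so all of them end up showing the last batch written.
aliased = np.concatenate([batch for batch in reusing_batches(expected, 25)])

# Copying inside the loop: each entry is memory the caller owns.
copied = np.concatenate([batch.copy() for batch in reusing_batches(expected, 25)])

print(np.array_equal(aliased, expected))  # the aliased labels do not match
print(np.array_equal(copied, expected))   # the copied labels match
```

If the batches are PyTorch tensors rather than NumPy arrays, `.clone()` plays the same role as `.copy()` here.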

Hope this helps!