libffcv / ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)
https://ffcv.io
Apache License 2.0
2.86k stars 180 forks source link

Support build in Windows #93

Closed kynk94 closed 2 years ago

kynk94 commented 2 years ago

Resolves #63

Add package finding process in setup.py. And modifying ffcv/libffcv.py, libffcv/libffcv.cpp to import ffcv without error.

My 2 windows pc works without error, but I think need others test. And I didn't test in linux.

GuillaumeLeclerc commented 2 years ago

Hi! did the unit tests passed ?

We had issues with the MacOS port as the pread seem to have different signature there

kynk94 commented 2 years ago

I just did unit test, and it failed. Above test means build and import ffcv components.

pytest give error when iterating Loader in tests\test_array_field.py The error was occurred when call numba jit compiled code to get results. https://github.com/libffcv/ffcv/blob/main/ffcv/loader/epoch_iterator.py#L134 The code is CPUDispatcher(<function stage_0 at 0x0000023F941B43A0>) instance

The error is below, but there is no detailed traceback. So there is a problem with debugging and need more time.

Windows fatal exception: access violation

Current thread 0x00003cdc (most recent call first):
  File "c:\users\user\miniconda3\lib\site-packages\ffcv\loader\epoch_iterator.py", line 134 in run_pipeline
  File "c:\users\user\miniconda3\lib\site-packages\ffcv\loader\epoch_iterator.py", line 80 in run
  File "c:\users\user\miniconda3\lib\threading.py", line 932 in _bootstrap_inner
  File "c:\users\user\miniconda3\lib\threading.py", line 890 in _bootstrap
GuillaumeLeclerc commented 2 years ago

My guess is that its might be related to pread but I don't know how else I can help you :/

kynk94 commented 2 years ago

There seems to be a problem with pread too, but there is another problem. I configured the access violation error. shape of the source and destination were different when mem_cpy was running.

Before mem_cpy, there is a part to find the address in mem_state in the mem_read operation. However, mem_state had empty ptr and size.

I fixed this bug and some unit test passed, but other ffcv/fields all have similar bugs. After I fix these, I'll comment this PR.

kynk94 commented 2 years ago

All bugs resolved, and passed all unit test. I skipped test_memory_leak.py because DatasetWriter does not have write_pytorch_dataset method . And in test_rrc.py, when mode is 'jpg', assert is not always True, so I compared it with the results on linux and it seemed to match.

Since all tests passed, cdll.msvcrt._read behaves exactly the same as pread.

bdeb248 This commit was added for the following reasons:

# On Windows
allocation_table = np.concatenate([
    np.array(x).view(ALLOC_TABLE_TYPE) for x in allocations if len(x)
])
# throws ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

allocation_table = np.concatenate([
    np.array(x, dtype=ALLOC_TABLE_TYPE) for x in allocations if len(x)
])
# throws ValueError: zero-dimensional arrays cannot be concatenated

allocation_table = np.array([
    np.array(x, dtype=ALLOC_TABLE_TYPE) for x in allocations if len(x)
])
# does not throw error.

And there is one more minor issue. If the size of the written dataset by DatasetWriter is less than 16MB, it will take a very long time. I don't know why.

GuillaumeLeclerc commented 2 years ago

@kynk94 Do you think this could be merged into v1.0 ? Would you be willing to test it when I make a release candidate ?

(Thanks for the work !)

kynk94 commented 2 years ago

yes it could be merged if there is no problem on linux with this PR's logic. And I can test it with my windows desktop because the test process is simple.

GuillaumeLeclerc commented 2 years ago

Looks very good to me. I'm merging into v1.0.0. Thank you so much @kynk94 I hope it's going to be useful to many people :smiley: