I have just committed 6b678fc, which contains cosmetic changes to the scripts and leads to a short list of things to do:
- `from torchvision import transforms` (see the sketch below this list for how it might be used)
- `dataloader_demo.py`: as the comments mention, there are other things to do (frame rate, crops; these will be followed up in vital-ultrasound/ai-assisted-echocardiography-for-low-resource-countries#18) that might lead to further work in new issues/PRs/etc. Also, I am not sure that "demo" is the right word for this script; perhaps we should soon rename it to `echo_dataloader.py`.
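As a rough illustration of the transforms/crop to-do above, this is only a sketch of a torchvision preprocessing pipeline for a single frame tensor; the crop and output sizes are placeholders, not agreed values:

```python
import torch
from torchvision import transforms

# Hypothetical preprocessing for one echo frame of shape [3, 1080, 1920];
# the 800x800 crop and 128x128 resize are illustrative values only.
echo_frame_transforms = transforms.Compose([
    transforms.CenterCrop(800),                   # crop around the centre of the ultrasound fan
    transforms.Resize((128, 128)),                # downsample to a network-friendly size
    transforms.Grayscale(num_output_channels=1),  # echo frames are effectively single-channel
])

frame = torch.rand(3, 1080, 1920)        # stand-in for a loaded frame
processed = echo_frame_transforms(frame) # -> torch.Size([1, 128, 128])
```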
Notes from today's meeting with AG:
- `python video_to_imageframes.py --config ../config_files/config_v2i.yml`: the config file could also carry other information such as the machine/device, `dataset_paths`, `learning_settings` (batch size, learning rate, etc.), operator information (ID, skill level, age, etc.) and patient information (ID, diseases, etc.). A sketch of how such an extended config might be read follows this list.
- `convert_sec_to_min_sec_ms` can be simplified, as AG implemented it with a variable SMMS.
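The following is only a sketch of how an extended `config_v2i.yml` could be consumed; the key names are illustrative suggestions, not the current schema of that file:

```python
import yaml

# Illustrative only: these keys are suggestions for an extended config_v2i.yml,
# not its current contents.
with open("../config_files/config_v2i.yml") as f:
    config = yaml.safe_load(f)

dataset_paths = config.get("dataset_paths", {})
learning_settings = config.get("learning_settings", {})
batch_size = learning_settings.get("batch_size", 8)
learning_rate = learning_settings.get("learning_rate", 1e-4)
operator_info = config.get("operator", {})  # e.g. ID, skill level, age
patient_info = config.get("patient", {})    # e.g. ID, diseases
```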
Previous commits successfully read frames, but for some reason I am now getting the following error with:

```python
frame_channels_height_width = np.moveaxis(image_frame_array_3ch_i, -1, 0)
frame_torch = torch.from_numpy(frame_channels_height_width)
```
```
clip 2 ; image_frame_index 14175
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 20096/20129 [00:32<00:00, 608.41it/s]
Traceback (most recent call last):
  File "dataloader_4CV.py", line 15, in <module>
    data = dataset[sample_index]
  File "/home/mx19/repositories/echocardiography/source/helpers/various.py", line 13, in wrap_func
    result = func(*args, **kwargs)
  File "/home/mx19/repositories/echocardiography/source/dataloaders/EchocardiographicVideoDataset.py", line 111, in __getitem__
    frame_channels_height_width = np.moveaxis(image_frame_array_3ch_i, -1, 0)
  File "<__array_function__ internals>", line 5, in moveaxis
  File "/home/mx19/anaconda3/envs/rt-ai-echo-VE/lib/python3.8/site-packages/numpy/core/numeric.py", line 1461, in moveaxis
    source = normalize_axis_tuple(source, a.ndim, 'source')
  File "/home/mx19/anaconda3/envs/rt-ai-echo-VE/lib/python3.8/site-packages/numpy/core/numeric.py", line 1391, in normalize_axis_tuple
    axis = tuple([normalize_axis_index(ax, ndim, argname) for ax in axis])
  File "/home/mx19/anaconda3/envs/rt-ai-echo-VE/lib/python3.8/site-packages/numpy/core/numeric.py", line 1391, in <listcomp>
    axis = tuple([normalize_axis_index(ax, ndim, argname) for ax in axis])
numpy.AxisError: source: axis -1 is out of bounds for array of dimension 0
20130it [00:32, 621.10it/s]
(rt-ai-echo-VE) mx19@sie133-lap:~/repositories/echocardiography/scripts/examples$
```
I tried a few alternatives:

- using `frame_torch = torch.from_numpy(image_frame_array_3ch_i)` raises `TypeError: expected np.ndarray (got NoneType)`
- using `frame_torch = torch.from_numpy(image_frame_array_3ch_i).numpy()` raises `TypeError: expected np.ndarray (got NoneType)`
- using `frame_torch = torch.as_tensor(image_frame_array_3ch_i)` raises `RuntimeError: Could not infer dtype of NoneType`
`torch.as_tensor` always tries to avoid copies of the data. One of the cases where `as_tensor` avoids copying the data is if the original data is a NumPy array. https://stackoverflow.com/questions/48482787/pytorch-memory-model-torch-from-numpy-vs-torch-tensor
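To make the copy/no-copy point concrete, a small standalone check:

```python
import numpy as np
import torch

a = np.zeros((2, 2), dtype=np.float32)

shared = torch.from_numpy(a)      # shares memory with `a`
also_shared = torch.as_tensor(a)  # no copy either, since `a` is already a NumPy array
copied = torch.tensor(a)          # always copies

a[0, 0] = 1.0
print(shared[0, 0], also_shared[0, 0], copied[0, 0])  # tensor(1.), tensor(1.), tensor(0.)

# torch.from_numpy(None) / torch.as_tensor(None) would raise exactly the
# TypeError / RuntimeError shown above, which is why all the variants fail
# when the frame was never decoded.
```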
Using a grayscale (single-channel) conversion instead:

```python
image_frame_array_1ch_i = cv.cvtColor(image_frame_array_3ch_i, cv.COLOR_BGR2GRAY)  # cv.COLOR_BGR2RGB, cv.COLOR_BGR2GRAY
frame_torch = torch.from_numpy(image_frame_array_1ch_i)
```
```
Traceback (most recent call last):
  File "dataloader_4CV.py", line 15, in <module>
    data = dataset[video_index]
  File "/home/mx19/repositories/echocardiography/source/helpers/various.py", line 13, in wrap_func
    result = func(*args, **kwargs)
  File "/home/mx19/repositories/echocardiography/source/dataloaders/EchocardiographicVideoDataset.py", line 117, in __getitem__
    image_frame_array_1ch_i = cv.cvtColor(image_frame_array_3ch_i, cv.COLOR_BGR2GRAY)  #cv.COLOR_BGR2RGB cv.COLOR_BGR2GRAY
cv2.error: OpenCV(4.5.4-dev) /tmp/pip-req-build-h45n7_hz/opencv/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'
```
As noted in https://github.com/dbolya/yolact/issues/52:

> That means the image could not be loaded correctly. Verify that the path is correct. In your case there should be my_image.jpg in the same directory as eval.py.
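So the failing frames are most likely ones that OpenCV could not decode at all. A minimal sketch of a defensive read (the function name and the exact reading code are illustrative; the real logic lives in EchocardiographicVideoDataset.py and may differ):

```python
import cv2 as cv
import numpy as np
import torch

def read_frame_as_tensor(video_path, frame_index):
    """Sketch: read one frame and fail loudly if OpenCV could not decode it."""
    cap = cv.VideoCapture(video_path)
    cap.set(cv.CAP_PROP_POS_FRAMES, frame_index)
    success, frame_bgr = cap.read()
    cap.release()
    if not success or frame_bgr is None:
        # cap.read() returns (False, None) past the last decodable frame, which is
        # what makes np.moveaxis / cv.cvtColor fail as in the tracebacks above.
        raise IndexError(f"Could not decode frame {frame_index} of {video_path}")
    frame_chw = np.moveaxis(frame_bgr, -1, 0)  # HWC -> CHW
    return torch.from_numpy(frame_chw)
```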
A potential source of the frame-loading problem might be the absent timestamps in the last frames of the videos. Adding these prints:

```python
print(image_frame_index)
print(frame_msec, current_frame_timestamp)
print(type(frame_torch_chs_h_w), frame_torch_chs_h_w.shape)
```

produces the following output, where the trailing frames of each clip report a timestamp of 0.0:

```
23267
776342.2333333334 (12, 56, '342.233', '12:56:342.233')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23268
776375.6000000001 (12, 56, '375.600', '12:56:375.600')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23269
776408.9666666668 (12, 56, '408.967', '12:56:408.967')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23270
776442.3333333334 (12, 56, '442.333', '12:56:442.333')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23271
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23272
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23273
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23274
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23275
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23276
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23277
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23278
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23279
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23280
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23281
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23282
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23283
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23284
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 23285/23285 [00:37<00:00, 617.44it/s]
Function '__getitem__' executed in 37.7214s
20112
671070.4 (11, 11, '70.400', '11:11:70.400')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
20113
671103.7666666667 (11, 11, '103.767', '11:11:103.767')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
20114
671137.1333333334 (11, 11, '137.133', '11:11:137.133')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
20115
671170.5000000001 (11, 11, '170.500', '11:11:170.500')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
20116
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
20117
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
20118
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
20119
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
20120
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
20121
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
20122
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
20123
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
20124
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
20125
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
20126
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
20127
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
20128
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
20129
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20130/20130 [00:32<00:00, 610.75it/s]
Function '__getitem__' executed in 32.9680s
(rt-ai-echo-VE) mx19@sie133-lap:~/repositories/echocardiography/scripts/examples$
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23209
774406.9666666667 (12, 54, '406.967', '12:54:406.967')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23210
774440.3333333334 (12, 54, '440.333', '12:54:440.333')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23211
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23212
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23213
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23214
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23215
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23216
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23217
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23218
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23219
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23220
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23221
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23222
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23223
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
23224
0.0 (0, 0, '0.000', '00:00:0.000')
<class 'torch.Tensor'> torch.Size([3, 1080, 1920])
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 23225/23225 [00:36<00:00, 630.84it/s]
Function '__getitem__' executed in 36.8243s
(rt-ai-echo-VE) mx19@sie133-lap:~/repositories/echocardiography/scripts/examples$
```
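If the zero timestamps really do mark the frames that cannot be decoded, one possible mitigation (a sketch only, to be validated against the actual clips) is to stop at the first frame whose `CAP_PROP_POS_MSEC` falls back to 0.0:

```python
import cv2 as cv

def count_usable_frames(video_path):
    """Sketch: count frames up to the point where the timestamp resets to 0.0."""
    cap = cv.VideoCapture(video_path)
    usable = 0
    while True:
        success, frame = cap.read()
        timestamp_msec = cap.get(cv.CAP_PROP_POS_MSEC)
        if not success or frame is None:
            break
        if usable > 0 and timestamp_msec == 0.0:
            # the trailing frames in the logs above all report 0.0
            break
        usable += 1
    cap.release()
    return usable
```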
🚀 Feature
The use of PyTorch dataloaders can enable smooth integration of DL/ML models with our datasets, which are in either AVI or DICOM format. This issue is raised to follow up the implementation of such PyTorch dataloaders for the AVI files and their JSON label files.
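As a rough illustration of what is being asked for, a minimal sketch of a per-frame dataset over a single AVI clip (the class name and file name are placeholders, not existing code in this repository):

```python
import cv2 as cv
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class EchoClipDataset(Dataset):
    """Placeholder dataset: one item per frame of a single *echo*.avi clip."""

    def __init__(self, avi_path):
        self.avi_path = avi_path
        cap = cv.VideoCapture(avi_path)
        self.num_frames = int(cap.get(cv.CAP_PROP_FRAME_COUNT))
        cap.release()

    def __len__(self):
        return self.num_frames

    def __getitem__(self, index):
        cap = cv.VideoCapture(self.avi_path)
        cap.set(cv.CAP_PROP_POS_FRAMES, index)
        success, frame_bgr = cap.read()
        cap.release()
        if not success or frame_bgr is None:
            raise IndexError(f"Frame {index} could not be decoded")
        frame_chw = np.moveaxis(frame_bgr, -1, 0)           # HWC -> CHW
        return torch.from_numpy(frame_chw).float() / 255.0  # [3, H, W] in [0, 1]

# usage sketch (hypothetical file name)
# dataset = EchoClipDataset("01NVb-003-072-echo.avi")
# loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=2)
```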
Motivation
In today's meeting (5thNov2021T1000), @gomezalberto suggested using PyTorch dataloaders because of their advantages for loading, preprocessing, and augmenting data from a non-trivial dataset. According to Alberto, we would need to find a good balance of parameters (such as batch size) so that the computational cost does not become too expensive.
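On the parameter-balance point, most of the knobs sit on the DataLoader side; a small self-contained sketch with illustrative values (not recommendations):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset so the snippet runs on its own; in practice this would be
# the echo video dataset discussed in this issue.
dataset = TensorDataset(torch.rand(128, 3, 108, 192))

# Illustrative values only; the right balance depends on GPU memory, CPU cores and disk speed.
loader = DataLoader(
    dataset,
    batch_size=16,    # larger batches amortise per-batch overhead but use more memory
    num_workers=4,    # parallel workers decode/preprocess frames; too many can thrash I/O
    pin_memory=True,  # speeds up host-to-GPU copies when training on CUDA
    shuffle=True,
)
# iterating over `loader` is then the place to time different settings
```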
Pitch
The implementation of PyTorch dataloaders will make use of `*echo*.avi` files from the FileZilla server (e.g. 01NVb). I suggest starting with the echo files from participant 72 and their labels in JSON files. Please use the following path structure, or suggest a better one to make this more efficient. Remember that the structure of the data is based on how it was collected.
The JSON files are here (that branch will hopefully be merged into main next week) and look like this:
NOTE. Have a look at video_to_imageframes.py#L285, which might help with making use of the extraction of the JSON labels.
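A sketch of how those labels could be paired with frames once extracted; all key names below are placeholders, because the actual JSON structure is defined in the labelling branch rather than in this issue:

```python
import json

# Placeholder keys ("frames", "frame_index", "label"): the real structure comes
# from the labelling branch / video_to_imageframes.py#L285, not from this sketch.
def load_clip_labels(json_path):
    with open(json_path) as f:
        labels = json.load(f)
    return {int(item["frame_index"]): item["label"] for item in labels.get("frames", [])}

# usage sketch (hypothetical file name)
# frame_labels = load_clip_labels("01NVb-003-072-echo.json")
# label_for_frame_100 = frame_labels.get(100)
```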
Alternatives
Not at the moment.
Additional context
These are a few tutorials, but feel free to add more: