Closed jesrackeat closed 3 years ago
Thanks for the detailed error report! The "!_src.empty() in cvtColor" error means that it failed to load a frame in one of your videos. Are you sure that all of your videos open OK? The fact that it only happens 5-20% of the way through indicates to me that it is one part of one video, and it only crashes when the random sampler happens to try to grab those frames.
Hmmm to be fair I've checked the videos only by scrubbing through from beginning to end. I can let them play out in real time instead. Does the Flow Generator run through videos in a specific order? If I can narrow down which video may be the problem that could save a couple hours of video playing.
It loads in a random order, by design. However, we could write a quick script that should do this for you.
from deepethogram.file_io import VideoReader
from deepethogram.projects import get_records_from_datadir
from tqdm import tqdm

# find every record (video + labels) in the project's DATA directory
records = get_records_from_datadir(r'D:\woolf_DEG_project\DATA')

# try to read every frame of every video; report exactly which frame fails
for key, record in tqdm(records.items()):
    with VideoReader(record['rgb']) as reader:
        for i in tqdm(range(len(reader)), leave=False):
            try:
                frame = reader[i]
            except Exception:
                print('error reading frame {} from video {}'.format(i, record['rgb']))
                raise
Did this script work for you?
I've been stuck in the lab but will try it out this afternoon and let you know if I have problems!
I am hitting an error. Do I need to install something beyond deepethogram to run this code?
(deg) C:\Users\abrairalab\gui_logs\201104_130727_None\test_deepethogram>debug_videos.py
Traceback (most recent call last):
  File "C:\Users\abrairalab\gui_logs\201104_130727_None\test_deepethogram\debug_videos.py", line 1, in <module>
    from deepethogram.file_io import VideoReader
ImportError: No module named deepethogram.file_io
Update: just managed to get the code running, and it looks like all the videos are fine. Let me know if you have any further ideas!
Did you change the file path to the one corresponding to your project? Did you see the progress bars get through all of your videos?
Yes I did, and it got through 8 of 8 videos with full progress bars for each one.
Hmm... The error message was clearly in video reading, but all your videos read fine-- I'm honestly stumped. How many num_workers did you use? Maybe try training again with fewer CPU workers, e.g. 2-4. Maybe it was a resource contention issue.
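For reference, the worker count lives in the project configuration; a hedged sketch of where it would go (the exact key names, compute / num_workers, are assumptions about deepethogram's config layout -- check your own project_config.yaml):

```yaml
# project_config.yaml (sketch -- key names are assumptions, verify against your file)
compute:
  num_workers: 2   # number of CPU processes used for data loading
```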
I was using 8. When I dropped it to 2, training made it between 50 and 90% of the way through. Does that indicate to you it might be a resource contention issue?
Hmm, it seems like we might be on to something. How many CPU cores do you have? Secondly, what approximate resolution are your videos in? Third, what filetype / codec are they?
I think it might work if we change your formats, and these questions will help me determine that.
8 cores, and the videos are around 190x180 pixels (cropped from a larger video). They're MP4s, although half of them are MPEG4 (codec H264 - MPEG-4 AVC part 10 avc1), half are MPEG1 (codec MPEG-1 Video mp1v), probably due to cropping and converting the videos in two separate batches. I'm including screenshots of the codec information in case I missed anything.
Hi Jessica, sorry for the delay, I was not working during the holiday. This seems like a very normal size / encoding combination--thanks for your help. I think it might be something to do with trying to read the same video from multiple processes.
Could you use deepethogram.projects.convert_all_videos to convert your videos? It will take up a lot more disk space, but unless you have many hours of videos, it should still be reasonable. https://github.com/jbohnslav/deepethogram/blob/6ac7f608298045fad3e5e29e870d7fa0852a7f63/deepethogram/projects.py#L1078
If you plan to copy your videos a lot (e.g. on and off of a server / cluster) you can use movie_format='hdf5'. If they're staying put, you can use movie_format='directory', which will explode the videos into sets of individual .PNGs. You'll also notice that flow generator training will be a lot faster.
Here's an example:
from deepethogram import projects
projects.convert_all_videos('/path/to/my/project_config.yaml', movie_format='directory')
No worries at all, 'tis the season.
I re-installed DEG to get the latest version, used your suggested code to convert all videos to batches of PNGs, and tried training the Flow Generator from the GUI again. No luck; same error.
cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-6sxsq0tp\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'
I included the Anaconda Prompt traceback in case there are additional clues in there. Any further ideas?
EDIT: I still had the videos in the same folder as the PNGs, removing the videos and leaving it to train as I leave for the day. I'll let you know if it had problems in the morning!
Great, let me know if this fixes the issue!
Hello again!
Looks like the Flow Generator trained successfully. "210105_154323_flow_generator_train_None" is now available in the GUI dropdown under FlowGenerator.
I got an error when I tried to train the Feature Extractor via the GUI, however.
Based on what I think dataloaders.py is doing at line 455, I'm wondering whether the issue is that DEG thinks none of my videos are labelled. I labelled one video while it was an MP4, before implementing your video conversion fix. Now all videos are in folders of PNGs. I also get the error if I move the CSV with the labels into the folder with the PNGs.
What do you think?
I'm wondering whether the issue is that DEG thinks none of my videos are labelled.
That's exactly right. How many videos have you labeled? One potential issue is that there aren't enough videos to make both a training and a validation set.
1 video is labelled (30 min/45k frames), with 7 videos currently unlabelled. Shall I label another before trying to train the Feature Extractor, or should there be enough frames in that one video to split into training/validation?
ah, this is the issue. This seems to be recurring, so maybe I should implement this feature.... but for all the datasets in the paper, I wanted to strictly split into train, validation, and test based on videos, rather than frames-- this ensures that the models will generalize across animals. Therefore, we require at least 2 videos to be labeled.
If you scroll up in the error message, do you see "warning: only 1 records found..."?
I'll be back in the lab tomorrow and have a look.
I don't see that specific warning, unless I've missed something. Here's the full anaconda prompt traceback.
Hmm, it says number of finalized labels is zero: did you click "finalize labels" on the video you completed?
I did, and it looks like a CSV of labels was successfully generated when I did.
the labels will always be created whenever you save-- hitting "finalize labels" assertively turns all of your unlabeled frames into "background" ones. if you tried again, it should still give you an error because we require a minimum of two videos, but it should say that number of finalized labels is 1.
I remember having difficulty using "finalize labels" when I first generated them--nothing happened when I asked the GUI to finalize the labels. Restarting my computer solved the problem. I don't think I can attach a CSV here, but this is what the first part of the labels CSV looks like. I don't see any cells with a value -1.
I do have another question! When I click Video > Add or Open in the GUI, it looks like it can open only VideoReader files. Is it possible to use the GUI to generate labels if my videos have been exploded into folders of PNGs, as I did here?
If there are no cells where the value is -1, then it worked. Maybe it is just because you only have one video for now.
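For anyone who would rather check programmatically than by eye, a quick sketch (the file path and column layout are assumptions: one header row, then one row per frame, with -1 marking frames that were never labeled):

```python
import csv

def has_unlabeled_frames(csv_path):
    """Return True if any cell in a label CSV is -1, i.e. some frame was never labeled."""
    with open(csv_path, newline='') as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row of behavior names
        for row in reader:
            if any(cell.strip() == '-1' for cell in row):
                return True
    return False
```

If this returns False for your finalized video, every frame was assigned some label (background counts).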
Good question! I need to make this more clear in the documentation. I consider a directory full of pngs a "videoreader file". So if you just open the directory, it will load just like a video.
This doesn't seem to be the case for me. With my project open, when I select Video > Add or open in the GUI and then navigate to the folder of PNGs I get an error.
Hi, thanks for pointing this out-- this was a bug on my end. Unfortunately PySide's file dialog can't open both files and directories at once. With commit b610d9a, you should be able to click on a .png file and it will automatically figure out that it's in a directory full of .pngs.
Can you pull the latest changes and try it out?
It's working. I'm going to label a second video and try training the feature extractor again. Thanks for your help!
No dice! Traceback attached here.
Hmm, the same error- couldn't find more than one video. In the "run directory" made by deepethogram, there should be a file called "split.yaml". can you upload that?
I'm back in the lab on Friday and will do so then!
Great! Thanks for your help on this, sorry you're having such difficulty. Hopefully once things are up and running you have some good performing classifiers.
Hmm, that looks fine to me! Are the two files you labeled both in the "train" set? As a hack, you can modify the "split.yaml" yourself-- just move one of the labeled ones from "train" to "val".
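If you'd rather script the move than hand-edit the file, a minimal sketch that operates on the parsed split (it assumes split.yaml maps 'train'/'val'/'test' to lists of video names; loading and saving the YAML itself, e.g. with PyYAML's yaml.safe_load / yaml.safe_dump, is left as a comment):

```python
def move_video(split, name, src='train', dst='val'):
    """Move one video name between lists in a parsed split.yaml dict."""
    split[src].remove(name)                 # raises ValueError if name is not in src
    split.setdefault(dst, []).append(name)  # create the dst list if it is missing
    return split

# load:  split = yaml.safe_load(open('split.yaml'))
# edit:  move_video(split, 'my_labeled_video')
# save:  yaml.safe_dump(split, open('split.yaml', 'w'))
```

The same helper works in the other direction (src='val', dst='train') if your labeled videos both landed in the validation set.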
They were both in the validation set (EN0 histamine and EN3 beta alanine), so I moved one to the training set. Looks like it's training ok now.
I'm going to leave it for the weekend. I'll be back in lab next Thursday and let you know if there were any problems. Thanks!
Still training!
Training is complete and I tried "Feature Extractor - Infer" using the GUI, selecting all videos. I think I have another error with video loading here!
What does this directory look like? "Extracting from movie C:/Users/abrairalab/gui_logs/201104_130727_None/test_deepethogram\DATA\Trial 2 Nov 5 EN1 beta alanine\Trial 2 Nov 5 EN1 beta alanine"?
It's the folder with my PNGs.
Should be fixed with 1500a58
Looks like it's working, thank you for your continuing help!
Predictions have been generated, and at first glance the inferred labels look pretty good for the two videos that I also labelled manually.
They aren't quite as accurate on videos where I did not manually label first. Do you think manually labelling another video and then training the model again might solve this problem?
Great! Is the example image one that you already labeled manually?
In general, the models will do better the more data you have. You can use the "import predictions as labels" button to import the inaccurate predictions. Hopefully the easy frames will therefore be "labeled" for you, and you can go through and edit the labels by clicking on the label image.
Example image is one I labeled manually. I'll try your suggestion with one of the videos where it wasn't quite so accurate and train again starting with the feature extractor (if that is the correct place to start with new labels on one video). Thank you!
Great! Closing for now. If you have more bugs or performance issues, please open a new ticket.
Hi Jim,
I am hitting a stumbling block with the Flow Generator. I followed the "Getting started" instructions and then set up my experiment using the GUI. When I attempt to train the Flow Generator it hits this error 5-20% of the way through:
cv2.error: OpenCV(4.4.0) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-6sxsq0tp\opencv\modules\imgproc\src\color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cv::cvtColor'
All of the videos I'm using for training play through fine with VLC.
Hopefully an easy fix on my end!
All the best,
Jessica
main.log
train.log
Anaconda prompt traceback.txt