Closed: Amakri1020 closed this issue 5 years ago
Hi @Amakri1020, and thank you for your interest.
Yes, in order to retrain the network, you should unroll all the .avi sequences (video_garmin.avi and video_saliency.avi) into frames. The code assumes the following structure:
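Roughly, the unrolling could be done with something like the following minimal sketch (not code from this repo): the dataset root, the 6-digit zero padding, and the use of OpenCV are assumptions on my part, so adapt it to your local layout and to the exact structure the loader expects.

```python
# Minimal sketch: unroll video_garmin.avi of every sequence into
# <sequence>/frames/NNNNNN.jpg. Root path, padding width and output
# format are assumptions, not taken from the repo.
import os
import cv2

dreyeve_root = '/home/user/DREYEVE_DATA'  # assumed dataset root

for seq in sorted(os.listdir(dreyeve_root)):
    seq_dir = os.path.join(dreyeve_root, seq)
    video_path = os.path.join(seq_dir, 'video_garmin.avi')
    if not os.path.isfile(video_path):
        continue

    out_dir = os.path.join(seq_dir, 'frames')
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)

    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        # 6-digit zero padding is an assumption based on the error path quoted later in this thread
        cv2.imwrite(os.path.join(out_dir, '{:06d}.jpg'.format(idx)), frame)
        idx += 1
    cap.release()
```

The same loop would be repeated for video_saliency.avi, writing into whichever folder name the training code looks for.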
Thanks for the quick response, that is helpful!
I am also curious if there were instances where the driver looked at something out of the FoV of the camera and if so, how did you deal with these cases?
That's an interesting question :)
It is likely that during the recordings a driver took quick peeks outside the camera's FoV (e.g., glancing at the side mirrors). That said, the effect of such rapid shifts in attention is ameliorated by the fixation-map construction procedure: as mentioned in the journal paper, that procedure temporally aggregates fixation points to build a single fixation map.
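To illustrate what that aggregation means in practice, here is a rough sketch of building one fixation map from the fixations of neighbouring frames; the window size, Gaussian sigma, map resolution, and function name are illustrative assumptions, not the dataset's actual pipeline.

```python
# Minimal sketch of temporal aggregation of fixation points:
# fixations from a window of neighbouring frames are accumulated
# into one map and blurred, so a single quick glance contributes
# little to the resulting fixation map.
import numpy as np
from scipy.ndimage import gaussian_filter

def build_fixation_map(fixations, frame_idx, shape=(1080, 1920),
                       window=12, sigma=60):
    """fixations: dict mapping frame index -> list of (row, col) points.
    window and sigma are illustrative values, not those from the paper."""
    fix_map = np.zeros(shape, dtype=np.float32)
    for t in range(frame_idx - window, frame_idx + window + 1):
        for (r, c) in fixations.get(t, []):
            if 0 <= r < shape[0] and 0 <= c < shape[1]:
                fix_map[int(r), int(c)] += 1.0
    fix_map = gaussian_filter(fix_map, sigma=sigma)
    if fix_map.max() > 0:
        fix_map /= fix_map.max()
    return fix_map
```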
Short answer: we don't deal with such cases. I don't think these situations are encoded in the fixation maps in the first place. You could still recover them by looking at the ETG videos and the raw fixation recordings, but I'm not sure.
Hi, I'm trying to train this model from scratch using the dataset provided, but the dataset doesn't quite match what the code expects: it contains .avi files rather than frame JPEGs. So when I run
python2 train.py --which_branch image
I end up with an error like:
ValueError: Provided path "/home/amakri/DREYEVE_DATA/23/frames/004465.jpg" does NOT exist.
Is there code somewhere in this repo that I've missed that does this sort of preprocessing and sets the dataset up for the training code, or is this something I will just have to do myself?