ml4ai / tomcat

ToMCAT: Theory of Mind-based Cognitive Architecture for Teams
https://ml4ai.github.io/tomcat/
MIT License

Creation of a FACS sensor executable using the OpenFace library #113

adarshp closed this issue 4 years ago

adarshp commented 4 years ago

@runnanzhou - you might have done parts of this process already, but I wanted to document the task in detail for you in case it helps. Apologies, I had meant to do this earlier but am just getting around to it now.

We need to develop an executable that can process live webcam video (or video files or images from disk) and output facial action units, automatically detected using OpenFace, in JSON format.

Here are the implementation steps. To start with, we will implement the 'static' method, which requires just the frames and not a video on disk or in memory (the dynamic method is more accurate, since it performs AU normalization, but is more complicated to implement).

1. Update src/WebcamSensor.cpp and src/WebcamSensor.h to prefix the included OpenFace headers with OpenFace/, i.e. change the line

   #include "GazeEstimation.h"

   to

   #include <OpenFace/GazeEstimation.h>

   Do the same substitution for LandmarkCoreIncludes.h, SequenceCapture.h, VisualizationUtils.h, and Visualizer.h in src/WebcamSensor*.cpp.

2. Create a file called src/AUSensor.cpp and add an int main(...) function to it.

3. At the top of src/AUSensor.cpp, include the WebcamSensor.h header file. Inside the main function, create an object that is an instance of the WebcamSensor class.

4. Modify the arguments attribute in WebcamSensor.h, adding the string -au_static to the vector.

5. Create a while loop in the main function that keeps calling the get_observation method of the WebcamSensor class until the SIGTERM signal is received, at which point the loop should exit (see the sketch at the end of this comment).

6. Add the line add_subdirectory(external/OpenFace) at some point before the call to add_subdirectory(src) in tomcat/CMakeLists.txt.

7. Add the line add_executable(ausensor AUSensor.cpp) to src/CMakeLists.txt. Immediately after that line, add the line

   target_link_libraries(ausensor PUBLIC OpenFace)

   You might need to add the line

   target_include_directories(ausensor ${openface_include_dirs})

   as well.

8. Navigate to the build/ directory in the tomcat root directory and execute

   cmake ..
   make -j ausensor

   Test the functionality of the executable by running ./bin/ausensor (from within the build directory). You should see face landmarks being tracked, as well as the gaze.

At this point, you will have recreated the original functionality of the WebcamSensor class as it was when integrated into the runExperiment executable. Once you have verified this, modify src/WebcamSensor.cpp to get the action units by looking at https://github.com/TadasBaltrusaitis/OpenFace/blob/ad1b3cc45ca05c762b87356c18ad030fcf0f746e/exe/FeatureExtraction/FeatureExtraction.cpp#L190-L258 and adding the appropriate portions of that code to src/WebcamSensor.cpp to perform AU extraction.

Once you have a minimal setup going, modify the program to output the AUs to the standard output in JSON format. You can use the nlohmann-json library for this (see src/Mission.cpp for an example of usage). The relevant attributes of the RecorderOpenFace class to work with are au_intensities and au_occurrences: https://github.com/ml4ai/tomcat/blob/08115c44d42ad552a2fac08d1f566a785a778c66/external/OpenFace/lib/Utilities/include/RecorderOpenFace.h#L189-L190. Start with a simple JSON format, then we can see how to massage it into the form that the TA3 testbed expects.
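A rough sketch of what AUSensor.cpp could end up looking like (the SIGTERM-handling loop plus the JSON output; the au_intensities / au_occurrences maps here are placeholders for whatever accessors we end up adding, and I'm assuming WebcamSensor can be default-constructed):

#include <atomic>
#include <csignal>
#include <iostream>
#include <map>
#include <string>
#include <nlohmann/json.hpp>
#include "WebcamSensor.h"

// Flag flipped by the SIGTERM handler so the main loop can exit cleanly.
static std::atomic<bool> keep_running{true};

void handle_sigterm(int) { keep_running = false; }

int main() {
    std::signal(SIGTERM, handle_sigterm);

    WebcamSensor sensor;
    while (keep_running) {
        sensor.get_observation();

        // Placeholder maps: however the AU values end up being exposed from
        // WebcamSensor (e.g. via the RecorderOpenFace au_intensities and
        // au_occurrences attributes), assume we have them as name -> value maps.
        std::map<std::string, double> au_intensities;
        std::map<std::string, double> au_occurrences;

        nlohmann::json message;
        for (const auto& [au, value] : au_intensities) {
            message["data"]["action_units"][au]["intensity"] = value;
        }
        for (const auto& [au, value] : au_occurrences) {
            message["data"]["action_units"][au]["occurrence"] = value;
        }
        // One message per line on stdout; a richer header/msg envelope can be
        // added once the output format is settled.
        std::cout << message.dump() << std::endl;
    }
    return 0;
}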

runnanzhou commented 4 years ago

Hi Adarsh,

Thank you! I will do it.

Best wishes, Runnan Zhou


adarshp commented 4 years ago

Action items based on discussion with ASU team

@shreeyajain Based on our conversation with the ASU folks today, I think these are the action items we have.

Output format updates

In order to incorporate new information and improve compatibility with the TA3 testbed, we should change the output format. I propose the format shown in the example below.

{
    "header": {
        "timestamp": "2019-12-26T12:47:23.1234Z",
        "message_type": "observation",
        "version": "0.1"
    },
    "msg": { 
        "experiment_id": "563e4567-e89b-12d3-a456-426655440000",
        "trial_id": "123e4567-e89b-12d3-a456-426655440000",
        "timestamp": "2019-12-26T14:05:02.1412Z",
        "source": "ofsensor",
        "sub_type": "state",
        "version": "0.1"
    },
    "data": {
        "playername": "Aptiminer1", 
        "landmark_detection_confidence": 0.94848,
        "landmark_detection_success": true,
        "frame": 1,
        "action_units": {
            "AU04": {
                "intensity": 0.7257351100546178,
                "occurrence" : 1.0
            },
            "AU05": {
                "intensity": 0.35402671613589914,
                "occurrence" : 0.0
            },
            ...
        },
        "gaze": {
            "eye_0" : {
                "x": 0.10917,
                "y": 0.147619,
                "z": -0.983001
            },
            "eye_1": {
                "x": -0.166114,
                "y": 0.136956,
                "z": -0.97655
            },
            "gaze_angle": {
                "x": ...,
                "y": ...
            } 
        }
    }
}
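If it helps, assembling this envelope with nlohmann-json could look roughly like the sketch below (the function and argument names are placeholders; how the IDs and timestamps are actually obtained is up to the implementation):

#include <string>
#include <nlohmann/json.hpp>

// Assemble one observation message in the proposed format. The caller supplies
// the UTC timestamp string, the IDs, and the already-built "data" object.
nlohmann::json make_message(const std::string& timestamp,
                            const std::string& experiment_id,
                            const std::string& trial_id,
                            const nlohmann::json& data) {
    nlohmann::json message;
    message["header"] = {{"timestamp", timestamp},
                         {"message_type", "observation"},
                         {"version", "0.1"}};
    message["msg"] = {{"experiment_id", experiment_id},
                      {"trial_id", trial_id},
                      {"timestamp", timestamp},
                      {"source", "ofsensor"},
                      {"sub_type", "state"},
                      {"version", "0.1"}};
    message["data"] = data;  // playername, action_units, gaze, etc.
    return message;
}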

Notes:

Command line options

adarshp commented 4 years ago

From Federico: It might be useful to have eye_lmk and pose data as well to figure out which quadrant of the screen participants are looking at.

adarshp commented 4 years ago

@shreeyajain - another thing that just occurred to me - we'll need to add a command line option, -f, that will enable reading and processing frames from a file instead of a webcam, since the tool will not be run during the experimental trial, but rather will be used on postprocessed video data (i.e. the cropped recording of the Zoom session).
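For reference, the file-vs-webcam switch itself is simple at the OpenCV level; a sketch of the idea (the actual code would go through OpenFace's SequenceCapture rather than using cv::VideoCapture directly):

#include <string>
#include <opencv2/videoio.hpp>

// Open a video file when a path was supplied via -f, otherwise fall back to
// the default webcam (device 0).
cv::VideoCapture open_capture(const std::string& input_file) {
    if (input_file.empty()) {
        return cv::VideoCapture(0);
    }
    return cv::VideoCapture(input_file);
}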

shreeyajain commented 4 years ago

@shreeyajain - another thing that just occurred to me - we'll need to add a command line option, -f, that will enable reading and processing frames from a file instead of a webcam, since the tool will not be run during the experimental trial, but rather will be used on postprocessed video data (i.e. the cropped recording of the Zoom session).

@adarshp - I looked into the documentation for OpenFace and I believe we can just give the filename as a command line argument with the -f option. I will have to include an outer while loop to detect these arguments and default to the webcam in their absence.

adarshp commented 4 years ago

You don't need to include a while loop - you can just use the Boost program options library to implement the command line option parsing (with defaults).
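For example, a minimal sketch of the Boost program options usage (the option names just mirror the -f option discussed above; the final interface may differ):

#include <iostream>
#include <string>
#include <boost/program_options.hpp>

namespace po = boost::program_options;

int main(int argc, char* argv[]) {
    po::options_description desc("Allowed options");
    desc.add_options()
        ("help,h", "produce help message")
        ("file,f", po::value<std::string>()->default_value(""),
         "specify the input video file (webcam is used if omitted)");

    po::variables_map vm;
    po::store(po::parse_command_line(argc, argv, desc), vm);
    po::notify(vm);

    if (vm.count("help")) {
        std::cout << desc << std::endl;
        return 0;
    }

    std::string input_file = vm["file"].as<std::string>();
    if (input_file.empty()) {
        std::cout << "No -f given; reading from the webcam." << std::endl;
    } else {
        std::cout << "Reading frames from " << input_file << std::endl;
    }
    return 0;
}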

shreeyajain commented 4 years ago

Renaming the executable

The executable has been renamed from ausensor to facesensor.

You can now navigate to the build/ directory and execute:

$ cmake ..
$ make -j facesensor
$ ./bin/facesensor
shreeyajain commented 4 years ago

From Federico: It might be useful to have eye_lmk and pose data as well to figure out which quadrant of the screen participants are looking at.

I have updated the output format to incorporate eye_lmk and pose:

{
    "header": {
        "timestamp": "2020-07-16T05:06:56.965755Z",
        "message_type": "observation",
        "version": "0.1"
    },
    "msg": {
        "experiment_id": "563e4567-e89b-12d3-a456-426655440000",
        "trial_id": "123e4567-e89b-12d3-a456-426655440000",
        "timestamp": "2020-07-16T05:06:56.965755Z",
        "source": "facesensor",
        "sub_type": "state",
        "version": "0.1"
    },
    "data": {
        "playername": "shreeya08",
        "landmark_detection_confidence": "0.97500",
        "landmark_detection_success": true,
        "frame": 8,
        "action_units": {
            "AU01": {
                "occurrence": 0.0,
                "intensity": 0.4174251010327605
            },
            "AU02": {
                "occurrence": 0.0,
                "intensity": 0.06606532441180364
            },
            ...
        },
        "gaze": {
            "eye_0": {
                "x": -0.042032960802316666,
                "y": -0.037290651351213455,
                "z": -0.9984200596809387
            },
            "eye_1": {
                "x": -0.28871601819992065,
                "y": 0.045460283756256104,
                "z": -0.9563348889350891
            },
            "gaze_angle": {
                "x": -0.16761472821235657,
                "y": 0.0041793398559093475
            },
            "eye_lmk2d": {
                "x_0": 289.2476806640625,
                "x_1": 291.44573974609375,
                ...
                "x_55": 374.8895263671875,
                "y_0": 392.86376953125,
                "y_1": 386.2491760253906,
                ...
                "y_55": 388.72772216796875
            },
            "eye_lmk3d": {
                "X_0": -20.11672019958496,
                "X_1": -18.630495071411133,
                ...
                "X_55": 35.71424102783203,
                "Y_0": 99.99628448486328,
                "Y_1": 95.42163848876953,
                ...
                "Y_55": 96.77069091796875,
                "Z_0": 327.07647705078125,
                "Z_1": 326.22967529296875,
                ...
                "Z_55": 325.328369140625
            }
        },
        "pose": {
            "Tx": 18.100841522216797,
            "Ty": 156.148193359375,
            "Tz": 388.6546630859375,
            "Rx": -0.09204348921775818,
            "Ry": 0.10995744913816452,
            "Rz": -0.068435437977314
        }
    }
}

@adarshp Do you have any suggested changes or is this okay? Additionally, should I set the precision (refer to https://github.com/TadasBaltrusaitis/OpenFace/blob/658a6a1cc2028f034c8f29233a01ddc3f9fd6672/lib/local/Utilities/src/RecorderCSV.cpp)?

shreeyajain commented 4 years ago

Command line options added

Allowed options:

  -h [ --help ]             produce help message
  --exp_id arg (=null)      set experiment ID
  --trial_id arg (=null)    set trial ID
  --playername arg (=null)  set playername
  --mloc arg                set OpenFace models directory
  --indent arg (=0)         set indentation (true/false)
  -f [ --file ] arg (=null) specify the input video file
adarshp commented 4 years ago

@adarshp Do you have any suggested changes or is this okay? Additionally, should I set the precision (refer to https://github.com/TadasBaltrusaitis/OpenFace/blob/658a6a1cc2028f034c8f29233a01ddc3f9fd6672/lib/local/Utilities/src/RecorderCSV.cpp)?

This looks great! Don't worry about the precision - people can reduce the precision downstream if they want.

adarshp commented 4 years ago

I just realized that if we are going to try piping the output of facesensor into mosquitto_pub, we'll need to make each message a single line (i.e. calling dump() instead of dump(4) in WebcamSensor.cpp). However, it is nice to be able to get indented output for debugging. Can you also add a boolean command line flag --indent (that defaults to false) that controls whether the output is indented or not?
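Concretely, something like this in the output path (assuming the JSON object is called output and indent is the boolean set from the new flag; the actual names in WebcamSensor.cpp may differ):

// One message per line by default (for piping); pretty-printed when --indent is given.
std::cout << (indent ? output.dump(4) : output.dump()) << std::endl;

Piping would then look something like ./bin/facesensor | mosquitto_pub -l -t <some topic> (mosquitto_pub's -l option reads messages from stdin, one per line).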

shreeyajain commented 4 years ago

I just realized that if we are going to try piping the output of facesensor into mosquitto_pub, we'll need to make each message a single line (i.e. calling dump() instead of dump(4) in WebcamSensor.cpp). However, it is nice to be able to get indented output for debugging. Can you also add a boolean command line flag --indent (that defaults to false) that controls whether the output is indented or not?

Sure, I'll do that!

shreeyajain commented 4 years ago

@shreeyajain - another thing that just occurred to me - we'll need to add a command line option, -f, that will enable reading and processing frames from a file instead of a webcam, since the tool will not be run during the experimental trial, but rather will be used on postprocessed video data (i.e. the cropped recording of the Zoom session).

When we process frames from a file, we might not need visualization. Should we make visualization optional as well?

adarshp commented 4 years ago

Yes, we should make it optional.

adarshp commented 4 years ago

Closed by #187.