ml4ai / tomcat

ToMCAT: Theory of Mind-based Cognitive Architecture for Teams
https://ml4ai.github.io/tomcat/
MIT License

Creation of a FACS sensor executable using the OpenFace library #113

adarshp closed this issue 4 years ago

adarshp commented 4 years ago

@runnanzhou - you might have done parts of this process already, but I wanted to document the task in detail for you in case it helps. Apologies, I had meant to do this earlier but am just getting around to it now.

We need to develop an executable that can process live webcam video (or video files or images from disk) and output facial action units, automatically detected using OpenFace, in JSON format.

Here are the implementation steps. To start with, we will implement the 'static' method, which requires just the frames and not a video on disk or in memory (the dynamic method is more accurate, since it performs AU normalization, but is more complicated to implement).

1. Update src/WebcamSensor.cpp and src/WebcamSensor.h to prefix the included OpenFace headers with OpenFace/, i.e. change the line

   #include "GazeEstimation.h"

   to

   #include <OpenFace/GazeEstimation.h>

   Do the same substitution for LandmarkCoreIncludes.h, SequenceCapture.h, VisualizationUtils.h, and Visualizer.h in src/WebcamSensor*.cpp.

2. Create a file called src/AUSensor.cpp and add an int main(...) function to it.

3. At the top of src/AUSensor.cpp, include the WebcamSensor.h header file. Inside the main function, create an object that is an instance of the WebcamSensor class.

4. Modify the arguments attribute in WebcamSensor.h, adding the string -au_static to the vector.

5. Create a while loop in the main function that keeps calling the get_observation method of the WebcamSensor class until the SIGTERM signal is received, at which point the loop should exit (see the sketch at the end of this comment).

6. Add the line add_subdirectory(external/OpenFace) at some point before the call to add_subdirectory(src) in tomcat/CMakeLists.txt.

7. Add the line add_executable(ausensor AUSensor.cpp) to src/CMakeLists.txt. Immediately after that line, add the line

   target_link_libraries(ausensor PUBLIC OpenFace)

   You might need to add the line

   target_include_directories(ausensor ${openface_include_dirs})

   as well.

8. Navigate to the build/ directory in the tomcat root directory and execute

   cmake ..
   make -j ausensor

   Test the functionality of the executable by running ./bin/ausensor (from within the build directory). You should see face landmarks being tracked, as well as the gaze.

At this point, you will have recreated the original functionality of the WebcamSensor class as it was when integrated into the runExperiment executable. Once you have verified this, modify src/WebcamSensor.cpp to get the action units by looking at https://github.com/TadasBaltrusaitis/OpenFace/blob/ad1b3cc45ca05c762b87356c18ad030fcf0f746e/exe/FeatureExtraction/FeatureExtraction.cpp#L190-L258 and adding the appropriate portions of that code to src/WebcamSensor.cpp to perform AU extraction.

Once you have a minimal setup going, modify the program to output the AUs to the standard output in JSON format. You can use the nlohmann-json library for this (see src/Mission.cpp for an example of usage). The relevant attributes of the RecorderOpenFace class to work with are au_intensities and au_occurrences: https://github.com/ml4ai/tomcat/blob/08115c44d42ad552a2fac08d1f566a785a778c66/external/OpenFace/lib/Utilities/include/RecorderOpenFace.h#L189-L190. Start with a simple JSON format, then we can see how to massage it into the form that the TA3 testbed expects.
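A rough sketch of what AUSensor.cpp could end up looking like (the SIGTERM-handling loop plus the JSON output; the au_intensities / au_occurrences maps here are placeholders for whatever accessors we end up adding, and I'm assuming WebcamSensor can be default-constructed):

#include <atomic>
#include <csignal>
#include <iostream>
#include <map>
#include <string>
#include <nlohmann/json.hpp>
#include "WebcamSensor.h"

// Flag flipped by the SIGTERM handler so the main loop can exit cleanly.
static std::atomic<bool> keep_running{true};

void handle_sigterm(int) { keep_running = false; }

int main() {
    std::signal(SIGTERM, handle_sigterm);

    WebcamSensor sensor;
    while (keep_running) {
        sensor.get_observation();

        // Placeholder maps: however the AU values end up being exposed from
        // WebcamSensor (e.g. via the RecorderOpenFace au_intensities and
        // au_occurrences attributes), assume we have them as name -> value maps.
        std::map<std::string, double> au_intensities;
        std::map<std::string, double> au_occurrences;

        nlohmann::json message;
        for (const auto& [au, value] : au_intensities) {
            message["data"]["action_units"][au]["intensity"] = value;
        }
        for (const auto& [au, value] : au_occurrences) {
            message["data"]["action_units"][au]["occurrence"] = value;
        }
        // One message per line on stdout; a richer header/msg envelope can be
        // added once the output format is settled.
        std::cout << message.dump() << std::endl;
    }
    return 0;
}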

runnanzhou commented 4 years ago

Hi Adarsh,

Thank you! I will do it.

Best wishes, Runnan Zhou


adarshp commented 4 years ago

Action items based on discussion with ASU team

@shreeyajain Based on our conversation with the ASU folks today, I think these are the action items we have.

Output format updates

In order to incorporate new information and improve compatibility with the TA3 testbed, we should change the output format. I propose the format shown in the example below.

{
    "header": {
        "timestamp": "2019-12-26T12:47:23.1234Z",
        "message_type": "observation",
        "version": "0.1"
    },
    "msg": { 
        "experiment_id": "563e4567-e89b-12d3-a456-426655440000",
        "trial_id": "123e4567-e89b-12d3-a456-426655440000",
        "timestamp": "2019-12-26T14:05:02.1412Z",
        "source": "ofsensor",
        "sub_type": "state",
        "version": "0.1"
    },
    "data": {
        "playername": "Aptiminer1", 
        "landmark_detection_confidence": 0.94848,
        "landmark_detection_success": true,
        "frame": 1,
        "action_units": {
            "AU04": {
                "intensity": 0.7257351100546178,
                "occurrence" : 1.0
            },
            "AU05": {
                "intensity": 0.35402671613589914,
                "occurrence" : 0.0
            },
            ...
        },
        "gaze": {
            "eye_0" : {
                "x": 0.10917,
                "y": 0.147619,
                "z": -0.983001
            },
            "eye_1": {
                "x": -0.166114,
                "y": 0.136956,
                "z": -0.97655
            },
            "gaze_angle": {
                "x": ...,
                "y": ...
            } 
        }
    }
}
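If it helps, assembling this envelope with nlohmann-json could look roughly like the sketch below (the function and argument names are placeholders; how the IDs and timestamps are actually obtained is up to the implementation):

#include <string>
#include <nlohmann/json.hpp>

// Assemble one observation message in the proposed format. The caller supplies
// the UTC timestamp string, the IDs, and the already-built "data" object.
nlohmann::json make_message(const std::string& timestamp,
                            const std::string& experiment_id,
                            const std::string& trial_id,
                            const nlohmann::json& data) {
    nlohmann::json message;
    message["header"] = {{"timestamp", timestamp},
                         {"message_type", "observation"},
                         {"version", "0.1"}};
    message["msg"] = {{"experiment_id", experiment_id},
                      {"trial_id", trial_id},
                      {"timestamp", timestamp},
                      {"source", "ofsensor"},
                      {"sub_type", "state"},
                      {"version", "0.1"}};
    message["data"] = data;  // playername, action_units, gaze, etc.
    return message;
}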

Notes:

Command line options

adarshp commented 4 years ago

From Federico: It might be useful to have eye_lmk and pose data as well to figure out which quadrant of the screen participants are looking at.

adarshp commented 4 years ago

@shreeyajain - another thing that just occurred to me - we'll need to add a command line option, -f, that will enable reading and processing frames from a file instead of a webcam, since the tool will not be run during the experimental trial, but rather will be used on postprocessed video data (i.e. the cropped recording of the Zoom session).
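For reference, the file-vs-webcam switch itself is simple at the OpenCV level; a sketch of the idea (the actual code would go through OpenFace's SequenceCapture rather than using cv::VideoCapture directly):

#include <string>
#include <opencv2/videoio.hpp>

// Open a video file when a path was supplied via -f, otherwise fall back to
// the default webcam (device 0).
cv::VideoCapture open_capture(const std::string& input_file) {
    if (input_file.empty()) {
        return cv::VideoCapture(0);
    }
    return cv::VideoCapture(input_file);
}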

shreeyajain commented 4 years ago

@shreeyajain - another thing that just occurred to me - we'll need to add a command line option, -f, that will enable reading and processing frames from a file instead of a webcam, since the tool will not be run during the experimental trial, but rather will be used on postprocessed video data (i.e. the cropped recording of the Zoom session).

@adarshp - I looked into the documentation for OpenFace and I believe we can just give the filename as a command line argument with the -f option. I will have to include an outer while loop to detect these arguments and default to the webcam in their absence.

adarshp commented 4 years ago

You don't need to include a while loop - you can just use the Boost program options library to implement the command line option parsing (with defaults).
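For example, a minimal sketch of the Boost program options usage (the option names just mirror the -f option discussed above; the final interface may differ):

#include <iostream>
#include <string>
#include <boost/program_options.hpp>

namespace po = boost::program_options;

int main(int argc, char* argv[]) {
    po::options_description desc("Allowed options");
    desc.add_options()
        ("help,h", "produce help message")
        ("file,f", po::value<std::string>()->default_value(""),
         "specify the input video file (webcam is used if omitted)");

    po::variables_map vm;
    po::store(po::parse_command_line(argc, argv, desc), vm);
    po::notify(vm);

    if (vm.count("help")) {
        std::cout << desc << std::endl;
        return 0;
    }

    std::string input_file = vm["file"].as<std::string>();
    if (input_file.empty()) {
        std::cout << "No -f given; reading from the webcam." << std::endl;
    } else {
        std::cout << "Reading frames from " << input_file << std::endl;
    }
    return 0;
}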

shreeyajain commented 4 years ago

Renaming the executable

The executable has been renamed from ausensor to facesensor.

You can now navigate to the build/ directory and execute:

$ cmake ..
$ make -j facesensor
$ ./bin/facesensor
shreeyajain commented 4 years ago

From Federico: It might be useful to have eye_lmk and pose data as well to figure out which quadrant of the screen participants are looking at.

I have updated the output format to incorporate eye_lmk and pose:

{
    "header": {
        "timestamp": "2020-07-16T05:06:56.965755Z",
        "message_type": "observation",
        "version": "0.1"
    },
    "msg": {
        "experiment_id": "563e4567-e89b-12d3-a456-426655440000",
        "trial_id": "123e4567-e89b-12d3-a456-426655440000",
        "timestamp": "2020-07-16T05:06:56.965755Z",
        "source": "facesensor",
        "sub_type": "state",
        "version": "0.1"
    },
    "data": {
        "playername": "shreeya08",
        "landmark_detection_confidence": "0.97500",
        "landmark_detection_success": true,
        "frame": 8,
        "action_units": {
            "AU01": {
                "occurrence": 0.0,
                "intensity": 0.4174251010327605
            },
            "AU02": {
                "occurrence": 0.0,
                "intensity": 0.06606532441180364
            },
            ...
        },
        "gaze": {
            "eye_0": {
                "x": -0.042032960802316666,
                "y": -0.037290651351213455,
                "z": -0.9984200596809387
            },
            "eye_1": {
                "x": -0.28871601819992065,
                "y": 0.045460283756256104,
                "z": -0.9563348889350891
            },
            "gaze_angle": {
                "x": -0.16761472821235657,
                "y": 0.0041793398559093475
            },
            "eye_lmk2d": {
                "x_0": 289.2476806640625,
                "x_1": 291.44573974609375,
                ...
                "x_55": 374.8895263671875,
                "y_0": 392.86376953125,
                "y_1": 386.2491760253906,
                ...
                "y_55": 388.72772216796875
            },
            "eye_lmk3d": {
                "X_0": -20.11672019958496,
                "X_1": -18.630495071411133,
                ...
                "X_55": 35.71424102783203,
                "Y_0": 99.99628448486328,
                "Y_1": 95.42163848876953,
                ...
                "Y_55": 96.77069091796875,
                "Z_0": 327.07647705078125,
                "Z_1": 326.22967529296875,
                ...
                "Z_55": 325.328369140625
            }
        },
        "pose": {
            "Tx": 18.100841522216797,
            "Ty": 156.148193359375,
            "Tz": 388.6546630859375,
            "Rx": -0.09204348921775818,
            "Ry": 0.10995744913816452,
            "Rz": -0.068435437977314
        }
    }
}

@adarshp Do you have any suggested changes or is this okay? Additionally, should I set the precision (refer to https://github.com/TadasBaltrusaitis/OpenFace/blob/658a6a1cc2028f034c8f29233a01ddc3f9fd6672/lib/local/Utilities/src/RecorderCSV.cpp)?

shreeyajain commented 4 years ago

Command line options added

Allowed options:

  -h [ --help ]             produce help message
  --exp_id arg (=null)      set experiment ID
  --trial_id arg (=null)    set trial ID
  --playername arg (=null)  set playername
  --mloc arg                set OpenFace models directory
  --indent arg (=0)         set indentation (true/false)
  -f [ --file ] arg (=null) specify the input video file
adarshp commented 4 years ago

@adarshp Do you have any suggested changes or is this okay? Additionally, should I set the precision (refer to https://github.com/TadasBaltrusaitis/OpenFace/blob/658a6a1cc2028f034c8f29233a01ddc3f9fd6672/lib/local/Utilities/src/RecorderCSV.cpp)?

This looks great! Don't worry about the precision - people can reduce the precision downstream if they want.

adarshp commented 4 years ago

I just realized that if we are going to try piping the output of facesensor into mosquitto_pub, we'll need to make each message a single line (i.e. calling dump() instead of dump(4) in WebcamSensor.cpp). However, it is nice to be able to get indented output for debugging. Can you also add a boolean command line flag --indent (that defaults to false) that controls whether the output is indented or not?
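Concretely, something like this in the output path (assuming the JSON object is called output and indent is the boolean set from the new flag; the actual names in WebcamSensor.cpp may differ):

// One message per line by default (for piping); pretty-printed when --indent is given.
std::cout << (indent ? output.dump(4) : output.dump()) << std::endl;

Piping would then look something like ./bin/facesensor | mosquitto_pub -l -t <some topic> (mosquitto_pub's -l option reads messages from stdin, one per line).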

shreeyajain commented 4 years ago

I just realized that if we are going to try piping the output of facesensor into mosquitto_pub, we'll need to make each message a single line (i.e. calling dump() instead of dump(4) in WebcamSensor.cpp). However, it is nice to be able to get indented output for debugging. Can you also add a boolean command line flag --indent (that defaults to false) that controls whether the output is indented or not?

Sure, I'll do that!

shreeyajain commented 4 years ago

@shreeyajain - another thing that just occurred to me - we'll need to add a command line option, -f, that will enable reading and processing frames from a file instead of a webcam, since the tool will not be run during the experimental trial, but rather will be used on postprocessed video data (i.e. the cropped recording of the Zoom session).

When we process frames from a file, we might not need visualization. Should we make visualization optional as well?

adarshp commented 4 years ago

Yes, we should make it optional.

adarshp commented 4 years ago

Closed by #187.