Add AWS Lambda function to save uploaded video objects to DB

becky-gilbert commented 1 month ago

TL;DR

We need an AWS Lambda function that fires when a new object is created in the S3 bucket for lookit jsPsych video recordings, which makes an API call to save the video info to the Video table in the database.

Summary

We can pretty much re-use the existing Lambda functions for the RecordRTC buckets, but with different users/roles/credentials. As with our existing Lambda functions, we should also hook up the new ones to CloudWatch to get logs.
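To make the trigger flow concrete, here is a minimal sketch of what such a handler could look like in Node.js. It is not the existing Lambda code: `saveVideo` is a hypothetical callback standing in for the API call that saves to the Video table, and the bucket name and object key in the example are made up. The only parts taken from AWS itself are the S3 event record fields (`eventName`, `s3.bucket.name`, `s3.object.key`) and the fact that object keys arrive URL-encoded.

```javascript
// Sketch only: `saveVideo` is a hypothetical stand-in for the real API call.

// Pull the bucket name and decoded object key out of each S3 event record.
function extractUploadedObjects(event) {
  return (event.Records || [])
    .filter((record) => record.eventName && record.eventName.startsWith("ObjectCreated"))
    .map((record) => ({
      bucket: record.s3.bucket.name,
      // S3 event keys are URL-encoded, with spaces sent as "+".
      key: decodeURIComponent(record.s3.object.key.replace(/\+/g, " ")),
    }));
}

// Build a Lambda-style handler around an injected save function.
function makeHandler(saveVideo) {
  return async function handler(event) {
    const uploads = extractUploadedObjects(event);
    for (const upload of uploads) {
      // In Lambda, console output ends up in CloudWatch Logs.
      console.log(`Saving video record for s3://${upload.bucket}/${upload.key}`);
      await saveVideo(upload);
    }
    return { saved: uploads.length };
  };
}

// Example with a fake event and a stub in place of the real API call.
const exampleEvent = {
  Records: [
    {
      eventName: "ObjectCreated:Put",
      s3: {
        bucket: { name: "example-bucket" },
        object: { key: "videoStream_study1_0-video-consent_resp1_1700000000000_123.webm" },
      },
    },
  ],
};
const savedUploads = [];
const exampleHandler = makeHandler(async (upload) => {
  savedUploads.push(upload);
});
```

Injecting the save function keeps the event handling testable without credentials; the real handler would pass in whatever makes the authenticated API request.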

Also, in our current EFP RecordRTC-Lambda system, we are retrieving several pieces of info from the video filename (study ID, response ID, frame ID, consent vs not). So we will either need to make sure that we're structuring the lookit-jsPsych video filenames in the same way, or change the way that this info is retrieved in the new Lambdas so that it matches the way it's stored in the lookit-jsPsych filenames.

The Video table has a column for frame ID, which is also part of the file naming convention. I'm not sure what we will use for "frame ID", since lookit-jsPsych studies don't currently have the same researcher-defined unique IDs as in EFP. Perhaps a combination of the jsPsych plugin name and an incrementing value that reflects the trial's position in the experiment[^1]. Whatever we decide, it would be ideal if it were consistent with how we deal with the frame/trial IDs we use in the response sequence shown on the individual responses page (see https://github.com/lookit/lookit-api/issues/1316) and in the consent manager (see https://github.com/lookit/lookit-api/issues/1332).

[^1]: This is not as simple as looking at the position of a trial in the experiment timeline, because some of the trial objects in the main timeline are nested timelines that will repeat other trials an unknown number of times, and other trials in the main timeline can be conditionally skipped. jsPsych used to automatically generate and store an internal ID for each trial, which was an incrementing position marker and always unique, but this value has been removed from the jsPsych data in v8, so we should not rely on it. Instead, we may need to add our own IDs into the data, e.g. by setting up a global trial counter that always gets added to the trial data, or calculating this value based on the length of the jsPsych data array. These solutions could be implemented via the on_data_update or on_trial_finish callback parameters in our custom version of initJsPsych.

mekline commented 1 month ago

Couple questions to consider:

why did the numbering feature come out of v8?
can we recycle relevant code from efp for naming/numbering conventions/patterns? Numbering should either work as similarly as possible to how it works in efp, OR ELSE, be very idiomatic for jspsych and clearly explainable.
Remind me: does or can a jspsych experiment be written out in a manner similar to the frames/sequence style of efp?

For thinking through edge cases: What kinds of mappings between a video and a plugin or trial are possible? Does one trial ever produce multiple videos? Vice versa? Are these the same constraints as in efp, or is anything changing?

becky-gilbert commented 1 month ago

@mekline great questions, thanks for helping us think through this. I've answered your questions below, and then summarized some possible solutions after that.

1. why did the numbering feature come out of v8?

This is from the jsPsych v8 migration docs:

We've removed internal_node_id and jsPsych.data.getDataByTimelineNode(). Timeline node IDs were used internally by jsPsych to keep track of experiment progress in version 7, but this is no longer the case in version 8. Most users didn't need or want to see the internal_node_id in the data, so we've removed it. If you relied on this parameter, the simplest replacement is probably to use the data parameter to add the information you need back to the timeline.

But one thing I forgot about is that the jsPsych data still does have a trial_index value, which is unique across trials. This basically just provides a trial count, so it wouldn't be quite as useful for researchers to see on the consent/response pages (response sequence: [ 0, 1, 2, ...]). Also, this tracks the trial number/position for that particular response, but doesn't link the trial back to the actual trial config object. This matters for experiments that have dynamic/conditional elements rather than static trial numbers/orders across all responses.

So the trial_index value might be useful when combined with another piece of info, such as the trial's plugin type (1-html-keyboard-response) or a researcher-defined ID (1-my-trial).

2. can we recycle relevant code from efp for naming/numbering conventions/patterns? Numbering should either work as similarly as possible to how it works in efp, OR ELSE, be very idiomatic for jspsych and clearly explainable.

Maybe. EFP combines the frame's position within the sequence and the researcher-provided ID, e.g. 0-my-video-config, 1-my-video-consent, 2-first-instructions. In jsPsych we have the trial_index, but there are no researcher-provided IDs so we'd have to solve that problem first. Rather than using researcher-provided IDs, we could use the plugin name (which is not uniquely identifiable, but doesn't need to be since we have the unique trial_index too). That would result in a sequence like 0-video-config, 1-video-consent, 2-html-keyboard-response, 3-html-keyboard-response etc.

I don't think it makes sense to use the EFP convention/code for dealing with repeats (e.g. frameID-repeat-N). That was put in to handle frames that repeat because of browser navigation, and in jsPsych it's not possible to navigate across trials using the browser's navigation buttons.

3. does or can a jspsych experiment be written out in a manner similar to the frames/sequence style of efp?

Not really. The EFP config includes a mapping between the frame config and a name/ID, whereas jsPsych only allows you to put the trial config objects directly into the sequence. That would be like removing the "frames" part of the EFP study protocol and putting all of the frame objects directly into EFP's "sequence". Here's some pseudocode to illustrate:

EFP:

{ 
  "frames": {
    "trial1": { ... trial 1 config ... },
    "trial2": { ... trial 2 config ... }
  },
  "sequence": ["trial1", "trial2"]
}

jsPsych:

let sequence = [ 
  { ... trial 1 config ... }, 
  { ... trial 2 config ... } 
];

In jsPsych, a researcher could use unique variable names to refer to trial objects and then use those variables to construct the jsPsych timeline (sequence), but jsPsych won't ever "know about" or store those variable names.

let trial1 = { ... trial 1 config ... };
let trial2 = { ... trial 2 config ... };
let sequence = [ trial1 , trial2 ];

What you CAN do in jsPsych is include an ID in the trial config/data itself. So we can either require that researchers include an ID, which would look like this:

let sequence = [ 
  { 
    data: { id: "trial1" }, 
    ... rest of trial 1 config ... 
  }, { 
    data: { id: "trial2" }, 
    ... rest of trial 2 config ... 
  } 
];

Or we can create our own IDs and add them in automatically. More on these two options below!

4. What kinds of mappings between a video and a plugin or trial are possible? Does one trial ever produce multiple videos? Vice versa? Is this the same constraints as efp or is anything changing?

The good news is that I don't think any of this will cause any problems for video file names! Once we have a "frameID" value for jsPsych, the video naming can work exactly the same as in EFP:

Trials/frames can record multiple videos: e.g. participants can re-record their consent video any number of times, and every recording is saved. This doesn't cause problems for naming the video file because the end of the video filename contains a timestamp and a random 3-digit value. E.g. "videoStream_<studyID>_<frameID>_<responseID>_timestamp1_123.webm", "videoStream_<studyID>_<frameID>_<responseID>_timestamp2_456.webm", etc.
Videos can record over multiple trials/frames. In EFP the video naming convention changes to: "videoStream_<studyID>_multiframe-<frameID of the first frame>_<responseID>_timestamp_NNN.webm".

We can keep this same video naming convention with jsPsych; we just need to figure out the frame IDs.
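As a rough illustration of how a Lambda could pull the study ID, frame ID and response ID back out of a filename that follows this convention, here is a hedged sketch. The function name, the returned field names, and the way consent is detected (frame ID containing "consent") are all assumptions for illustration, not the existing EFP Lambda logic. It also assumes the study ID, response ID, timestamp and random suffix contain no underscores.

```javascript
// Sketch only: names and the "consent" check are assumptions, not existing code.
function parseVideoFilename(filename) {
  const dot = filename.lastIndexOf(".");
  const stem = dot === -1 ? filename : filename.slice(0, dot);
  const parts = stem.split("_");
  // Need at least: prefix, studyID, frameID, responseID, timestamp, random.
  if (parts.length < 6 || parts[0] !== "videoStream") return null;

  const studyId = parts[1];
  const random = parts[parts.length - 1];
  const timestamp = parts[parts.length - 2];
  const responseId = parts[parts.length - 3];
  // Whatever is left in the middle is the frame ID (rejoined in case it contains "_").
  let frameId = parts.slice(2, parts.length - 3).join("_");

  // Recordings that span several frames carry a "multiframe-" marker.
  const multiframePrefix = "multiframe-";
  const isMultiframe = frameId.startsWith(multiframePrefix);
  if (isMultiframe) frameId = frameId.slice(multiframePrefix.length);

  return {
    studyId,
    frameId,
    responseId,
    timestamp,
    random,
    isMultiframe,
    isConsent: frameId.includes("consent"), // assumption: consent frames say so in their ID
  };
}

const single = parseVideoFilename(
  "videoStream_study1_1-video-consent_resp1_1700000000000_123.webm"
);
const multi = parseVideoFilename(
  "videoStream_study1_multiframe-2-html-keyboard-response_resp1_1700000000000_456.webm"
);
```

Parsing from both ends means the frame ID can have any shape we settle on for jsPsych, as long as the other fields keep their positions.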

Options for jsPsych trial IDs

1. Require researchers to add IDs in the trial configuration.

The experiment code would look like this:

let trial1 = {
  type: jsPsychHtmlKeyboardResponse,
  stimulus: 'Press any key to start.',
  data: { id: "start" }    // <----- RESEARCHER MUST INCLUDE THIS
};
... more trials ...
let sequence = [ trial1 , ... more trials ... ];

Data:

{
    "id": "start",  // <----- RESEARCHER'S ID
    "rt": 1287,
    "stimulus": "Press any key to start.",
    "response": "r",
    "trial_type": "html-keyboard-response",
    "trial_index": 1,
    "plugin_version": "2.0.0",
    "time_elapsed": 4005
}

And the sequence shown to researchers on the consent and response pages would look like this:

[ start, trial_1, trial_2, trial_3, end ]

If we don't require that these IDs are unique, then the sequence could contain repeated IDs, like this:

[ start, fixation, trial_1, fixation, trial_2, fixation, trial_3, end ]

Pros:

Researchers have control over trial names and can make them meaningful.
Disambiguates trials that use the same plugin.

Cons:

Researchers have to add an extra data parameter to their trial config objects when creating jsPsych experiments on CHS. (This is still fully compatible with standard jsPsych; it just isn't required there.)
We would have to add in a check to make sure that the researcher added IDs. (And maybe that they're unique?)
We would need some additional documentation explaining how to add trial IDs for "normal" trials and for nested timelines (i.e. repeating trial sequences), and/or pointing to jsPsych docs on this.
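For a sense of what the ID check might involve, here is a minimal sketch that walks a timeline array, including nested timelines, and reports trials without an ID plus any duplicated IDs. The function names are hypothetical. It assumes `data` is a plain object on each node, and that an `id` set on a nested timeline's `data` is inherited by its child trials (jsPsych merges timeline-level data into the trials it contains).

```javascript
// Sketch only: hypothetical helpers, assuming `data` is a plain object.

// Recursively gather the effective ID of every leaf trial.
function collectTrialIds(timeline, inheritedId, found = { ids: [], missing: 0 }) {
  for (const node of timeline) {
    const ownId = node.data && typeof node.data === "object" ? node.data.id : undefined;
    const id = ownId !== undefined ? ownId : inheritedId;
    if (Array.isArray(node.timeline)) {
      // Nested timeline: its children inherit its ID unless they set their own.
      collectTrialIds(node.timeline, id, found);
    } else if (id === undefined || id === "") {
      found.missing += 1;
    } else {
      found.ids.push(id);
    }
  }
  return found;
}

// Summarize how many trials lack an ID and which IDs appear more than once.
function checkTrialIds(timeline) {
  const { ids, missing } = collectTrialIds(timeline, undefined);
  const seen = new Set();
  const duplicates = new Set();
  for (const id of ids) {
    if (seen.has(id)) duplicates.add(id);
    else seen.add(id);
  }
  return { missing, duplicates: [...duplicates] };
}

const exampleTimeline = [
  { type: "consent", data: { id: "start" } },
  { timeline: [{ type: "a", data: { id: "fixation" } }, { type: "b" }] },
  { type: "c", data: { id: "fixation" } },
];
const report = checkTrialIds(exampleTimeline);
// report is { missing: 1, duplicates: ["fixation"] }
```

A static check like this only sees the config; it can't tell how many times a looping or conditional timeline will actually run, so repeated IDs in the recorded data would still be possible.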

2. Add our own IDs in automatically.

Without getting an "id" value from the researcher, the info we have to work with is (1) the plugin name, and (2) the position in the experiment sequence.

So using the example above, the researcher would NOT need to provide the data parameter with an id value:

let trial1 = {
  type: jsPsychHtmlKeyboardResponse,
  stimulus: 'Press any key to start.'
};
... etc ...
let sequence = [ trial1 , ... etc ... ];

And our custom jsPsych init function would contain a hook that automatically creates/inserts the ID, so that the resulting trial data would look like this:

{
    "id": "1-html-keyboard-response",  // <----- CHS-GENERATED ID
    "rt": 1287,
    "stimulus": "Press any key to start.",
    "response": "r",
    "trial_type": "html-keyboard-response",
    "trial_index": 1,
    "plugin_version": "2.0.0",
    "time_elapsed": 4005
}

And the sequence shown to researchers on the consent and response pages would look like this:

[ 0-video-consent, 1-html-keyboard-response, 2-html-keyboard-response, 3-html-keyboard-response, 4-exit-survey ]
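Here is a minimal sketch of what that hook could do, assuming it receives each trial's data object with `trial_index` and `trial_type` already set. `addTrialId` and `buildSequence` are hypothetical names, not jsPsych APIs. In practice the hook might be passed as the on_trial_finish or on_data_update callback in our custom initJsPsych, but that wiring is an assumption and isn't shown here.

```javascript
// Sketch only: hypothetical helpers, not part of jsPsych.

// Add a CHS-generated ID made of the trial's position and plugin name.
function addTrialId(data) {
  data.id = `${data.trial_index}-${data.trial_type}`;
  return data;
}

// Build the ID sequence shown on the consent and response pages.
function buildSequence(allTrialData) {
  return allTrialData.map((trial) => trial.id);
}

// Simulated trial data, as jsPsych would record it before our hook runs.
const exampleData = [
  { trial_type: "video-consent", trial_index: 0 },
  { trial_type: "html-keyboard-response", trial_index: 1 },
  { trial_type: "html-keyboard-response", trial_index: 2 },
  { trial_type: "html-keyboard-response", trial_index: 3 },
  { trial_type: "exit-survey", trial_index: 4 },
].map(addTrialId);

const sequence = buildSequence(exampleData);
```

Because trial_index is unique within a response, the generated IDs are unique even when the same plugin is used many times in a row.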

Pros:

Researchers do not need to add the trial IDs as an extra step when moving existing jsPsych code onto CHS.
We can ensure that all trials have IDs and that they are unique.

Cons:

More code for us to write/maintain/debug.
The ID sequence won't be as meaningful to researchers, especially when the experiment contains several different types of trials that use "generic" plugins (e.g. html-keyboard-response).
