med-material / r-shiny-js-data-capture

Data Capture System in Javascript, integrated into R Shiny
MIT License

Whack-A-Mole VR Dashboard: Data capture experiment #13

Open bastianilso opened 1 year ago

bastianilso commented 1 year ago

Once the data capture system is ready in its first version (after solving issue #5), we should make a data collection test with it. We should test the data capture system to verify that we can collect data from a real dashboard.

Steps:

  1. Embed the JS data capture system with the Whack-A-Mole Dashboard at https://github.com/med-material/Whack_A_Mole_RShiny
  2. Run experimental procedure described below with 1-2 persons.
  3. Import collected data into R and analyze it, based on the analysis goals described below.
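Step 1 could look something like the following minimal sketch. The file name "datacapture.js" and its placement under `www/` are assumptions for illustration, not the actual integration from this repo:

```r
# Hypothetical sketch: embedding a JS data capture script in a Shiny app.
# "datacapture.js" is a placeholder name for the capture script.
library(shiny)

ui <- fluidPage(
  # Shiny serves files placed in the app's www/ directory,
  # so www/datacapture.js becomes reachable as "datacapture.js".
  tags$head(tags$script(src = "datacapture.js")),
  titlePanel("Whack-A-Mole Dashboard"),
  plotOutput("scores")
)

server <- function(input, output, session) {
  output$scores <- renderPlot(plot(1:10))  # placeholder dashboard content
}

shinyApp(ui, server)
```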

Experimental Procedure:

Analysis Goals: We will use the data to:

bastianilso commented 1 year ago

Data to upload (upload in the order: Meta, Event, Sample) https://github.com/med-material/d3-rshiny-vis/files/9958725/log2021-07-01.12-50-01.9658Sample.zip

bilal-62210 commented 1 year ago

@bastianilso I've sent the test results to your email address, because the folder is too big to be sent here.

bastianilso commented 1 year ago

OK @bilal-62210, now focus on the analysis goals and help answer the following questions. I suggest you do it in a new R project.

Identify which task took the longest time to solve.
Count how many clicks it took to solve each task.
Measure how much distance, the person's mouse had to move in each task.
Measure how much the person had to scroll.
Visualize the trajectory of the person's mouse movements.
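The last goal could be sketched roughly as below, assuming the Sample CSV has been loaded into a data frame `samples` with columns `MouseX`, `MouseY`, `Participant` and `Task` (all of these names are assumptions):

```r
# Sketch: plot the mouse trajectory per participant, colored by task.
library(ggplot2)

ggplot(samples, aes(x = MouseX, y = MouseY, color = Task)) +
  geom_path(alpha = 0.6) +   # connect samples in recorded order
  scale_y_reverse() +        # screen coordinates grow downward
  facet_wrap(~Participant) +
  labs(title = "Mouse trajectories per participant")
```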
bastianilso commented 1 year ago

@bilal-62210 I noticed that in aldryck's "step1" data, the duration is given as "87065" (seconds?) in the meta file image

That's either a very long time, or maybe a count of milliseconds (?). I think it would be better to write "87.065" seconds then (this corresponds to the ss.fff format we know from hh:mm:ss.fff timestamps).
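If the logged value is indeed milliseconds, the conversion to the ss.fff style is a one-liner:

```r
# Convert a millisecond count to the ss.fff notation mentioned above.
duration_ms <- 87065
sprintf("%.3f", duration_ms / 1000)  # "87.065"
```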

bastianilso commented 1 year ago

@bilal-62210 Another suggestion: in your analysis project, after you import the CSV files you created with LoggingLoader, save the data as a single RDA file, which takes less space and can be uploaded to a GitHub repo (since the data won't really change).

You can see how saving/loading RDA files in your R project works here: https://github.com/bastianilso/bci-pam-analysis/blob/main/pam_study_preprocess.R#L16

In general, feel free to take inspiration from the overall structure of the analysis here: https://github.com/bastianilso/bci-pam-analysis

We use a notion of having a "preprocess" file and an "analysis" file - in your case, an "analysis" file is probably enough for now. Preprocessing is mainly used in case the data needs to be cleaned (formatting timestamps, formatting numbers, ...).
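The save/load pattern referenced above could be sketched like this; the object name `D` and the file names are placeholders:

```r
# In the preprocess (or analysis) script: import once, then save compactly.
D <- read.csv("Meta.csv")      # placeholder; e.g. output of LoggingLoader
save(D, file = "data.rda")     # compact binary file, suitable for the repo

# Later, or in the analysis script:
load("data.rda")               # restores the object D into the environment
```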

bilal-62210 commented 1 year ago

> @bilal-62210 I noticed that in aldryck's "step1" data, the duration is given as "87065" (seconds?) in the meta file image
>
> That's either a very long time, or maybe a count of milliseconds (?). I think it would be better to write "87.065" seconds then (this corresponds to the ss.fff format we know from hh:mm:ss.fff timestamps).

Normally there is a point in the value, like this: image

bilal-62210 commented 1 year ago

Hi @bastianilso, can you try to open this meta.csv file please? I made some changes, and now when I open the file in OneNote or elsewhere the duration format is good and I have the point between ss.fff. Give me your feedback when you have opened it, to check whether you see the point too. log2022-12-9 14-8-55.382Meta.csv

bastianilso commented 1 year ago

@bilal-62210 Yes, this solved it, I have the point too.

bilal-62210 commented 1 year ago

@bastianilso This is what I have for the duration of a task:

image

So for this analysis goal: "Identify which task took the longest time to solve", we just have to compare the duration of each task to know which task took the longest time to solve.

bastianilso commented 1 year ago

@bilal-62210 Thanks. Here is some feedback on your process:

bilal-62210 commented 1 year ago

Hi @bastianilso, I'm working on the analysis goals and I made a wrong manipulation: I deleted the folder with Xavier's and Aldryck's tests. Do you still have this folder? If yes, please send it to me. If you don't have it, I will do some new tests. Thanks

bastianilso commented 1 year ago

Hi @bilal-62210,

I have it on my work laptop. Better do some new tests. Use 2 other participants so we can combine the data with the data from xavier and aldryck when i come back. -B

bilal-62210 commented 1 year ago

@bastianilso ok I will do it with Pierre and Lucas.

bilal-62210 commented 1 year ago

@bastianilso I learned a lot about dplyr over the last 2 days. This is my result for the duration of a task: image image Give me your feedback on that.

bastianilso commented 1 year ago

Nice @bilal-62210, yes, these results allow us to compare the durations and solve the first analysis goal. Here are some notes from me about your solution:

1) Your dataset 'D' contains Duration.x and Duration.y because the Event CSV and the Meta CSV each contain a column called Duration. To fix this, I suggest we rename the Meta CSV duration column to e.g. "SessionDuration" and the Event duration to e.g. "DurationSinceStart". This also makes it clearer what the columns contain than the word Duration itself. I will create a separate issue about that.

2) As a preliminary step to your analysis, you can consider renaming the columns "i2" and "i3" to e.g. "Participant" and "Task". You can use dplyr's rename() function to do this.

3) Rather than storing the results as intermediate variables, you can consider concatenating lines 6-10 like this (assuming the variables are renamed as per the suggestions above):

DurationSummary = D %>% group_by(Participant, Task) %>%
  mutate(SessionDuration = as.numeric(SessionDuration)) %>%
  summarize(MeanDuration = mean(SessionDuration)) %>%
  arrange(desc(MeanDuration))

If you are in doubt how to see intermediate results, you can apply %>% view() at any stage of the dplyr pipeline, and RStudio will show you the dataset in its state at that point. For example, D %>% group_by(Participant, Task) %>% mutate(SessionDuration = as.numeric(SessionDuration)) %>% view() will show the dataset after R has performed the group_by and mutate operations on D.

bilal-62210 commented 1 year ago

@bastianilso OK, I will work on it and make the changes. I answered your first point in issue #16.

bilal-62210 commented 1 year ago

@bastianilso Hi, I've worked on this analysis goal: "Count how many clicks it took to solve each task", and this is my result: image What do you think about that? I think it could be smart to count all events, not only the number of clicks.
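A rough sketch of this count, assuming the Event CSV has an `Event` column with a value like "MouseClick" (the column and value names are assumptions):

```r
# Count clicks per participant and task.
library(dplyr)

ClickSummary <- D %>%
  filter(Event == "MouseClick") %>%
  count(Participant, Task, name = "Clicks")

# Counting every event type instead, as suggested above:
EventSummary <- D %>%
  count(Participant, Task, Event)
```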

bastianilso commented 1 year ago

lead/lag values might help; they can be used inside a mutate() function: https://dplyr.tidyverse.org/reference/lead-lag.html
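For instance, lag() inside mutate() could be used for the mouse-distance goal roughly like this (the column names `MouseX` and `MouseY` are assumptions):

```r
# Per-sample mouse movement via lag(), summed per participant and task.
library(dplyr)

DistanceSummary <- D %>%
  group_by(Participant, Task) %>%
  mutate(Dist = sqrt((MouseX - lag(MouseX))^2 +
                     (MouseY - lag(MouseY))^2)) %>%
  summarize(TotalDistance = sum(Dist, na.rm = TRUE))
```

Within each group, lag() shifts the coordinate columns by one row, so `Dist` is the Euclidean distance between consecutive samples; the first row of each group yields NA, hence `na.rm = TRUE`.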