med-material / r-shiny-js-data-capture

Data Capture System in Javascript, integrated into R Shiny
MIT License

Whack-A-Mole VR Dashboard: Data capture experiment #13

Open bastianilso opened 1 year ago

bastianilso commented 1 year ago

Once the data capture system is ready in its first version (after solving issue #5), we should make a data collection test with it. We should test the data capture system to verify that we can collect data from a real dashboard.

Steps:

  1. Embed the JS data capture system with the Whack-A-Mole Dashboard at https://github.com/med-material/Whack_A_Mole_RShiny
  2. Run experimental procedure described below with 1-2 persons.
  3. Import collected data into R and analyze it, based on the analysis goals described below.
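Step 1 could look something like the following minimal sketch. The file name "datacapture.js" and its placement under `www/` are assumptions for illustration, not the actual integration from this repo:

```r
# Hypothetical sketch: embedding a JS data capture script in a Shiny app.
# "datacapture.js" is a placeholder name for the capture script.
library(shiny)

ui <- fluidPage(
  # Shiny serves files placed in the app's www/ directory,
  # so www/datacapture.js becomes reachable as "datacapture.js".
  tags$head(tags$script(src = "datacapture.js")),
  titlePanel("Whack-A-Mole Dashboard"),
  plotOutput("scores")
)

server <- function(input, output, session) {
  output$scores <- renderPlot(plot(1:10))  # placeholder dashboard content
}

shinyApp(ui, server)
```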

Experimental Procedure:

Analysis Goals: We will use the data to:

bastianilso commented 1 year ago

Data to upload (upload in the order: Meta, Event, Sample) https://github.com/med-material/d3-rshiny-vis/files/9958725/log2021-07-01.12-50-01.9658Sample.zip

bilal-62210 commented 1 year ago

@bastianilso I've sent the test results to your email address, because the folder is too big to be sent here.

bastianilso commented 1 year ago

OK @bilal-62210, now focus on the analysis goals and help answer the following questions. I suggest you do it in a new R project.

Identify which task took the longest time to solve.
Count how many clicks it took to solve each task.
Measure how much distance, the person's mouse had to move in each task.
Measure how much the person had to scroll.
Visualize the trajectory of the person's mouse movements.
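The last goal could be sketched roughly as below, assuming the Sample CSV has been loaded into a data frame `samples` with columns `MouseX`, `MouseY`, `Participant` and `Task` (all of these names are assumptions):

```r
# Sketch: plot the mouse trajectory per participant, colored by task.
library(ggplot2)

ggplot(samples, aes(x = MouseX, y = MouseY, color = Task)) +
  geom_path(alpha = 0.6) +   # connect samples in recorded order
  scale_y_reverse() +        # screen coordinates grow downward
  facet_wrap(~Participant) +
  labs(title = "Mouse trajectories per participant")
```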
bastianilso commented 1 year ago

@bilal-62210 I noticed that in aldryck's "step1" data, the duration is given as "87065" (seconds?) in the meta file image

That's either a very long time, or maybe a count of milliseconds (?). I think it would be better to write "87.065" seconds then (this corresponds to the ss.fff format we know from hh:mm:ss.fff timestamps).
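If the logged value is indeed milliseconds, the conversion to the ss.fff style is a one-liner:

```r
# Convert a millisecond count to the ss.fff notation mentioned above.
duration_ms <- 87065
sprintf("%.3f", duration_ms / 1000)  # "87.065"
```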

bastianilso commented 1 year ago

@bilal-62210 Another suggestion: in your analysis project, after you import the CSV files you created with LoggingLoader, save the data as a single RDA file, which takes less space and can be uploaded to a GitHub repo (since the data won't really change).

You can see how saving/loading RDA files in your R project works here: https://github.com/bastianilso/bci-pam-analysis/blob/main/pam_study_preprocess.R#L16

In general, feel free to take inspiration from the overall structure of the analysis here: https://github.com/bastianilso/bci-pam-analysis

We use a notion of having a "preprocess" file and an "analysis" file - in your case, an "analysis" file is probably enough for now. Preprocessing is mainly used in case the data needs to be cleaned (formatting timestamps, formatting numbers, ...).
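The save/load pattern referenced above could be sketched like this; the object name `D` and the file names are placeholders:

```r
# In the preprocess (or analysis) script: import once, then save compactly.
D <- read.csv("Meta.csv")      # placeholder; e.g. output of LoggingLoader
save(D, file = "data.rda")     # compact binary file, suitable for the repo

# Later, or in the analysis script:
load("data.rda")               # restores the object D into the environment
```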

bilal-62210 commented 1 year ago

> @bilal-62210 I noticed that in aldryck's "step1" data, the duration is given as "87065" (seconds?) in the meta file image
>
> That's either a very long time, or maybe a count of milliseconds (?). I think it would be better to write "87.065" seconds then (this corresponds to the ss.fff format we know from hh:mm:ss.fff timestamps).

Normally there is a point in the value, like this: image

bilal-62210 commented 1 year ago

Hi @bastianilso, can you try to open this meta.csv file please? I made some changes, and now when I open the file in OneNote or elsewhere the duration format is good and I have the point between ss.fff. Give me your feedback when you have opened it, to check whether you see the point too. log2022-12-9 14-8-55.382Meta.csv

bastianilso commented 1 year ago

@bilal-62210 Yes, this solved it, I have the point too.

bilal-62210 commented 1 year ago

@bastianilso This is what I have for the duration of a task:

image

So for this analysis goal: "Identify which task took the longest time to solve", we just have to compare the duration of each task to know which task took the longest time to solve.

bastianilso commented 1 year ago

@bilal-62210 Thanks. Here is some feedback on your process:

bilal-62210 commented 1 year ago

Hi @bastianilso, I'm working on the analysis goals and I made a wrong manipulation: I deleted the folder with Xavier's and Aldryck's tests. Do you still have this folder? If yes, please send it to me. If you don't have it, I will do some new tests. Thanks

bastianilso commented 1 year ago

Hi @bilal-62210,

I have it on my work laptop. Better do some new tests. Use 2 other participants so we can combine the data with the data from xavier and aldryck when i come back. -B

bilal-62210 commented 1 year ago

@bastianilso ok I will do it with Pierre and Lucas.

bilal-62210 commented 1 year ago

@bastianilso I learned a lot about dplyr over the last 2 days. This is my result for the duration of a task: image image Give me your feedback on that.

bastianilso commented 1 year ago

Nice @bilal-62210, yes, these results allow us to compare the durations and solve the first analysis goal. Here are some notes from me about your solution:

1) Your dataset 'D' contains Duration.x and Duration.y because the Event CSV and the Meta CSV each contain a column called Duration. To fix this, I suggest we rename the Meta CSV duration column to e.g. "SessionDuration" and the Event duration to e.g. "DurationSinceStart". This also makes it clearer what the columns contain than the word Duration itself. I will create a separate issue about that.

2) As a preliminary step to your analysis, you can consider renaming the columns "i2" and "i3" to e.g. "Participant" and "Task". You can use dplyr's rename() function to do this.

3) Rather than storing the results as intermediate variables, you can consider concatenating lines 6-10 like this (assuming the variables are renamed as per the suggestions above):

DurationSummary = D %>% group_by(Participant, Task) %>%
  mutate(SessionDuration = as.numeric(SessionDuration)) %>%
  summarize(MeanDuration = mean(SessionDuration)) %>%
  arrange(desc(MeanDuration))

If you are in doubt how to see intermediate results, you can apply %>% view() at any stage of the dplyr pipeline, and RStudio will show you the dataset in its state at that point. For example, D %>% group_by(Participant, Task) %>% mutate(SessionDuration = as.numeric(SessionDuration)) %>% view() will show the dataset after R has performed the group_by and mutate operations on D.

bilal-62210 commented 1 year ago

@bastianilso OK, I will work on it and make the changes. I answered your first point in issue #16.

bilal-62210 commented 1 year ago

@bastianilso Hi, I've worked on this analysis goal: "Count how many clicks it took to solve each task", and this is my result: image What do you think about that? I think it could be smart to count all events, not only the number of clicks.
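A rough sketch of this count, assuming the Event CSV has an `Event` column with a value like "MouseClick" (the column and value names are assumptions):

```r
# Count clicks per participant and task.
library(dplyr)

ClickSummary <- D %>%
  filter(Event == "MouseClick") %>%
  count(Participant, Task, name = "Clicks")

# Counting every event type instead, as suggested above:
EventSummary <- D %>%
  count(Participant, Task, Event)
```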

bastianilso commented 1 year ago

lead/lag values might help; they can be used inside a mutate() function: https://dplyr.tidyverse.org/reference/lead-lag.html
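For instance, lag() inside mutate() could be used for the mouse-distance goal roughly like this (the column names `MouseX` and `MouseY` are assumptions):

```r
# Per-sample mouse movement via lag(), summed per participant and task.
library(dplyr)

DistanceSummary <- D %>%
  group_by(Participant, Task) %>%
  mutate(Dist = sqrt((MouseX - lag(MouseX))^2 +
                     (MouseY - lag(MouseY))^2)) %>%
  summarize(TotalDistance = sum(Dist, na.rm = TRUE))
```

Within each group, lag() shifts the coordinate columns by one row, so `Dist` is the Euclidean distance between consecutive samples; the first row of each group yields NA, hence `na.rm = TRUE`.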