bastianilso opened 1 year ago
Data to upload (upload in the order: Meta, Event, Sample) https://github.com/med-material/d3-rshiny-vis/files/9958725/log2021-07-01.12-50-01.9658Sample.zip
@bastianilso I've sent the test results to your email address because the folder is too big to be sent here
OK @bilal-62210, now focus on the analysis goals and help answer the following questions. I suggest you try to do it in a new project in R.
- Identify which task took the longest time to solve.
- Count how many clicks it took to solve each task.
- Measure how much distance the person's mouse had to move in each task.
- Measure how much the person had to scroll.
- Visualize the trajectory of the person's mouse movements.
@bilal-62210 I noticed in aldryck's "step1" data, it says the duration is "87065" (seconds?) in the meta file.
That's either a very long time, or maybe a count of milliseconds (?)
I think it would be better to write "87.065" seconds then. (This corresponds to the ss.fff format we know from timestamps, hh:mm:ss.fff.)
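If the value is indeed a millisecond count, converting it to the ss.fff style could look like this (a hypothetical sketch; the actual fix belongs in the logging code):

```r
# Assumption: the Duration field holds a millisecond count.
duration_ms <- 87065
duration_s <- sprintf("%.3f", duration_ms / 1000)
duration_s  # "87.065"
```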
@bilal-62210 another suggestion for your analysis project: after you import the CSV files you created with LoggingLoader, save the data as a single RDA file, which takes less space and can be uploaded to a GitHub repo (since the data won't really change).
You can see how saving/loading RDA files in your R project works here: https://github.com/bastianilso/bci-pam-analysis/blob/main/pam_study_preprocess.R#L16
in general, feel free to take inspiration from the general structure of analysis from here: https://github.com/bastianilso/bci-pam-analysis
we use a notion of having a "preprocess" file and an "analysis" file - in your case, an "analysis" file is probably enough for now. Preprocessing is mainly used in case the data needs to be cleaned (formatting timestamps, formatting numbers, ...)
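The save()/load() pattern can be sketched like this; the data frames below are placeholders, not the real LoggingLoader output:

```r
# Placeholder data frames standing in for the imported Meta and Event CSVs.
M <- data.frame(Participant = c("xavier", "aldryck"),
                SessionDuration = c(87.065, 42.5))
E <- data.frame(Participant = "xavier", Event = "Click")

rda_path <- file.path(tempdir(), "data.rda")
save(M, E, file = rda_path)  # one compressed file instead of many CSVs
rm(M, E)
load(rda_path)               # M and E are restored into the environment
```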
Normally there is a point in the value, like this:
hi @bastianilso, can you try to open this meta.csv file please? I made some changes, and now when I open this file in OneNote or something else the duration format is good and I have the point between ss.fff. So give me your feedback when you have opened it, please, to check whether you have the point too. log2022-12-9 14-8-55.382Meta.csv
@bilal-62210 yes, this solved it, I have the point too.
@bastianilso this is what I have for the duration of a task:
So for this analysis goal, "Identify which task took the longest time to solve", we just have to compare the duration of each task to know which task took the longest time to solve.
@bilal-62210 Thanks. Here is some feedback on your process:
Use `LoadFromDirectory("data/")`. Make sure to pass the parameter `Sample="ContinuousMeasurement"`, since by default we assume sample files are called "sample" and in your case they are called continuous measurements. Then use `group_by()` and group by the columns which indicate which participant and test the data is from. In addition, you may want to use a `summarize()` function to summarize the duration for each group. There are YouTube videos online demonstrating how to do this with dplyr.

hi @bastianilso, I'm working on the analysis goals and I did a wrong manipulation: I've deleted the folder with xavier's and aldryck's tests. Do you still have this folder? If yes, please send it to me. If you don't have it, I will do some new tests. Thanks
Hi @bilal-62210,
I have it on my work laptop. Better do some new tests. Use 2 other participants, so we can combine the data with the data from xavier and aldryck when I come back. -B
@bastianilso ok I will do it with Pierre and Lucas.
@bastianilso I learned a lot about dplyr in the last 2 days. This is my result for the duration of a task: give me your feedback about that
nice @bilal-62210, yes these results allow us to compare the duration and solve the first analysis goal. Here are some notes from me about your solution:
1) your dataset 'D' contains Duration.x and Duration.y because the Event CSV and Meta CSV each contain a column called Duration. To fix this, I suggest we rename the Meta CSV duration column to e.g. "SessionDuration" and the Event duration to e.g. "DurationSinceStart". This also makes it clearer what the columns contain than the word Duration itself. I will create a separate issue about that.
2) As a preliminary step to your analysis, you can consider renaming the columns "i2" and "i3" to e.g. "Participant" and "Task". You can use dplyr's `rename()` function to do this.
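A minimal sketch of that rename, with a placeholder data frame standing in for the imported dataset:

```r
library(dplyr)

# Placeholder columns; the real i2/i3 values come from LoggingLoader.
D <- data.frame(i2 = c("xavier", "aldryck"), i3 = c("step1", "step2"))

# rename(new_name = old_name)
D <- D %>% rename(Participant = i2, Task = i3)
```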
3) Rather than storing the results as intermediate variables, you can consider concatenating lines 6-10 like this (assuming the variables are renamed as per the suggestions above):

```r
DurationSummary = D %>% group_by(Participant, Task) %>%
  mutate(SessionDuration = as.numeric(SessionDuration)) %>%
  summarize(MeanDuration = mean(SessionDuration)) %>%
  arrange(desc(MeanDuration))
```
If you are in doubt how to see intermediate results, you can apply `%>% view()` at any stage of the dplyr pipeline and RStudio will show you the dataset in its state at that point. For example, `D %>% group_by(Participant, Task) %>% mutate(SessionDuration = as.numeric(SessionDuration)) %>% view()` will show the dataset after R has performed the group_by and mutate operations on D.
@bastianilso Ok, I will work on it and make the changes. I've answered your first point in issue #16.
@bastianilso Hi, I've worked on this analysis goal: "Count how many clicks it took to solve each task". And this is my result. What do you think about that? I think it could be smart to count all events and not only the number of clicks
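Counting clicks (or all events) per participant and task could be sketched like this; the Event column and the "Click" value are assumptions about the real log format:

```r
library(dplyr)

# Placeholder event data standing in for the Event CSV.
E <- data.frame(
  Participant = c("xavier", "xavier", "xavier", "aldryck"),
  Task        = c("step1", "step1", "step2", "step1"),
  Event       = c("Click", "Click", "Scroll", "Click")
)

ClickSummary <- E %>%
  filter(Event == "Click") %>%
  count(Participant, Task, name = "Clicks")

# Counting all events is the same pipeline without the filter.
EventSummary <- E %>% count(Participant, Task, name = "Events")
```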
lead/lag values might help; they can be used inside a mutate function: https://dplyr.tidyverse.org/reference/lead-lag.html
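For example, lag() inside mutate() can give the distance between consecutive mouse samples; the MouseX/MouseY column names here are assumptions about the sample data:

```r
library(dplyr)

# Placeholder sample data with mouse coordinates.
S <- data.frame(
  Task   = c("step1", "step1", "step1"),
  MouseX = c(0, 3, 3),
  MouseY = c(0, 4, 8)
)

# Euclidean distance between each sample and the previous one, per task.
S <- S %>%
  group_by(Task) %>%
  mutate(Step = sqrt((MouseX - lag(MouseX))^2 + (MouseY - lag(MouseY))^2))

TotalDistance <- sum(S$Step, na.rm = TRUE)  # 5 + 4 = 9
```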
Once the data capture system is ready in its first version (after solving issue #5), we should run a data collection test with it against a real dashboard, to verify that we can collect data.
Steps:
Experimental Procedure:
Analysis Goals: We will use the data to: