mayacakmak / se2

Control interfaces for manipulating SE2 configurations
BSD 2-Clause "Simplified" License

Data processing and analysis #17

Open mayacakmak opened 3 years ago

mayacakmak commented 3 years ago

We can move discussion about data to this issue.

One initial TODO:

kavidey commented 3 years ago

This script converts the JSON data into two CSV files: one contains cycle data, and the other contains questionnaire responses. The original JSON file, along with the two CSV files, is uploaded to this Google Drive folder: https://drive.google.com/drive/u/0/folders/1R89qRGApJjWt0EQz45dtSLZaQNc5mKGz The two output CSV files are also in the data-processing folder.
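For illustration, a minimal sketch of this kind of JSON-to-CSV split; the key names (`cycles`, `questionnaires`) and column names here are assumptions, not the actual schema used by json_to_csv.py:

```python
import csv
import json

def split_study_json(json_path, cycles_csv, questionnaire_csv):
    # Key and column names below are illustrative guesses,
    # not the actual schema used by json_to_csv.py.
    with open(json_path) as f:
        data = json.load(f)

    with open(cycles_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["uid", "interface", "cycleLength"])
        writer.writeheader()
        writer.writerows(data.get("cycles", []))

    with open(questionnaire_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["uid", "question", "answer"])
        writer.writeheader()
        writer.writerows(data.get("questionnaires", []))
```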

The cycle file only contains general data about each cycle (cycle length, target position/rotation/threshXY/threshTheta, user ID, interface type, etc.). It does not have data about any individual actions that the user took.

The only included cycles were those completed on the testing page by users whose user IDs we logged as having completed this round of the study.

mayacakmak commented 3 years ago

Excellent! @csemecu can you give it a try?

csemecu commented 3 years ago

Yes! I will take a look at the data and start running some preliminary analyses with SPSS. I'll update you both early next week. One thing I was asking Kavi is whether these files are de-identified; I'm thinking so.

kavidey commented 3 years ago

The results of the second batch are done processing. I added new files to the Google Drive folder; they are a combination of both batches (let me know if we want separate ones).

mayacakmak commented 3 years ago

Started combining/summarizing data in this spreadsheet for quick checks and visualization: https://docs.google.com/spreadsheets/d/1snU2Rs7UUd3au_3eSIWlyoQcLhFxicjWH2Oe_wAJIBo/edit?usp=sharing (in the same folder as the raw data)

mayacakmak commented 3 years ago

I've cleaned up the data in the spreadsheet, only including participants with a unique ID that has exactly 36 cycles associated with it--that is 156 out of 192 entries. @KaviMD we might want to filter out participants who do not have 36 cycles in the data processing pipeline itself, but the spreadsheet can now handle mixed data being pasted in. There were a few other small corruptions in the data, like questionnaire answers being mis-ordered (probably due to some parts missing), but those entries were all eliminated anyway by the cycle-count filter.
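The exactly-36-cycles filter could be done in the pipeline along these lines; a sketch, where the `uid` field name is an assumption about the CSV export:

```python
from collections import Counter

def filter_complete_participants(cycle_rows, expected_cycles=36):
    # Keep only cycles belonging to user IDs that have exactly
    # `expected_cycles` cycles; everyone else is dropped wholesale.
    counts = Counter(row["uid"] for row in cycle_rows)
    valid = {uid for uid, n in counts.items() if n == expected_cycles}
    return [row for row in cycle_rows if row["uid"] in valid]
```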

Based on the distribution of the valid data, I updated the 'todo' numbers in the database to run another batch (trying to get to 30 per condition, but we will likely have a drop with similar issues). I hope I got the ordering of the interfaces correct :) Working on getting the previous workers qualified so we can exclude them; will then start the next data collection.

csemecu commented 3 years ago

Yesterday I sent out an email with the results I got from SPSS. I used 166 participants; for the ones that had more than 36 cycles, I used only the first 36 and discarded the rest. But I can run everything again with the 156 that you mention. I added the exported results as an .htm file in the drive folder.

There are four significant groupings of the control types; target and targetdrag show no statistical difference between them. I also plotted each control type with multiple bars to show click and press/release. The interface that benefits the most is the panel one, so maybe that is something to discuss if we want to keep that interface as one of the accessible ones to be implemented with that transition.

mayacakmak commented 3 years ago

@csemecu Adding stats analysis discussion to this thread also--from Maru:

> There are four significant groupings of the control types; target and targetdrag show no statistical difference between them. I also plotted each control type with multiple bars to show click and press/release. The interface that benefits the most is the panel one, so maybe that is something to discuss if we want to keep that interface as one of the accessible ones to be implemented with that transition.

[Three screenshots of SPSS result plots, 2020-09-16]

(Note: the dependent variable/measurement shown here is cycleLength)

@csemecu Let's make sure you have the full/clean data before running the analysis again.

mayacakmak commented 3 years ago

@csemecu Messages crossed :)

Yes, it is unclear which 36 is the right one to include so let's just remove those participants. Also let's wait for the next batch.

mayacakmak commented 3 years ago

One question about analysis: We have 36 measurements from each participant but the data is pooled together. This analysis:

[Screenshot of SPSS analysis output, 2020-09-16]

.. looks the same as it would look if each data point came from a different participant, instead of 36 data points from 166 participants. Shouldn't participant ID be a factor in the analysis?

mayacakmak commented 3 years ago

Initial look at questionnaire data is in the 'Interface comparisons' tab of the spreadsheet.

Replicated the scatterplots of time vs. dist_xy, dist_theta, thresh_xy, and thresh_theta, with trendlines, showing some interesting trends, in the 'Target comparisons' tab of the spreadsheet.

@KaviMD I'm still thinking about your question about which interfaces to move on with for the SE3 study. I'm not sure a clear answer is in the results. I'm trying to think about the narrative and what reasoning would make most sense.

mayacakmak commented 3 years ago

@KaviMD I'm starting to look into the next batch of data. How long does the get_firebase_data.py script usually take? Like you mentioned before, the web interface didn't work for exporting. One TODO for the future (perhaps already for the SE3 study) is to save detailed session data (for recreating all actions) in a different branch of the database so it can be downloaded separately (at its own pace) if needed, while we get quicker access to measurement data.

The other question I have is, in what order should I then run the different scripts under data_processing/ once I have the data downloaded?

mayacakmak commented 3 years ago

Oh, looks like I'm done downloading, so the answer to the first question (for the latest data) is about 5 minutes.

mayacakmak commented 3 years ago

So far process_data.py and calculate_session_lengths.py seem outdated: they require manually entering uids and list fewer uids than we already have in the data.

Tried json_to_csv.py, but that requires a different JSON than the one get_firebase_data.py downloads (one that also has the state branch of the database). I'll try this again after changing get_firebase_data.py to download the whole thing.

mayacakmak commented 3 years ago

Actually, I was not able to download the root of the database, so I just went ahead and modified json_to_csv.py. This part of the repo needs a bit more documentation ;)

@KaviMD the question labels in the questionnaire data seem to be mixed up in the latest data--I was already a bit suspicious of the earlier data, feeling unsure which question ID (e.g. section-0-question-1) corresponds to which actual question. Can you take a look at whether that's an issue in the collected data or in the data processing? We cannot tell the NASA TLX rating answers and the statement agreement answers apart from one another, so we really need to know which is which. I tagged you in the spreadsheets; will commit the latest JSONs and code.

kavidey commented 3 years ago

Sorry about the documentation and the other problems with the code, a bunch of those files were very out of date. I am working on adding more documentation and cleaning everything up.

I manually verified the data in Firebase for several users against the data in the spreadsheet for the first 2 batches. (I verified that section-0-question-1 in Firebase definitely corresponds to section-0-question-1 in our spreadsheet, and the same is true for all of the other questions.) Firebase stores the questions in an alphanumeric order (0, 1, 10, 11, 12, etc.), so json_to_csv.py re-orders the questions, but I made sure that it reorders the column titles too, so the mapping should be preserved.
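For context, that alphanumeric order is exactly what plain lexicographic string sorting produces. A small sketch of a natural-sort key that restores numeric order (the question-ID format is taken from this thread):

```python
import re

def natural_key(qid):
    # Split 'section-0-question-10' into ['section-', 0, '-question-', 10, '']
    # so numeric parts compare as numbers, not strings.
    return [int(p) if p.isdigit() else p for p in re.split(r"(\d+)", qid)]

ids = ["section-0-question-0", "section-0-question-1", "section-0-question-10",
       "section-0-question-11", "section-0-question-2"]
# Plain sorted(ids) keeps the alphanumeric order above;
# sorting with natural_key restores 0, 1, 2, 10, 11.
print(sorted(ids, key=natural_key))
```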

I also looked into the front-end HTML code and verified that all of the questions and IDs match up (e.g. that section-0-question-1 is actually the first question on the first page).

I am fairly confident that the order of the questions in those spreadsheets is correct. I just looked at the data from the new study, and I definitely agree that there is a problem. I am looking into what happened and will respond here as soon as I find out.

mayacakmak commented 3 years ago

Perfect, thanks for verifying. So you think there's a problem with the new raw data (not the JSON-to-CSV processing code)?

kavidey commented 3 years ago

I just finished running a bunch of different tests on the data. I haven't figured out exactly why the problem occurs, but I know how to fix it.

The algorithm in json_to_csv.py to convert the questionnaire data to csv format only works properly if the questionnaire responses are stored in a very specific (currently hardcoded) order. This is the "original" order of the questions (this is the order that was used in the json data download for the first two batches): ['section-0-question-0', 'section-0-question-1', 'section-0-question-10', 'section-0-question-11', 'section-0-question-12', 'section-0-question-13', 'section-0-question-2', 'section-0-question-3', 'section-0-question-4', 'section-0-question-5', 'section-0-question-6', 'section-0-question-7', 'section-0-question-8', 'section-0-question-9', 'section-1-question-0', 'section-1-question-1', 'section-1-question-10-input', 'section-1-question-11', 'section-1-question-12', 'section-1-question-13', 'section-1-question-14-input', 'section-1-question-2', 'section-1-question-3-input', 'section-1-question-4', 'section-1-question-5-input', 'section-1-question-6', 'section-1-question-7', 'section-1-question-8-input', 'section-1-question-9', 'section-2-question-0-input', 'section-2-question-1', 'section-2-question-2', 'section-2-question-3', 'section-2-question-4-input', 'section-2-question-5', 'section-2-question-6-input', 'section-2-question-7', 'section-2-question-8-input']

And the "new" order (this was the order of the JSON data downloaded from the 9-17 batch): ['section-2-question-2', 'section-2-question-3', 'section-2-question-1', 'section-2-question-7', 'section-2-question-5', 'section-1-question-5-input', 'section-2-question-8-input', 'section-2-question-6-input', 'section-1-question-8-input', 'section-2-question-0-input', 'section-0-question-13', 'section-0-question-11', 'section-2-question-4-input', 'section-0-question-12', 'section-0-question-6', 'section-0-question-10', 'section-1-question-2', 'section-1-question-7', 'section-1-question-6', 'section-1-question-4', 'section-1-question-1', 'section-1-question-14-input', 'section-1-question-0', 'section-1-question-9', 'section-1-question-3-input', 'section-1-question-13', 'section-1-question-12', 'section-1-question-11', 'section-0-question-8', 'section-0-question-9', 'section-0-question-4', 'section-0-question-5', 'section-1-question-10-input', 'section-0-question-7', 'section-0-question-0', 'section-0-question-1', 'section-0-question-2', 'section-0-question-3']

Those orders are pretty different and explain why the questionnaire responses were in the completely wrong order. To be clear, all of the questionnaire data was correctly saved in Firebase. The problem is that Firebase reorders elements (usually alphabetically) when they are written to the database, and for some reason this time when the data was downloaded, it put them in a different order than the code expected.

I tried downloading the data with get_firebase_data.py myself, and downloading it manually through wget and the web interface, but every time the questionnaire responses were in the "original" order. I wonder if Python somehow reordered the questions when they were saved to disk. Maybe different OSs have different sorting algorithms or something?

I am working on updating json_to_csv.py to dynamically detect the order of the questions in the future. In the meantime, I have uploaded the fixed questionnaire and cycle CSV files, along with the study and state JSON files, to the Google Drive. I am working on updating the rest of the data processing scripts to work together and will post a readme with the order to run the scripts, instructions, and more documentation once they are done.
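One way to make the conversion order-independent, as a sketch: look each response up by its key instead of relying on positional order. The function and field names here are illustrative, not the actual json_to_csv.py implementation:

```python
def responses_to_row(responses, column_order):
    # Build the CSV row by key lookup, so the order in which Firebase
    # returned the responses never matters; missing questions become
    # empty cells instead of shifting every later column.
    return [responses.get(qid, "") for qid in column_order]

columns = ["section-0-question-0", "section-0-question-1", "section-0-question-2"]
shuffled = {"section-0-question-2": "7",
            "section-0-question-0": "3",
            "section-0-question-1": "5"}
print(responses_to_row(shuffled, columns))  # ['3', '5', '7']
```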

mayacakmak commented 3 years ago

Phew.. nice detective work and thanks so much for fixing everything!

I didn't use wget; I only used the get_firebase_data.py script to get the data from Firebase. I wonder if that's part of the reason. Or could it be a Python 2 vs. Python 3 thing? I used Python 3 for everything except the script that required the firebase module. Your solution sounds good (even though hopefully we have enough data and we don't have to redo this too many more times ;))

mayacakmak commented 3 years ago

One question about analysis: We have 36 measurements from each participant but the data is pooled together. This analysis:

[Screenshot of SPSS analysis output, 2020-09-16]

.. looks the same as it would look if each data point came from a different participant, instead of 36 data points from 166 participants. Shouldn't participant ID be a factor in the analysis?

@csemecu Can you take a look at this question about statistical analysis?
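For what it's worth, one simple way to respect the repeated-measures structure before comparing conditions is to collapse each participant's measurements into per-condition means, so each participant contributes one value per condition. A sketch with assumed field names (`uid`, `interface`, `cycleLength`); a repeated-measures ANOVA or mixed-effects model in SPSS would be the fuller treatment:

```python
from collections import defaultdict

def per_participant_means(rows):
    # Average cycleLength over each (uid, interface) pair so every
    # participant contributes exactly one value per condition.
    # Field names 'uid', 'interface', 'cycleLength' are assumptions.
    acc = defaultdict(lambda: [0.0, 0])
    for row in rows:
        key = (row["uid"], row["interface"])
        acc[key][0] += float(row["cycleLength"])
        acc[key][1] += 1
    return {key: total / n for key, (total, n) in acc.items()}
```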

mayacakmak commented 3 years ago

@tapomayukh See the readme here to get started on the data processing for Study 1: https://github.com/mayacakmak/se2/tree/master/data-processing

Also, here's the spreadsheet where we're inspecting the same data: https://docs.google.com/spreadsheets/d/1snU2Rs7UUd3au_3eSIWlyoQcLhFxicjWH2Oe_wAJIBo/edit?usp=sharing Will tag you shortly.

csemecu commented 3 years ago

Yes, I can definitely do that. I will add participant as another block


tapomayukh commented 3 years ago

When I try to convert the JSON to CSV, I am getting a unicode error: "UnicodeEncodeError: 'ascii' codec can't encode character u'\u2192' in position 142: ordinal not in range(128)". Any ideas why? Note, downloading my JSON file took around 3 minutes, and I didn't change any permissions on the Firebase side.

tapomayukh commented 3 years ago

Never mind, I fixed it by switching from Python 2.7 to Python 3.6.
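For anyone else hitting this: u'\u2192' is the arrow character '→', and the error comes from Python 2's default ASCII codec when writing the file. A sketch of the Python 3 behavior, making the encoding explicit rather than relying on the interpreter's default:

```python
import os
import tempfile

# u'\u2192' is the arrow character; under Python 2 the default ASCII
# codec cannot encode it when writing to a file, which is exactly the
# UnicodeEncodeError above. Being explicit about UTF-8 avoids relying
# on the interpreter's default.
path = os.path.join(tempfile.mkdtemp(), "out.csv")
with open(path, "w", encoding="utf-8") as f:
    f.write("transition\u2192state\n")

with open(path, encoding="utf-8") as f:
    print(f.read())  # transition→state
```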

tapomayukh commented 3 years ago

Today, weirdly, I am getting a new error when trying to download data from Firebase: "requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://accessible-teleop.firebaseio.com/.json". I am guessing it's a Firebase rules thing? How do I change it on the Firebase side (I could not find it in the README)?

kavidey commented 3 years ago

Sorry about that, I updated the rules yesterday and disabled permission for the script to download data.

I have re-enabled it. The rules can be edited here: https://console.firebase.google.com/u/0/project/accessible-teleop/database/accessible-teleop/rules (there is a comment explaining what to do). I also added it to the README.

mayacakmak commented 3 years ago

@KaviMD See Issue #18 for some data cleaning questions; moving discussion here.

For study 2 it would be great if we could extract some additional metrics to support the narrative. Two that I thought of were:

- [x] Number of clicks (per task): something expected to correlate with task completion time (i.e. 'cycle length') but might have interesting differences
- [x] Duration of dragging (per task): only relevant for the four press/release interfaces, to give a sense of the pressing requirements of the interfaces when we say this is something that makes these interfaces less accessible

@KaviMD @tapomayukh Any ideas of other things that would be useful to look at?

mayacakmak commented 3 years ago

@csemecu Any luck with the statistical analysis? Perhaps we needed to give you new data? @KaviMD could we perhaps have the data processing script also filter 'valid' participants (i.e. exactly 36 cycles) and subselect the first 24 valid participants, so we're sure everything we're reporting is based on the same data?

tapomayukh commented 3 years ago

> @KaviMD See Issue #18 for some data cleaning questions; moving discussion here.
>
> For study 2 it would be great if we can extract some additional metrics to support the narrative, two that I thought of were:
>
> - [x] Number of clicks (per task): something expected to correlate with task completion time (i.e. 'cycle length') but might have interesting differences
> - [x] Duration of dragging (per task): only relevant for the four press/release interfaces, to give a sense of the pressing requirements of the interfaces when we say this is something that makes these interfaces less accessible
>
> @KaviMD @tapomayukh Any ideas of other things that would be useful to look at?

Yes, these could be very informative. Are we already collecting switches (between clicks, drags, etc., or translation, rotation, etc.) and their correlation with task time, task progress, or task type?

kavidey commented 3 years ago

@tapomayukh For both SE2 and SE3 we collect data any time the user interacts with the interface. For interfaces with interactable elements like arrows, there is a state machine that logs and keeps track of the current state of the interface (moving up/down/right/left, rotating, etc.). All of this is collected per cycle, so it can easily be correlated with a specific user or interface.

The filtered .csv files are uploaded to the Google Drive.

I am working on updating the boxplot from #18 with the new data and will post the results here when that is done.

kavidey commented 3 years ago

This is the boxplot of questionnaire data from #18 with the new filtered data: [boxplot image]

@mayacakmak The orange line is the median and the dashed green line is the mean (sometimes they fall on top of each other or on the top/bottom of a box).

kavidey commented 3 years ago

I just found a bug in the dragging distance calculation: it was being computed for all interfaces except the ones with press/release transitions, instead of only for those interfaces. I updated the file on Google Drive with the fixed version: https://drive.google.com/file/d/1xL-rRFK4AgOKbw3t08lrEVVjjGKF8xrs/view?usp=sharing
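A sketch of what the corrected selection presumably looks like; the `transition` field and its "press/release" value are assumptions, not the actual column names:

```python
PRESS_RELEASE = {"press/release"}  # assumed label for the transition type

def dragging_cycles(cycle_rows):
    # Correct filter: keep only cycles from press/release interfaces.
    # The bug was the inverted condition, i.e. `not in PRESS_RELEASE`.
    return [row for row in cycle_rows if row["transition"] in PRESS_RELEASE]
```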