sct-pipeline / spine-park

Pipeline for multicontrast analysis in PD patients
MIT License

How is the statistical analysis conducted? #42

Closed: jcohenadad closed this issue 4 months ago

jcohenadad commented 4 months ago

While working on #41, I stumbled upon the question of how to format the output CSVs. More specifically: which columns of the CSV will be used, and how will the statistics be computed? Across subjects? Across levels?

For example, the CSV file DWI_FA_51_aggregated.csv (generated from a few subjects), which contains the FA in the white matter (label 51), can produce the following violin plot:

[Figure: violin plot of FA in the white matter (label 51) from DWI_FA_51_aggregated.csv]

In this CSV file, the "Filename" column was replaced by a "Subject" column, because the chunks had to be aggregated (#41). Should all CSV files (i.e., also those for the non-DWI metrics) follow the same rule?
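A minimal sketch of the Filename-to-Subject step, assuming BIDS-style file names such as `sub-BB277_chunk-1_DWI_moco_FA`, where the subject ID is the `sub-<label>` entity. The helper names `subject_from_filename` and `add_subject_column` are hypothetical (not taken from the spine-park code), and how the per-chunk values are combined (#41) is deliberately not shown.

```python
import os
import re

# Assumption: subject IDs follow the BIDS "sub-<alphanumeric label>" convention.
SUBJECT_RE = re.compile(r"sub-[A-Za-z0-9]+")


def subject_from_filename(path):
    """Return the subject ID found in the file name, e.g. 'sub-BB277'."""
    match = SUBJECT_RE.search(os.path.basename(path))
    if match is None:
        raise ValueError(f"No subject ID found in {path!r}")
    return match.group(0)


def add_subject_column(rows):
    """Yield rows where 'Filename' is replaced by a leading 'Subject' column."""
    for row in rows:
        new_row = {"Subject": subject_from_filename(row["Filename"])}
        # Keep every other column as-is, dropping the per-chunk file name.
        new_row.update({k: v for k, v in row.items() if k != "Filename"})
        yield new_row
```

The rows can come straight from `csv.DictReader` and be written back with `csv.DictWriter`, so the same helper would apply unchanged to the non-DWI CSVs if they share the `Filename` column.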

More details needed from @Kaonashi22

Kaonashi22 commented 4 months ago

Thanks, @jcohenadad! I will first do statistical analyses between groups (patients and healthy controls) using the average across all vertebral levels, and then by vertebral level. The output CSV file looks good. I would also keep the "label" column and add a row with the average across all vertebral levels for each subject. We can follow the same rule for all output files (including the non-DWI ones).
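A minimal sketch of the two comparisons described above, assuming the per-level values are already loaded into a dict keyed by (subject, vertebral level) and that a subject-to-group mapping is available (e.g., from a participants file). Both data structures and the function names are hypothetical. Welch's t statistic is used only as an example; the thread does not say which test will be applied.

```python
from collections import defaultdict
from statistics import mean, variance


def group_values(values, groups, level=None):
    """Collect one value per subject, grouped by subject group.

    values: dict mapping (subject, vert_level) -> metric value
    groups: dict mapping subject -> group label, e.g. "patient" or "control"
    level:  None -> use each subject's mean across all levels;
            otherwise only the value at that vertebral level
    """
    per_subject = defaultdict(list)
    for (subject, vert_level), value in values.items():
        if level is None or vert_level == level:
            per_subject[subject].append(value)
    by_group = defaultdict(list)
    for subject, subject_values in per_subject.items():
        by_group[groups[subject]].append(mean(subject_values))
    return dict(by_group)


def welch_t(a, b):
    """Welch's t statistic for two independent samples (each needs >= 2 values)."""
    standard_error = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / standard_error
```

For example, `welch_t(*[group_values(values, groups)[g] for g in ("patient", "control")])` compares the across-level averages, and passing `level=3` restricts the comparison to one vertebral level. If a p-value is needed, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test.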

jcohenadad commented 4 months ago

> I would also keep the "label" column

That's easy to do.

> and add a row with the average across all vertebral levels for each subject.

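That's more work, and it conflicts with the current logic of the CSV file: each row currently represents a vertebral level. Adding a column for the average across levels means that:

* the column 'vertebral level' is undetermined for the row=average

* the column 'average across level' is undetermined for the row=not_average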

Moreover, it is not common practice to insert redundant information into a CSV file. A better practice is to write code that reads the source CSV file and computes the desired statistics (e.g., average across levels, STD, median, min/max).
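A minimal sketch of that approach, assuming the CSV has a `Subject` column and a metric column named `MAP()`, as in the DWI output table in this thread. The function name `summarize_across_levels` is hypothetical. The statistics are computed on the fly, so the CSV itself keeps one row per vertebral level.

```python
import csv
from collections import defaultdict
from statistics import mean, median, stdev


def summarize_across_levels(csv_lines, metric="MAP()"):
    """Per-subject statistics of `metric` across vertebral levels.

    csv_lines: an open CSV file (or any iterable of CSV lines) whose header
    row contains at least 'Subject' and the metric column.
    """
    values = defaultdict(list)
    for row in csv.DictReader(csv_lines):
        values[row["Subject"]].append(float(row[metric]))

    summary = {}
    for subject, vals in values.items():
        summary[subject] = {
            "mean": mean(vals),  # unweighted mean of the per-level values
            "std": stdev(vals) if len(vals) > 1 else float("nan"),
            "median": median(vals),
            "min": min(vals),
            "max": max(vals),
            "n_levels": len(vals),
        }
    return summary
```

Typical use would be `with open("DWI_FA_51_aggregated.csv", newline="") as f: summary = summarize_across_levels(f)`. Note that this is a plain mean of the per-level values; a mean weighted by the number of voxels per level (the `Size [vox]` column) would give a different number.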

jcohenadad commented 4 months ago

Commit df02c23afd4bb8273cc8cb551dbc16940e30ac3d now outputs the following table for DWI scans:

| Subject | VertLevel | Label | Size [vox] | MAP() | STD() | Timestamp | Filename | SCT Version |
|---|---|---|---|---|---|---|---|---|
| sub-BB277 | 2 | white matter | 80.68 | 0.6483 | 0.0993 | 2024-06-18 13:13:20 | /path/to/data/sub-BB277_chunk-1_DWI_moco_FA | git-jca/4527-register-template-step0-517cc |
| sub-BB277 | 3 | white matter | 93.12 | 0.6232 | 0.1262 | 2024-06-18 13:13:20 | /path/to/data/sub-BB277_chunk-1_DWI_moco_FA | git-jca/4527-register-template-step0-517cc |
| sub-BB277 | 4 | white matter | 100.70 | 0.6736 | 0.1203 | 2024-06-18 13:13:20 | /path/to/data/sub-BB277_chunk-1_DWI_moco_FA | git-jca/4527-register-template-step0-517cc |
| sub-BB277 | 5 | white matter | 108.99 | 0.5660 | 0.1235 | 2024-06-18 13:13:20 | /path/to/data/sub-BB277_chunk-1_DWI_moco_FA | git-jca/4527-register-template-step0-517cc |
| sub-BB277 | 6 | white matter | 65.68 | 0.5304 | 0.1313 | 2024-06-18 13:13:20 | /path/to/data/sub-BB277_chunk-1_DWI_moco_FA | git-jca/4527-register-template-step0-517cc |

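@Kaonashi22, would you like to apply this format to the other metrics as well? If so:

* should we overwrite the original CSV files

* or should we create new CSV files with another suffix (e.g., `_formatted`)?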

jcohenadad commented 4 months ago

The idea proposed in https://github.com/sct-pipeline/spine-park/issues/42#issuecomment-2200754999 is implemented in 14f074b2cb64027037a8c47d0f05bc2eb0f93354.

Currently testing; I will upload the output results/ folder for your approval, @Kaonashi22.

Here it is: results.zip

Kaonashi22 commented 4 months ago

> That's more work, and it conflicts with the current logic of the CSV file: each row currently represents a vertebral level. Adding a column for the average across levels means that:
>
> * the column 'vertebral level' is undetermined for the row=average
>
> * the column 'average across level' is undetermined for the row=not_average
>
> Moreover, it is not common practice to insert redundant information into a CSV file. A better practice is to write code that reads the source CSV file and computes the desired statistics (e.g., average across levels, STD, median, min/max).

OK, then I will compute the mean across levels separately

Kaonashi22 commented 4 months ago

> * should we overwrite the original CSV files
>
> * or should we create new CSV files with another suffix (e.g., `_formatted`)?

We can overwrite the previous files and only keep the formatted ones

Kaonashi22 commented 4 months ago

> The idea proposed in #42 (comment) is implemented in 14f074b.
>
> Currently testing; I will upload the output results/ folder for your approval, @Kaonashi22.
>
> Here it is: results.zip

The presentation is good, thanks a lot!

jcohenadad commented 4 months ago

feature implemented

jcohenadad commented 4 months ago

> We can overwrite the previous files and only keep the formatted ones

Sorry, I missed that. Do you still need it, or can your statistics script handle the _formatted and _aggregated suffixes?
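For the suffix question, a minimal sketch of how an analysis script could pick up both kinds of files, assuming they sit together in the results/ folder and that the part of the file name before the suffix identifies the metric (e.g., `DWI_FA_51` for `DWI_FA_51_aggregated.csv`). The function names and the default folder name are hypothetical, and the sketch does not assume which metrics use which suffix.

```python
from pathlib import Path

# Suffixes mentioned in this thread; which metrics use which one is not assumed.
SUFFIXES = ("_aggregated", "_formatted")


def select_result_csvs(paths, suffixes=SUFFIXES):
    """Map metric name (file stem minus suffix) -> Path for suffixed CSV files.

    If both suffixes exist for the same metric, the later path in `paths` wins.
    """
    selected = {}
    for path in map(Path, paths):
        if path.suffix != ".csv":
            continue
        for suffix in suffixes:
            if path.stem.endswith(suffix):
                selected[path.stem[: -len(suffix)]] = path
                break
    return selected


def find_result_csvs(results_dir="results"):
    """Scan a results folder (hypothetical default name) for suffixed CSVs."""
    return select_result_csvs(sorted(Path(results_dir).glob("*.csv")))
```

Unsuffixed originals (e.g., `DWI_FA_51.csv`) are skipped, so the script would keep working whether or not the original files are overwritten.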

Kaonashi22 commented 4 months ago

I actually don't need that because I'll rerun the analysis from scratch


From: Julien Cohen-Adad. Sent: July 1, 2024 16:34. To: sct-pipeline/spine-park. Cc: Lydia Chougar, Dr; Mention. Subject: Re: [sct-pipeline/spine-park] How is the statistical analysis conducted? (Issue #42)
