jcohenadad closed this issue.
Thanks, @jcohenadad! I will run statistical analyses between groups (patients and healthy controls), first using the average across all vertebral levels and then level by level. The output CSV file looks good. I would also keep the "label" column and add a row with the average across all vertebral levels for each subject. We can follow the same rule for all output files (including the non-DWI ones).
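A minimal sketch of the between-group comparison, assuming one value per subject (e.g. FA averaged across levels) and a hypothetical `subject_group` mapping from subject ID to `"patient"` or `"control"` (in practice this would come from a participants table). It uses Welch's t statistic, a technique swapped in for illustration rather than the analysis actually planned here, and relies only on the standard library:

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic and degrees of freedom for two independent samples.

    Each group needs at least two values so that the sample variance is defined.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    # Sample variances (n - 1 denominator), scaled into squared standard errors
    se_a = statistics.variance(group_a) / n_a
    se_b = statistics.variance(group_b) / n_b
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, dof


def compare_groups(subject_values, subject_group, group_a="patient", group_b="control"):
    """Split {subject: value} by the hypothetical group labels and compare the groups."""
    a = [v for s, v in subject_values.items() if subject_group.get(s) == group_a]
    b = [v for s, v in subject_values.items() if subject_group.get(s) == group_b]
    return welch_t(a, b)
```

For a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test; calling it once per vertebral level covers the level-specific analysis.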
I would also keep the "label" column
That's easy to do.
and add a row with the average across all vertebral levels for each subject.
That's more work and conflicts with the current logic of the CSV file: each row currently represents a vertebral level. Adding a column for the average across levels means that:
* the column 'vertebral level' is undetermined for the row=average
* the column 'average across level' is undetermined for the row=not_average

Moreover, it is not common practice to insert redundant information into a CSV file. A better practice is to write code that reads the source CSV file and generates the desired statistics (e.g., average across levels, STD, median, min/max).
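A minimal sketch of that approach using only the Python standard library. It assumes the CSV header uses the column names shown in the DWI table in this thread (`Subject`, `VertLevel`, `MAP()`) and that each file holds a single label, as the per-label file names suggest; both are assumptions, not confirmed pipeline behavior. The function name `summarize_across_levels` is hypothetical.

```python
import csv
import statistics
from collections import defaultdict


def summarize_across_levels(csv_lines, value_col="MAP()"):
    """Per-subject statistics of `value_col` across all vertebral levels.

    csv_lines: any iterable of CSV text lines (e.g. an open file) with one
    row per subject x vertebral level.
    """
    values = defaultdict(list)
    for row in csv.DictReader(csv_lines):
        values[row["Subject"]].append(float(row[value_col]))

    summary = {}
    for subject, vals in values.items():
        summary[subject] = {
            "mean": statistics.fmean(vals),
            # Sample STD needs at least two levels; report 0.0 otherwise
            "std": statistics.stdev(vals) if len(vals) > 1 else 0.0,
            "median": statistics.median(vals),
            "min": min(vals),
            "max": max(vals),
            "n_levels": len(vals),
        }
    return summary
```

Usage: `with open("DWI_FA_51_aggregated.csv", newline="") as f: summary = summarize_across_levels(f)`. Keeping the statistics in code rather than in the CSV means new statistics can be added without changing the file format.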
df02c23afd4bb8273cc8cb551dbc16940e30ac3d now outputs the following table for DWI scans:
Subject | VertLevel | Label | Size [vox] | MAP() | STD() | Timestamp | Filename | SCT Version |
---|---|---|---|---|---|---|---|---|
sub-BB277 | 2 | white matter | 80.68 | 0.6483 | 0.0993 | 2024-06-18 13:13:20 | /path/to/data/sub-BB277_chunk-1_DWI_moco_FA | git-jca/4527-register-template-step0-517cc |
sub-BB277 | 3 | white matter | 93.12 | 0.6232 | 0.1262 | 2024-06-18 13:13:20 | /path/to/data/sub-BB277_chunk-1_DWI_moco_FA | git-jca/4527-register-template-step0-517cc |
sub-BB277 | 4 | white matter | 100.70 | 0.6736 | 0.1203 | 2024-06-18 13:13:20 | /path/to/data/sub-BB277_chunk-1_DWI_moco_FA | git-jca/4527-register-template-step0-517cc |
sub-BB277 | 5 | white matter | 108.99 | 0.5660 | 0.1235 | 2024-06-18 13:13:20 | /path/to/data/sub-BB277_chunk-1_DWI_moco_FA | git-jca/4527-register-template-step0-517cc |
sub-BB277 | 6 | white matter | 65.68 | 0.5304 | 0.1313 | 2024-06-18 13:13:20 | /path/to/data/sub-BB277_chunk-1_DWI_moco_FA | git-jca/4527-register-template-step0-517cc |
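For the level-specific analysis, the same long-format table can be regrouped by vertebral level so that each level can be analyzed on its own. A sketch assuming the `Subject`, `VertLevel` and `MAP()` column names shown in the table above; the function name `values_by_level` is hypothetical:

```python
import csv
from collections import defaultdict


def values_by_level(csv_lines, value_col="MAP()"):
    """Regroup a long-format CSV into {vertebral level: {subject: value}}.

    If a subject/level pair appears twice (e.g. chunks not yet aggregated),
    the last row read wins.
    """
    by_level = defaultdict(dict)
    for row in csv.DictReader(csv_lines):
        level = int(row["VertLevel"])
        by_level[level][row["Subject"]] = float(row[value_col])
    return dict(by_level)
```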
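@Kaonashi22, would you like to apply this format to the other metrics as well? If so:
* should we overwrite the original CSV files,
* or should we create new CSV files with another suffix (e.g., `_formatted`)?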
Idea proposed in https://github.com/sct-pipeline/spine-park/issues/42#issuecomment-2200754999 is implemented in 14f074b2cb64027037a8c47d0f05bc2eb0f93354.
Currently testing; I will upload the output `results/` folder for your approval, @Kaonashi22.
Here it is: results.zip
That's more work and conflicts with the current logic of the CSV file: each row currently represents a vertebral level. Adding a column for the average across levels means that:
* the column 'vertebral level' is undetermined for the row=average
* the column 'average across level' is undetermined for the row=not_average
OK, then I will compute the mean across levels separately
* should we overwrite the original CSV files,
* or should we create new CSV files with another suffix (e.g., `_formatted`)?
We can overwrite the previous files and only keep the formatted ones
The presentation is good, thanks a lot!
Feature implemented.
We can overwrite the previous files and only keep the formatted ones
Sorry, I missed that. Do you still need it, or can your analysis script for the statistics figure out the `_formatted` and `_aggregated` suffixes?
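A sketch of how an analysis script could discover result files by suffix, assuming file names follow a `<contrast>_<metric>_<label>_<suffix>.csv` pattern inferred from `DWI_FA_51_aggregated.csv`. The pattern and the helper names `parse_result_name` and `find_results` are hypothetical; metric names containing underscores would need a different pattern.

```python
import re
from pathlib import Path

# Hypothetical naming pattern, inferred from "DWI_FA_51_aggregated.csv"
RESULT_RE = re.compile(
    r"^(?P<contrast>[^_]+)_(?P<metric>[^_]+)_(?P<label>\d+)_(?P<suffix>formatted|aggregated)\.csv$"
)


def parse_result_name(filename):
    """Return the parts encoded in a result file name, or None if it does not match."""
    match = RESULT_RE.match(Path(filename).name)
    return match.groupdict() if match else None


def find_results(results_dir, suffix="aggregated"):
    """List (path, parts) for result CSVs in `results_dir` ending with `_<suffix>.csv`."""
    found = []
    for path in sorted(Path(results_dir).glob(f"*_{suffix}.csv")):
        parts = parse_result_name(path)
        if parts is not None:
            found.append((path, parts))
    return found
```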
I actually don't need that because I'll rerun the analysis from scratch
While working on #41, I stumbled across the question of how to format the output CSVs. More specifically: which columns of the CSV will be used, and how are the statistics computed? Across subjects? Across levels?
For example, the CSV file `DWI_FA_51_aggregated.csv` (produced from a few subjects), which contains the FA in the white matter (label 51), can be used to produce a violin plot.
In this CSV file, the column "Filename" was replaced by the column "Subject" because chunks have to be aggregated per subject (#41). Should all CSV files (i.e., also those for the other non-DWI metrics) follow the same rule?
More details needed from @Kaonashi22
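One way to do the chunk aggregation mentioned above, as a sketch: derive the subject ID from the `Filename` column (assuming a BIDS-style `sub-<label>` prefix such as `sub-BB277`) and average the metric over chunks for each subject and vertebral level. The plain unweighted mean and the helper names are assumptions; #41 may aggregate chunks differently (e.g. weighting by `Size [vox]`).

```python
import csv
import re
import statistics
from collections import defaultdict
from pathlib import Path

# Assumption: file names start with a BIDS-style subject entity, e.g. "sub-BB277_..."
SUBJECT_RE = re.compile(r"sub-[A-Za-z0-9]+")


def subject_from_filename(filename):
    """Return e.g. 'sub-BB277' for '/path/to/data/sub-BB277_chunk-1_DWI_moco_FA'."""
    match = SUBJECT_RE.search(Path(filename).name)
    if match is None:
        raise ValueError(f"no subject ID found in {filename!r}")
    return match.group(0)


def aggregate_chunks(csv_lines, value_col="MAP()"):
    """Average `value_col` over chunks for each (subject, vertebral level).

    Returns a list of rows keyed by "Subject" instead of "Filename".
    """
    groups = defaultdict(list)
    for row in csv.DictReader(csv_lines):
        key = (subject_from_filename(row["Filename"]), int(row["VertLevel"]))
        groups[key].append(float(row[value_col]))
    return [
        {"Subject": subject, "VertLevel": level, value_col: statistics.fmean(vals)}
        for (subject, level), vals in sorted(groups.items())
    ]
```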