Closed JoelPatchitt closed 3 years ago
Do you think we can delete generate_participant_splitpath.py now that we have a good script to adapt into the final product? It won't be goodbye to this file forever. If you ever need it, you can find it in the history.
Yes. I'll delete it now.
Typo fixed
Is this what you are looking for? I always end up getting confused when you ask me to set the path to GitHub!!
I ran the function on my laptop with a cwd() path and it works.
from pathlib import Path
import os
import pandas as pd

data = Path("htwangtw/adie_ongoingthoughts/adie/tests/data/7t_trt/")

def generate_participants():
    subj = list(data.glob("Sub-*"))  # List the subject directories
    sub_str = [str(e) for e in subj]  # Convert each Windows path to a string, keeping the list format
    sub_id = []
    while sub_str:  # Stop when sub_str is empty
        cur_sub = sub_str.pop()  # Pop an item from the list
        cur_sub = cur_sub.split(os.sep)  # Split the string by the OS-specific separator, returning a list of strings
        sub_id.append(cur_sub[-1])  # Save the directory name
    # Convert the list into a dataframe
    df = pd.DataFrame(sub_id, columns=['participant_id'])
    print(df)
    df.to_csv('participants.tsv', sep='\t', index=False)  # Output to .tsv file
Sorry I didn't get back to you for an epic weekend challenge as promised.
Here's a small one: can you modify this function to accept different BIDS datasets and produce the same kind of file?
generate_participants("/path/to/dataset 1")
generate_participants("/path/to/dataset 2")
Both cases should write participants.tsv in the BIDS directory.
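One way the request above could be met, as a minimal sketch rather than the final script: take the dataset root as an argument and write participants.tsv inside that directory. The `pattern` parameter and its `"sub-*"` default are assumptions (the script earlier in this thread globs `"Sub-*"`), so adjust it to match the directory names in your data.

```python
from pathlib import Path

import pandas as pd


def generate_participants(bids_dir, pattern="sub-*"):
    """Write participants.tsv listing the subject directories in bids_dir.

    `pattern` is a hypothetical parameter, not part of the original script.
    """
    bids_dir = Path(bids_dir)  # accept a plain string or a Path
    # Keep directories only, and sort so the row order is reproducible
    sub_id = sorted(p.name for p in bids_dir.glob(pattern) if p.is_dir())
    df = pd.DataFrame(sub_id, columns=["participant_id"])
    out_file = bids_dir / "participants.tsv"
    df.to_csv(out_file, sep="\t", index=False)
    return out_file
```

Returning the output path is a design choice made here so the result is easy to check in a test; the original function returns nothing.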
When you say different BIDS dataset, do you mean a .xlsx file or something else?
If you have a look at the test data: https://github.com/htwangtw/adie_ongoingthoughts/tree/main/adie/tests/data
You can find 4 directories. Each of them is a dataset in BIDS format (hence "BIDS dataset"). I would like you to modify your code and test it on these three:
Does this explanation help?
Yes, this helps. I am struggling to get those files downloaded onto my computer. Those directories exist in your repository but not mine. How do I pull them over to mine so that I can run the tests on my computer?
Python is telling me I cannot use quotations as a function's argument. I'm pretty lost here.
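If the error came from writing the quoted path directly into the `def` line (a guess, since the traceback is not shown here), the usual fix is to give the function a parameter name and pass the quoted path only when calling it. A minimal illustration, with `list_subjects` as a made-up name:

```python
from pathlib import Path

# Not valid Python: a quoted string cannot be a parameter name
#     def list_subjects("/path/to/dataset 1"):


def list_subjects(bids_dir):
    """Return the names of the entries directly inside bids_dir."""
    bids_dir = Path(bids_dir)  # convert the string to a Path object
    return sorted(p.name for p in bids_dir.iterdir())


# The quoted path goes in the call, not in the definition:
#     list_subjects("/path/to/dataset 1")
```

Wrapping the argument in `Path(...)` inside the function also means a plain string can be passed in, since strings have no `.glob()` or `.iterdir()` methods of their own.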
Hey Hao-Ting,
I am now connected to the cisc volumes and analysis server. Please see below the function you asked for. Let me know if it's wrong! I await further orders!
def generate_participants(datafile):
    subj = list(datafile.glob("Sub-*"))  # List the subject directories
    sub_str = [str(e) for e in subj]  # Convert each Windows path to a string, keeping the list format
    sub_id = []
    while sub_str:  # Stop when sub_str is empty
        cur_sub = sub_str.pop()  # Pop an item from the list
        cur_sub = cur_sub.split(os.sep)  # Split the string by the OS-specific separator, returning a list of strings
        sub_id.append(cur_sub[-1])  # Save the directory name
    # Convert the list into a dataframe
    df = pd.DataFrame(sub_id, columns=['participant_id'])
Imports necessary:
import pandas as pd
import os
from pathlib import Path
path = Path("path/to/your/data")
Update your script in bin
I will review it
There's one typo and I think it will soon be ready to ship!
Hey @htwangtw ,
I have been running a little side project that I have managed to finish and that I think you might be interested in.
Please see below a function that grabs the subject numbers from the directory names, as per our current script, but also enters each folder and extracts .xlsx data (I created a fake Excel datasheet with age, gender, and handedness for each participant).
I have tried to keep the function general so that it will work on all operating systems & filepaths.
I know that this might not be useful to the current study, or maybe it will, what do you think? Are there any corrections that could be made?
# Imports
import pandas as pd
from pathlib import Path
import glob
import os

# File paths
path = Path.cwd()  # Can be modified to another directory
sub_dir = "Sub_*\\*_datasheet.xlsx"  # Can be modified to suit subdirectories

# Data extraction function
def generate_participants(datafile):
    data_loc = os.path.join(path, sub_dir)
    subj = list(datafile.glob("Sub_*"))  # List the subject directories
    sub_str = [str(e) for e in subj]  # Convert each Windows path to a string, keeping the list format
    subjdata = glob.glob(data_loc, recursive=True)
    sub_id = []
    df = pd.DataFrame()
    # Extract subject number from directory name
    while sub_str:  # Stop when sub_str is empty
        cur_sub = sub_str.pop()  # Pop an item from the list
        cur_sub = cur_sub.split(os.sep)  # Split the string by the OS-specific separator, returning a list of strings
        sub_id.append(cur_sub[-1])  # Save the directory name
    sub_id = pd.DataFrame(sub_id, columns=['participant_id'])  # Create dataframe
    sub_id["participant_id"] = sub_id["participant_id"].values[::-1]  # Subject IDs flipped for some reason
    # Extract .xlsx datafile from subject's directory
    for file in subjdata:
        if file.endswith('.xlsx'):
            df = df.append(pd.read_excel(file), ignore_index=True)
    # Concatenate dataframes
    df2 = pd.concat([sub_id, df], axis=1)
    # Convert to .tsv file
    df2.to_csv('participants.tsv', sep='\t', index=False)  # Output to .tsv file
Here is the output:
participant_id  age  gender  handedness
Sub_01          14   m       r
Sub_02          36   m       r
Sub_03          26   f       l
Sub_04          18   f       l
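On the flipped subject IDs noted in the function above: `list.pop()` with no argument removes the last element, so emptying a list with `pop()` yields its items in reverse order. The small sketch below uses made-up paths to show the effect, along with an alternative that avoids the reversal by taking `Path.name` and sorting.

```python
from pathlib import PurePosixPath

# Made-up example paths standing in for the globbed subject directories
subj = [PurePosixPath("data/Sub_01"), PurePosixPath("data/Sub_02"),
        PurePosixPath("data/Sub_03")]

# pop() takes from the end of the list, so the result comes out reversed
stack = [str(p) for p in subj]
popped = []
while stack:
    popped.append(stack.pop().split("/")[-1])

# Path.name gives the last path component directly; sorting fixes the order
names = sorted(p.name for p in subj)

print(popped)  # ['Sub_03', 'Sub_02', 'Sub_01']
print(names)   # ['Sub_01', 'Sub_02', 'Sub_03']
```

Sorting explicitly is also a little safer than reversing, because the order in which `glob` returns directories is not guaranteed to be alphabetical.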
I like your attempt to address other variables that can be included in the participants.tsv.
A lot of that demographic information can be found in the assessment data provided by Lisa.
Do you want to open a separate issue on that?
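If demographic columns are added later, one option is to join them on `participant_id` rather than relying on row order. The sketch below uses made-up values and column names, not the actual assessment data.

```python
import pandas as pd

# Hypothetical subject list, as produced from the directory names
participants = pd.DataFrame({"participant_id": ["Sub_01", "Sub_02", "Sub_03"]})

# Hypothetical demographics in a different row order, with one subject missing
demographics = pd.DataFrame({
    "participant_id": ["Sub_03", "Sub_01"],
    "age": [26, 14],
})

# A left merge keeps every subject and matches rows by ID, not by position
merged = participants.merge(demographics, on="participant_id", how="left")
print(merged)
```

With a left merge, a subject that has no demographic record still gets a row, with the missing values left empty instead of shifting the other rows.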
Sure, just tell me what needs to be done and I will give it a shot.
This branch will resolve issue #10