Closed htwangtw closed 3 years ago
I will do this
Before you can work on the analysis server, set up python on your home machine. I think a python terminal and a plain text editor of your choice is enough to start.
I forgot this set-up guide exists: Neurohackademy set up. There are video guides, so it should be more helpful than me. If you want to feel super special you can try VS Code + its Python plug-in as the text editor. You can use an integrated terminal so things are less confusing. Anaconda and VS Code instruction
After you have Python installed, try some stuff from this really short but practical Python tutorial: Very short python tutorial. You should then have a basic grasp of Python and be able to write a simple script.
For general python reference, I recommend this (web-based) book The Hitchhiker’s Guide to Python.
For learning the analysis and organisation side of neuroimaging, this is my go-to website: hitchhacker's guide to the brain
I have updated the main branch and created a pull request to update your fork @JoelPatchitt. Here's an example of how you can get the sub-directories:
from pathlib import Path
from adie.tests import get_test_data_path
get_test_data_path()
# '/home/path/to/adie_ongoingthoughts/adie/tests/data'
bids_dir = Path(get_test_data_path()) / "ds0006"
list(bids_dir.glob("sub-*"))
# [PosixPath('/home/path/to/adie_ongoingthoughts/adie/tests/data/ds006/sub-05'), ...]
If you find this too complicated, just try to use Python pathlib to list a directory, then copy and paste your code here and we can work from there.
I am struggling to understand what you are importing when you say 'from adie.tests import get_test_data_path', what is this?
How does this look?
`import os
location = "/home/path/to/adie_ongoingthoughts/adie/tests/data"
dirs_in_dir = []
for r, d, f in os.walk(location): for item in d: dirs_in_dir.append(item)
for item in dirs_in_dir: print(item)`
If you can have your stream of thoughts in one comment that will be really helpful.
Have you set up your environment as the README.md file shows? Is it possible to do that with a Windows machine? We can have a meeting if you are really struggling with this.
> I am struggling to understand what you are importing when you say 'from adie.tests import get_test_data_path', what is this?
If you struggle to understand this, please go through a Python tutorial to understand the syntax of this language.
This means: from the package `adie`, from the module `tests`, we import a function called `get_test_data_path`.
You can find the content of code here but you really don't need to know it.
This function just produces a path that's specific to your system, so you don't need to find the actual path of the test data. It will be something like /home/path/to/adie_ongoingthoughts/adie/tests/data on my laptop. Of course, this is not a real path, and I really doubt a path on Windows looks like that.
The reason pathlib didn't work is likely that 1. I had a typo in the dataset name and/or 2. you didn't understand where your working directory is. If you want to work with an absolute path, it's fine for this stage.
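To make the two likely causes concrete, here is a minimal sketch, not taken from the repository. It builds a throw-away folder tree (the names `ds006`, `sub-01` and `sub-02` are only placeholders) and shows that a misspelt folder name gives an empty list rather than an error, and that a relative path only works when the working directory is the right one.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical layout, created in a temporary folder so the sketch runs anywhere.
root = Path(tempfile.mkdtemp()).resolve()
for name in ("sub-01", "sub-02"):
    (root / "ds006" / name).mkdir(parents=True)

# 1. A typo in the dataset name does not raise an error, it just finds nothing.
typo = list((root / "ds0006").glob("sub-*"))
print(typo)

# 2. An absolute path works no matter where the Python session was started.
absolute = root / "ds006"
found = sorted(p.name for p in absolute.glob("sub-*"))
print(found)

# A relative path is interpreted from the current working directory,
# so it only finds the data once we "stand" in the right folder.
print(Path.cwd())
previous = Path.cwd()
os.chdir(root)
found_relative = sorted(p.name for p in Path("ds006").glob("sub-*"))
os.chdir(previous)
print(found_relative)
```

If `Path.cwd()` prints a folder you did not expect, that is usually the reason a relative path returns nothing.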
You can use three "`" symbols to create a code block, see markdown guide.
import os
location = "/home/path/to/adie_ongoingthoughts/adie/tests/data"
dirs_in_dir = []
for r, d, f in os.walk(location):
    for item in d:
        dirs_in_dir.append(item)
        print(item)
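One thing to be aware of with this approach: `os.walk` descends into every level, so nested folders are collected as well. A small sketch on a made-up folder tree (the folder names are placeholders) shows the difference between walking everything and taking only the first step of the walk.

```python
import os
import tempfile

# Hypothetical folder tree: <tmp>/ds006/sub-01/anat and <tmp>/ds006/sub-02
location = tempfile.mkdtemp()
os.makedirs(os.path.join(location, "ds006", "sub-01", "anat"))
os.makedirs(os.path.join(location, "ds006", "sub-02"))

# os.walk visits every level, so nested folders end up in the list too.
all_dirs = []
for root, dirs, files in os.walk(location):
    all_dirs.extend(dirs)

# next() takes only the first step of the walk: the top-level folders.
_, top_level_dirs, _ = next(os.walk(location))

print(sorted(all_dirs))   # ['anat', 'ds006', 'sub-01', 'sub-02']
print(top_level_dirs)     # ['ds006']
```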
And here's a line-by-line annotated version of the snippet I posted:
Content after ">>" is code you enter.
Content with no special symbol prepended is console output.
Anything starting with # is a comment.
# Import Path from the Python standard library module pathlib
>> from pathlib import Path
# From the module tests in our custom package adie, import the function get_test_data_path
>> from adie.tests import get_test_data_path
# Print the output of this function in the console
>> get_test_data_path()
'/home/path/to/adie_ongoingthoughts/adie/tests/data' # it will always end with `adie/tests/data`
# Create a pathlib object for the test data and reference test dataset ds006
>> bids_dir = Path(get_test_data_path()) / "ds006"
# Find sub-directories starting with sub- followed by any number of characters (including none), pass the search results to a list and print it in the console
>> list(bids_dir.glob("sub-*"))
[PosixPath('/home/path/to/adie_ongoingthoughts/adie/tests/data/ds006/sub-05'), ...]
If you are using my code, the next step is to work out how to get these subdirectory names saved in a file called participants.tsv with a column header "participant_id".
Windows virtual environment set-up:
Launch anaconda powershell Prompt
cd/your/path/to/adie_ongoingthoughts
virtualenv env
.\env\scripts\activate
pip install -r requirements.txt
pip install ipython
pip install .
ipython
ERROR code with jedi: pip install jedi==0.17.2
Question: whenever I launch the anaconda powershell, will I have to follow these steps, or will ipython remember the custom setup?
> Question: whenever I launch the anaconda powershell, will I have to follow these steps, or will ipython remember the custom setup?
Now you have created an environment to use things related to adie_ongoingthoughts
You will need to do the following when launching anaconda PowerShell:
cd /your/path/to/adie_ongoingthoughts
.\env\Scripts\activate
Or see: a tutorial online to set Spyder to use the virtualenv
Here's a revised version of the steps you documented (fixing syntax errors etc.):
# change directory to adie_ongoingthoughts
cd /your/path/to/adie_ongoingthoughts
# create virtualenv called env
virtualenv env
# activate the environment
.\env\Scripts\activate
# install requirements
pip install -r requirements.txt
# install ipython and jedi==0.17.2 (v0.18 is buggy)
pip install ipython jedi==0.17.2
# install the adie data wrangling helpers
pip install .
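After activating, a quick way to confirm that ipython is really running inside `env` is to ask Python itself. This is only a sketch: the helper name `in_virtual_environment` is made up for illustration, and it detects virtualenv/venv environments only, not conda environments.

```python
import sys


def in_virtual_environment():
    """Return True when Python runs from a virtualenv/venv, not the base install."""
    # Older virtualenv releases set sys.real_prefix; newer ones (and venv)
    # make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


# After activation, the interpreter path should point inside the env folder.
print(sys.executable)
print(in_virtual_environment())
```

If this prints `False`, the environment was probably not activated in the current PowerShell session.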
Thanks Hao-Ting, here is my solution for creating a .tsv file.
from pathlib import Path
Path.cwd()
data = Path.cwd()
data.absolute() # Prints absolute path
data.glob("Sub_*") # Selects all directories named Sub_*
list(data.glob("Sub_*")) # Lists directories
P_info = list(data.glob("Sub_*")) # Adds variable for list
import csv
with open('output.tsv', 'w', newline='') as f_output:
    tsv_output = csv.writer(f_output, delimiter='\t') # Delimiter set to tab
    tsv_output.writerow(P_info) # Selects information for output
Good attempt. May I ask you to copy and paste the first three lines of the file created?
C:\Users\joelp\Desktop\Data\Sub_01 C:\Users\joelp\Desktop\Data\Sub_02 C:\Users\joelp\Desktop\Data\Sub_03 C:\Users\joelp\Desktop\Data\Sub_04
They are separated by tabs on the output file by the looks of things
Cool, let's move to something more concrete. I would like output that looks like this:
Sub_01
Sub_02
Sub_03
...
One subject ID per line. Like the first column of this file. https://github.com/htwangtw/adie_ongoingthoughts/blob/main/adie/tests/data/ds006/participants.tsv Try to achieve this before you get back to me.
I swapped \t (tab) for \n (newline).
import csv
with open('output.tsv', 'w', newline='') as f_output:
    tsv_output = csv.writer(f_output, delimiter='\n') # Delimiter set to newline
    tsv_output.writerow(P_info)
Output as follows:
C:\Users\joelp\Desktop\Data\Sub_01
C:\Users\joelp\Desktop\Data\Sub_02
C:\Users\joelp\Desktop\Data\Sub_03
C:\Users\joelp\Desktop\Data\Sub_04
This is one way to break it into different lines, and for a single-column file that will do. It's still not matching what I want: I want the subject number only, not the full path.
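As a hint for getting from the full path to the folder name, pathlib objects carry a `.name` attribute. A minimal sketch using two of the paths from the output above (`PureWindowsPath` is used only so the example parses Windows-style paths on any operating system; with real folders the objects returned by `glob` already have `.name`):

```python
from pathlib import PureWindowsPath

# Example paths copied from the output above; nothing is read from disk.
full_paths = [
    PureWindowsPath(r"C:\Users\joelp\Desktop\Data\Sub_01"),
    PureWindowsPath(r"C:\Users\joelp\Desktop\Data\Sub_02"),
]

# .name is the last component of the path, i.e. the folder name itself.
subject_ids = [p.name for p in full_paths]
print(subject_ids)  # ['Sub_01', 'Sub_02']
```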
I have found a more efficient way to create the .tsv that allows for multiple columns. Although I have not been able to figure out how to rename the items within the list yet. I will try again tomorrow, or better yet ask you, as I have spent some time on it to no avail!
from pathlib import Path
Path.cwd()
data = Path.cwd()
data.absolute() # Prints absolute path
data.glob("Sub_*") # Selects all directories named Sub_* In Python, the glob module is used to retrieve files/pathnames matching a specified pattern.
list(data.glob("Sub_*")) # Lists directories
subj = list(data.glob("Sub_*")) # Adds variable for list
# Convert list into a dataframe
from pandas import DataFrame
df = DataFrame(subj, columns=['Subjects'])
print(df)
df.to_csv('Participant_generator.tsv', sep = '\t', index=False) # Prevents the index of each row from being stored in the file
Hao-Ting,
I had a look at the os.path.split function and found that I could only use it on individual strings, not on a whole list.
Please see here a workaround for this, made entirely by myself using things that I have learnt today while searching for a fix. What do you think?
from pathlib import Path
Path.cwd()
data = Path.cwd()
data.absolute() # Prints absolute path
data.glob("Sub_*") # Selects all directories named Sub_* In Python, the glob module is used to retrieve files/pathnames matching a specified pattern.
list(data.glob("Sub_*")) # Lists directories
subj = list(data.glob("Sub_*")) # Adds variable for list
sub_str = [ str(e) for e in subj] # subjects as string, converts elements from windows path to string retaining list format
len(sub_str) # Equals amount of elements in substr list
total_elements = len(sub_str) # Variable for while loop
element = 0 # List elements start at 0
while element <= total_elements-1: # total_elements = 4 in this instance
    sub_str[element] = sub_str[element].replace("\\","").replace("'","").replace(":","") # Removes unwanted symbols
    sub_str[element] = sub_str[element].replace("CUsersjoelpDesktopData","") # Removes unwanted text
    element = element+1 # End of loop
# Convert list into a dataframe
from pandas import DataFrame
df = DataFrame(sub_str, columns=['Subjects'])
print(df)
df.to_csv('Participant_generator.tsv', sep = '\t', index=False) # Output to .tsv file
Here is the output .tsv file:
Subjects
Sub_01
Sub_02
Sub_03
Sub_04
The good thing with this solution is that we can add an infinite number of participants and only have to run this through the command line once. It can also be edited to add columns to the dataframe which can easily be printed to .tsv
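For comparison, here is a more compact sketch of the same idea that does not depend on the location of the data folder. It is an illustration under assumptions, not the project's script: the function name `write_participants` and the demo folders are made up, it uses only the standard library instead of pandas, and it writes the "participant_id" header requested earlier in the thread.

```python
import csv
import tempfile
from pathlib import Path


def write_participants(data_dir, out_file, pattern="Sub_*"):
    """Write one folder name per row under a participant_id header."""
    # p.name keeps only the folder name, so no path text has to be stripped.
    subjects = sorted(p.name for p in Path(data_dir).glob(pattern) if p.is_dir())
    with open(out_file, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["participant_id"])
        writer.writerows([s] for s in subjects)
    return subjects


# Demo on a throw-away folder with three made-up subject folders.
data_dir = Path(tempfile.mkdtemp())
for name in ("Sub_01", "Sub_02", "Sub_03"):
    (data_dir / name).mkdir()
out_file = data_dir / "participants.tsv"
subjects = write_participants(data_dir, out_file)
lines = out_file.read_text().splitlines()
print(lines)
```

Because the folder name comes from `.name`, the same code works for any number of participants and on any machine.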
You deserve all the stickers.
If you copy the solution to generate_participant.py (your pull request #11), I can give you line by line feedback.
We can work it towards something for the ADIE data.
Steps of pushing a change to a PR:
Participant_log
bin/generate_participant.py, add your solution to the file and save
bin/generate_participant.py and commit the changes
I think I have done this. I am not sure how to push from my app though; it isn't showing up.
From: Hao-Ting Wang notifications@github.com
Sent: 20 January 2021 16:32:56
To: htwangtw/adie_ongoingthoughts
Cc: Joel Patchitt; Mention
Subject: Re: [htwangtw/adie_ongoingthoughts] Generate top-level participants.tsv
and subject level scan log (#10)
Yes you have done it I can see it. Sorry if I missed out!!! Once I comment on it it will be more obvious.
I have just figured out a loop that takes advantage of the split.path function and will post another file in the same place for you to look at too.
From: Hao-Ting Wang notifications@github.com
Sent: 20 January 2021 17:04:05
To: htwangtw/adie_ongoingthoughts
Cc: Joel Patchitt; Mention
Subject: Re: [htwangtw/adie_ongoingthoughts] Generate top-level participants.tsv
and subject level scan log (#10)
Currently, there's no top-level participants.tsv. This file should log some basic information about each subject (see BIDS). In sourcedata there's a very extensive version that's basically a combination of participants.tsv and scan logs.
This dataset involves multiple sessions for the patient group. A session can involve multiple days. It would be useful to have a scan log to record the time of the sessions and visits.
BIDS description of scan log and participants file
Example of a multi-session multi-run study
Example of BIDS directory with scan log
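As a rough illustration of what a scan log could contain, the sketch below writes a two-row table with a `filename` and an `acq_time` column, which is my understanding of the BIDS scans file; please check the BIDS description linked above before relying on it. The file names and timestamps are invented placeholders, not data from this study.

```python
import csv
import io

# Hypothetical records: one row per scan, with made-up file names and times.
scans = [
    {"filename": "anat/sub-01_ses-01_T1w.nii.gz", "acq_time": "2000-01-01T10:00:00"},
    {"filename": "func/sub-01_ses-01_task-rest_bold.nii.gz", "acq_time": "2000-01-01T10:15:00"},
]

# Write to an in-memory buffer; swap in open("..._scans.tsv", "w", newline="")
# to create a real file.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["filename", "acq_time"], delimiter="\t", lineterminator="\n"
)
writer.writeheader()
writer.writerows(scans)
scan_log = buffer.getvalue()
print(scan_log)
```

With one such file per subject and session, the acquisition times would also cover sessions that span multiple days.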