htwangtw / adie_ongoingthoughts

ADIE ongoing thought related analysis plan
MIT License

Generate top-level `participants.tsv` and subject level scan log #10

Closed htwangtw closed 3 years ago

htwangtw commented 3 years ago

Currently, there's no top-level participants.tsv. This file should log some basic information about each subject (see BIDS). In sourcedata there's a very extensive version that's basically a combination of participants.tsv and scan logs.

This dataset involves multiple sessions for the patient group. A session can involve multiple days. It would be useful to have a scan log to record the time of the sessions and visits.

- BIDS description of scan log and participants file
- Example of a multi-session multi-run study
- Example of BIDS directory with scan log

JoelPatchitt commented 3 years ago

I will do this

htwangtw commented 3 years ago

Before you can work on the analysis server, set up Python on your home machine. I think a Python terminal and a plain text editor of your choice are enough to start.

I forgot this set-up guide exists: Neurohackademy set up. There are video guides, so it should be more helpful than me. If you want to feel super special, you can try VS Code + its Python plug-in as the text editor. You can use an integrated terminal so things are less confusing. Anaconda and VS Code instruction

After you have Python installed, try some stuff from this really short but practical Python tutorial: Very short python tutorial. You should then have a basic grasp of Python and be able to write a simple script.

For general Python reference, I recommend this (web-based) book: The Hitchhiker’s Guide to Python.

For learning the analysis and organisation side of neuroimaging, this is my go-to website: Hitchhiker's guide to the brain

htwangtw commented 3 years ago

I have updated the main branch and created a pull request to update your fork @JoelPatchitt. Here's an example of how you can get the sub-directories:

from pathlib import Path
from adie.tests import get_test_data_path

get_test_data_path()
# '/home/path/to/adie_ongoingthoughts/adie/tests/data'

bids_dir = Path(get_test_data_path()) / "ds0006"
list(bids_dir.glob("sub-*"))
# [PosixPath('/home/path/to/adie_ongoingthoughts/adie/tests/data/ds006/sub-05'), ...]

If you find this too complicated, just try to use Python pathlib to list a directory, then copy and paste your code here and we can work from there.
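As a starting point for the pathlib suggestion above, here is a minimal sketch of listing sub-directories. It assumes a throwaway folder created only for the demo; the folder names and the helper `list_subject_dirs` are made up for illustration and are not part of the adie package.

```python
import tempfile
from pathlib import Path


def list_subject_dirs(root):
    """Return the sorted sub-directories of `root` whose names start with 'sub-'."""
    return sorted(p for p in Path(root).glob("sub-*") if p.is_dir())


# Build a small fake dataset so the example runs anywhere (hypothetical names).
demo_root = Path(tempfile.mkdtemp())
for name in ["sub-01", "sub-02", "derivatives"]:
    (demo_root / name).mkdir()
(demo_root / "sub-notes.txt").write_text("a file, not a directory")

subject_dirs = list_subject_dirs(demo_root)
print([p.name for p in subject_dirs])
```

The `is_dir()` filter matters because `glob("sub-*")` matches files as well as directories.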

JoelPatchitt commented 3 years ago

I am struggling to understand what you are importing when you say 'from adie.tests import get_test_data_path'. What is this?

JoelPatchitt commented 3 years ago

How does this look?

JoelPatchitt commented 3 years ago

`import os

location = "/home/path/to/adie_ongoingthoughts/adie/tests/data"

dirs_in_dir = []

for r, d, f in os.walk(location):
    for item in d:
        dirs_in_dir.append(item)

for item in dirs_in_dir:
    print(item)`

htwangtw commented 3 years ago

If you can keep your stream of thoughts in one comment, that will be really helpful.

Have you set up your environment as the README.md file shows? Is it possible to do that with a Windows machine? We can have a meeting if you are really struggling with this.

I am struggling to understand what you are importing when you say 'from adie.tests import get_test_data_path'. What is this?

If you struggle to understand this, please go through a Python tutorial to learn the syntax of the language. The statement means: from the module tests in the package adie, import a function called get_test_data_path. You can find the code here, but you really don't need to know it. The function just returns a path specific to your system, so you don't need to find the actual path of the test data yourself. It will be something like /home/path/to/adie_ongoingthoughts/adie/tests/data on my laptop. Of course, this is not a real path, and I really doubt a path on Windows looks like that.

The reason pathlib didn't work is likely that 1. I had a typo in the dataset name and/or 2. you weren't sure where your working directory is. If you want to work with an absolute path, that's fine at this stage. You can use three "`" symbols to create a code block; see the markdown guide.

import os

location = "/home/path/to/adie_ongoingthoughts/adie/tests/data"
dirs_in_dir = []
for r, d, f in os.walk(location):
    for item in d:
         dirs_in_dir.append(item)
         print(item)
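To illustrate the working-directory point, here is a small sketch showing that a relative path resolves differently depending on where Python was launched, while an absolute path does not. The `data/sub-01` layout is created in a temporary folder purely for this demo and is an assumption, not the real test data.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical layout created for the demo: <tmp>/data/sub-01
demo_root = Path(tempfile.mkdtemp()).resolve()
(demo_root / "data" / "sub-01").mkdir(parents=True)

relative = Path("data")        # meaning depends on the working directory
absolute = demo_root / "data"  # always points at the same folder

previous_cwd = Path.cwd()
try:
    os.chdir(demo_root)            # as if Python was launched from demo_root
    exists_from_root = relative.exists()
    os.chdir(demo_root / "data")   # as if Python was launched one level down
    exists_from_elsewhere = relative.exists()
    absolute_exists = absolute.exists()
finally:
    os.chdir(previous_cwd)         # restore the original working directory

print(exists_from_root, exists_from_elsewhere, absolute_exists)
```

Printing `Path.cwd()` at the top of a script is a quick way to see which folder relative paths will be resolved against.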

And here's a line-by-line annotated version of the snippet I posted:

- content after ">>" means code you enter
- content with no special symbol prepended means console output
- anything starting with # means a comment

Import python default library pathlib

>> from pathlib import Path  

From module tests in our custom module adie import function get_test_data_path

>> from adie.tests import get_test_data_path  

Print the output of this function in console

>> get_test_data_path()  
'/home/path/to/adie_ongoingthoughts/adie/tests/data'   # it will always end with `adie/tests/data`

Create a pathlib object for the test data and reference test dataset ds006

>> bids_dir = Path(get_test_data_path()) / "ds006"  

Find sub-directories starting with sub- followed by any number of characters (including none), pass the search results to a list and then print in the console

>> list(bids_dir.glob("sub-*")) 
[PosixPath('/home/path/to/adie_ongoingthoughts/adie/tests/data/ds006/sub-05'), ...]

If you are using my code, the next step is to work out how to save these subdirectory names in a file called participants.tsv with a column header "participant_id".
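One possible way to approach that next step, sketched with only the standard library. The helper name `write_participants` and the demo folder names are hypothetical; this is not the project's own implementation.

```python
import csv
import tempfile
from pathlib import Path


def write_participants(bids_dir, out_file):
    """Write one subject directory name per row under a 'participant_id' header."""
    subject_ids = sorted(p.name for p in Path(bids_dir).glob("sub-*") if p.is_dir())
    with open(out_file, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["participant_id"])             # header row
        writer.writerows([sid] for sid in subject_ids)  # one ID per row
    return subject_ids


# Fake dataset for the demo; the folder names are hypothetical.
demo_dir = Path(tempfile.mkdtemp())
for name in ["sub-01", "sub-02", "sub-03"]:
    (demo_dir / name).mkdir()

write_participants(demo_dir, demo_dir / "participants.tsv")
print((demo_dir / "participants.tsv").read_text())
```

Each row passed to `writerows` is a one-element list, which is what puts every subject ID on its own line.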

JoelPatchitt commented 3 years ago

Windows virtual environment set-up:

Launch anaconda powershell Prompt

cd/your/path/to/adie_ongoingthoughts
virtualenv env
.\env\scripts\activate
pip install -r requirements.txt
pip install ipython
pip install .
ipython

ERROR code with jedi: pip install jedi==0.17.2

JoelPatchitt commented 3 years ago

Question: whenever I launch the anaconda powershell, will I have to follow these steps, or will ipython remember the custom setup?

htwangtw commented 3 years ago

Question: whenever I launch the anaconda powershell, will I have to follow these steps, or will ipython remember the custom setup?

Now you have created an environment for things related to adie_ongoingthoughts. You will need to do the following each time you launch anaconda PowerShell:

cd /your/path/to/adie_ongoingthoughts
.\env\Scripts\activate

Or see: a tutorial online on setting up Spyder to use the virtualenv
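If it is unclear whether the environment is active after these steps, a quick check from inside Python can help. This is a best-effort sketch: the function name `in_virtual_environment` is made up here, and the check relies on how venv and virtualenv set `sys.prefix`.

```python
import sys


def in_virtual_environment():
    """Best-effort check: True when Python runs inside a venv/virtualenv."""
    # Older virtualenv releases set sys.real_prefix; newer ones and venv
    # make sys.prefix differ from sys.base_prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

When the environment is active, the interpreter path printed should point inside the `env` folder.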

Here's a revised version of the steps you documented (fixing syntax error etc):

# change directory to adie_ongoingthoughts
cd /your/path/to/adie_ongoingthoughts
# create virtualenv called env
virtualenv env
# activate the environment
.\env\Scripts\activate
# install requirements
pip install -r requirements.txt
# install ipython and jedi==0.17.2 (v0.18 is buggy)
pip install ipython jedi==0.17.2
# install the adie data wrangling helpers
pip install .
JoelPatchitt commented 3 years ago

Thanks Hao-Ting, here is my solution for creating a .tsv file.

from pathlib import Path
Path.cwd()
data = Path.cwd()
data.absolute() # Prints absolute path
data.glob("Sub_*") # Selects all directories named Sub_*
list(data.glob("Sub_*")) # Lists directories
P_info = list(data.glob("Sub_*")) # Adds variable for list

import csv
with open('output.tsv', 'w', newline='') as f_output:
    tsv_output = csv.writer(f_output, delimiter='\t') # Delimiter set to tab
    tsv_output.writerow(P_info) # Selects information for output
htwangtw commented 3 years ago

Good attempt. May I ask you to copy and paste the first three lines of the file created?

JoelPatchitt commented 3 years ago

C:\Users\joelp\Desktop\Data\Sub_01 C:\Users\joelp\Desktop\Data\Sub_02 C:\Users\joelp\Desktop\Data\Sub_03 C:\Users\joelp\Desktop\Data\Sub_04

JoelPatchitt commented 3 years ago

They are separated by tabs on the output file by the looks of things

htwangtw commented 3 years ago

Cool, let's move to something more concrete - I would like the output to look like this:

Sub_01
Sub_02
Sub_03
...

One subject ID per line. Like the first column of this file. https://github.com/htwangtw/adie_ongoingthoughts/blob/main/adie/tests/data/ds006/participants.tsv Try to achieve this before you get back to me.

JoelPatchitt commented 3 years ago

I swapped \t (tab) for \n (newline).

import csv
with open('output.tsv', 'w', newline='') as f_output:
    tsv_output = csv.writer(f_output, delimiter='\n') # Delimiter set to newline
    tsv_output.writerow(P_info)

Output as follows:

C:\Users\joelp\Desktop\Data\Sub_01
C:\Users\joelp\Desktop\Data\Sub_02
C:\Users\joelp\Desktop\Data\Sub_03
C:\Users\joelp\Desktop\Data\Sub_04

htwangtw commented 3 years ago

This is one way to break it into different lines, and for a single-column file that will do. It still doesn't match what I want: I want the subject number only, not the full path.
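For reference, pathlib objects expose the last component of a path through the `.name` attribute, which is one way to drop the leading folders. The sketch below reuses two of the paths from the output above and uses `PureWindowsPath` so that it parses Windows-style paths on any operating system; paths returned by `Path.glob` have the same attribute.

```python
from pathlib import PureWindowsPath

# Paths copied from the output above; PureWindowsPath parses Windows-style
# paths on any operating system without touching the disk.
full_paths = [
    PureWindowsPath(r"C:\Users\joelp\Desktop\Data\Sub_01"),
    PureWindowsPath(r"C:\Users\joelp\Desktop\Data\Sub_02"),
]

subject_ids = [p.name for p in full_paths]  # keep only the last path component
print(subject_ids)
# ['Sub_01', 'Sub_02']
```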

JoelPatchitt commented 3 years ago

I have found a more efficient way to create the .tsv that allows for multiple columns. Although I have not been able to figure out how to rename the items within the list yet. I will try again tomorrow, or better yet ask you, as I have spent some time on it to no avail!

from pathlib import Path
Path.cwd()
data = Path.cwd()
data.absolute() # Prints absolute path
data.glob("Sub_*") # Selects all paths matching Sub_*; glob retrieves files/pathnames matching a specified pattern
list(data.glob("Sub_*")) # Lists directories
subj = list(data.glob("Sub_*")) # Adds variable for list

# Convert list into a DataFrame
from pandas import DataFrame
df = DataFrame(subj, columns=['Subjects'])
print(df)

df.to_csv('Participant_generator.tsv', sep='\t', index=False)  # index=False prevents the index of each row from being stored in the file
JoelPatchitt commented 3 years ago

Hao-Ting,

I had a look at the os.path.split function and found that I could only use it on individual strings, not on strings inside a list.

Please see below a workaround for this, made entirely by myself using things that I learnt today while searching for a fix. What do you think?

from pathlib import Path
Path.cwd()
data = Path.cwd()
data.absolute() # Prints absolute path
data.glob("Sub_*") # Selects all paths matching Sub_*; glob retrieves files/pathnames matching a specified pattern
list(data.glob("Sub_*")) # Lists directories
subj = list(data.glob("Sub_*")) # Adds variable for list   

sub_str = [ str(e) for e in subj] # subjects as string, converts elements from windows path to string retaining list format

len(sub_str) # Equals amount of elements in substr list
total_elements = len(sub_str) # Variable for while loop
element = 0 # List elements start at 0

while element <= total_elements-1: # total_elements = 4 in this instance

    sub_str[element] = sub_str[element].replace("\\","").replace("'","").replace(":","") # Removes unwanted symbols
    sub_str[element] = sub_str[element].replace("CUsersjoelpDesktopData","") # Removes unwanted text

    element = element+1 #End of loop

# Convert list into a DataFrame
from pandas import DataFrame
df = DataFrame(sub_str, columns=['Subjects'])
print(df)

df.to_csv('Participant_generator.tsv', sep='\t', index=False) # Output to .tsv file

Here is the output .tsv file:

Subjects
Sub_01
Sub_02
Sub_03
Sub_04

The good thing with this solution is that we can add an infinite number of participants and only have to run this through the command line once. It can also be edited to add columns to the dataframe which can easily be printed to .tsv
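To make the point about adding columns concrete, here is a standard-library sketch that swaps pandas for `csv.DictWriter`. The `group` column and the `n/a` placeholder values are assumptions for illustration only, not real ADIE data, and an in-memory buffer stands in for the output file.

```python
import csv
import io

# Hypothetical subject IDs and an illustrative extra column; "n/a" marks
# values that are not known yet.
subject_ids = ["Sub_01", "Sub_02", "Sub_03"]
columns = ["participant_id", "group"]

buffer = io.StringIO()  # stands in for a real file opened with newline=""
writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t", lineterminator="\n")
writer.writeheader()
for sid in subject_ids:
    writer.writerow({"participant_id": sid, "group": "n/a"})

print(buffer.getvalue())
```

Adding another column then only requires a new entry in `columns` and a matching key in each row dictionary.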

htwangtw commented 3 years ago

You deserve all the stickers. If you copy the solution to generate_participant.py (your pull request #11), I can give you line by line feedback. We can work it towards something for the ADIE data.

Steps of pushing a change to a PR:

  1. Open your GitHub app and navigate to your fork, branch name Participant_log
  2. Find the file you created in bin/generate_participant.py, add your solution to the file and save
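  3. Go back to the GitHub app, stage the file bin/generate_participant.py and commit the changes
  4. Push your changes. See here for more details on how to commit changes: https://docs.github.com/en/desktop/contributing-and-collaborating-using-github-desktop/committing-and-reviewing-changes-to-your-project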
JoelPatchitt commented 3 years ago

I think I have done this. I am not sure how to push from my app though; it isn't showing up.



htwangtw commented 3 years ago

Yes, you have done it; I can see it. Sorry if I missed it! Once I comment on it, it will be more obvious.

JoelPatchitt commented 3 years ago

I have just figured out a loop that takes advantage of the os.path.split function and will post another file in the same place for you to look at too.

