jcohenadad opened 1 week ago
Suggestion (from ChatGPT) to use a shared codebase:
The most elegant way to share code between two different Jupyter notebooks while minimizing duplication is to refactor the shared code into a separate, importable Python module. This allows you to define common functions, classes, and variables in a single file that both notebooks can access. Here’s how you can achieve this:
Create a new Python file (e.g., `shared_code.py`) that contains all the functions, classes, and variables you want to share between the notebooks. Place this file in the same directory as your Jupyter notebooks or in a subdirectory that is easily accessible.

For example, if you have two notebooks named `notebook1.ipynb` and `notebook2.ipynb`, create a file called `shared_code.py` in the same directory:
```python
# shared_code.py

def preprocess_data(data):
    # Shared data preprocessing code
    pass

def custom_plot(data, title=""):
    # Shared plotting function
    pass

class DataProcessor:
    # Shared class for processing data
    def __init__(self, data):
        self.data = data

    def process(self):
        # Process data
        pass
```
In each of your Jupyter notebooks, import the shared code module using the following syntax:
```python
# In notebook1.ipynb and notebook2.ipynb
from shared_code import preprocess_data, custom_plot, DataProcessor

# Now you can use the shared functions and classes
data = [...]  # Replace with actual data
preprocessed_data = preprocess_data(data)
processor = DataProcessor(preprocessed_data)
processor.process()
custom_plot(preprocessed_data, title="Example Plot")
```
Create a `notebooks` Directory and Set PYTHONPATH:
If you want to organize your project further, you can create a `notebooks` directory for your Jupyter notebooks and a `src` or `utils` directory for the shared code:
```
project_root/
│
├── notebooks/
│   ├── notebook1.ipynb
│   └── notebook2.ipynb
│
├── src/
│   └── shared_code.py
```
To access the shared code from within `notebook1.ipynb` or `notebook2.ipynb`, you can modify `sys.path` in each notebook to include the `src` directory:
```python
import sys
sys.path.append('../src')  # Adjust path as needed

from shared_code import preprocess_data, custom_plot, DataProcessor
```
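Note that a relative string like `'../src'` breaks if the notebook's working directory changes. A slightly more robust sketch, assuming the directory layout above and that the notebook runs from `notebooks/`:

```python
import sys
from pathlib import Path

# Resolve project_root/src relative to the notebook's working directory
src_dir = (Path.cwd().parent / "src").resolve()
if str(src_dir) not in sys.path:
    sys.path.insert(0, str(src_dir))

from shared_code import preprocess_data, custom_plot, DataProcessor
```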
While developing the shared code, you can use the Jupyter `%autoreload` magic to automatically reload changes from your shared module without restarting the kernel:
```python
%load_ext autoreload
%autoreload 2

from shared_code import preprocess_data, custom_plot, DataProcessor
```
This ensures that any changes you make to `shared_code.py` are reflected immediately in the notebooks without re-importing the module.
Jupyter Notebook `%run` Magic:
You can place shared code in a separate Jupyter notebook (e.g., `shared_code.ipynb`) and use the `%run` magic command in both `notebook1.ipynb` and `notebook2.ipynb`:
```python
%run ./shared_code.ipynb
```
However, this approach is less flexible than using a Python module and can lead to issues with dependency tracking.
Jupyter Notebook Extensions:
Use tools like `nbdev`, which can turn Jupyter notebooks into modules and handle code sharing. This is particularly useful for complex projects but adds some overhead.
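For reference, nbdev marks exportable cells with directives; a minimal sketch, assuming nbdev v2 syntax, with the module name chosen here for illustration:

```python
# First code cell of the source notebook — sets the target module name:
#| default_exp shared_code

# Any cell whose contents should be exported to shared_code.py:
#| export
def preprocess_data(data):
    # Shared data preprocessing code
    return data
```

Running `nbdev_export` then generates the importable module from the notebook.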
The most maintainable and scalable solution is to create a separate Python module (`shared_code.py`) and import it in each notebook. This keeps your notebooks clean, avoids code duplication, and allows you to organize your shared codebase effectively.
Additional suggestions about the use of functions vs. scripts:
In your scenario, you have a large code block that uses variables defined earlier in the notebook, and you want to duplicate or share this code without manually copying it between notebooks. Since this code relies on variables like `subjects`, `GAMMA`, and `voltages` defined earlier, directly extracting it into a separate function in a module would require passing these variables as arguments.
Instead, you can employ a combination of approaches that allows for shared code execution while keeping variable dependencies intact. Here’s an elegant way to handle this:
Move the repetitive code into a function that accepts the necessary variables as parameters. This helps avoid code duplication while maintaining flexibility. Here's how you can do it:
Create a `shared_code.py` module:
```python
# shared_code.py
import os
import json
from pathlib import Path

import nibabel as nib
import numpy as np


def process_subject(subject, path_data, GAMMA, voltages, siemens_loss_factor=-0.095):
    """
    Processes a subject's B1 maps based on given reference voltages and stores the average map.

    Parameters:
        subject (str): The subject identifier.
        path_data (str): The path to the data directory.
        GAMMA (float): The gyromagnetic ratio.
        voltages (list): List of reference voltages to check.
        siemens_loss_factor (float): The power loss factor for the Siemens coil. Default is -0.095.

    Returns:
        None
    """
    b1_maps = []
    os.chdir(os.path.join(path_data, subject, "fmap"))

    # Fetch the reference voltage depending on subject or from JSON
    if subject == 'sub-MSSM1':
        ref_voltage = 450
    elif subject == 'sub-MSSM2':
        ref_voltage = 350
    elif subject == 'sub-MSSM3':
        ref_voltage = 450
    else:
        with open(f"{subject}_acq-famp_TB1DREAM.json", "r") as f:
            metadata = json.load(f)
        ref_voltage = metadata.get("TxRefAmp", "N/A")
        if ref_voltage == "N/A":
            # Fall back to parsing the reference voltage from the series description
            ref_token = "N/A"
            for token in metadata.get("SeriesDescription", "N/A").split("_"):
                if token.startswith("RefV"):
                    ref_token = token
            ref_voltage = float(ref_token[4:-1])

    # Process initial reference flip angle map
    nii = nib.load(f"{subject}_acq-famp_TB1DREAM.nii.gz")
    meas_fa = nii.get_fdata()
    meas_fa[meas_fa < 200] = np.nan
    meas_fa[meas_fa > 500] = np.nan
    with open(f"{subject}_acq-famp_TB1DREAM.json", "r") as f:
        metadata = json.load(f)
    requested_fa = metadata.get("FlipAngle", "N/A")
    meas_fa = (meas_fa / 10) / requested_fa

    voltage_at_socket = ref_voltage * 10 ** siemens_loss_factor
    b1_map = meas_fa * (np.pi / (GAMMA * 1e-3 * voltage_at_socket))
    b1_map = b1_map * 1e9
    b1_maps.append(b1_map)

    # Process other reference voltage maps
    for voltage in voltages:
        my_file = Path(f"{subject}_acq-famp-{voltage}_TB1DREAM.nii.gz")
        if my_file.is_file():
            if subject == 'sub-MSSM2' and voltage == "1.5":
                ref_voltage = 450
            elif subject == 'sub-MSSM2' and voltage == "0.66":
                ref_voltage = 234
            elif subject == 'sub-MSSM3' and voltage == "0.66":
                ref_voltage = 328
            else:
                with open(f"{subject}_acq-famp-{voltage}_TB1DREAM.json", "r") as f:
                    metadata = json.load(f)
                ref_voltage = metadata.get("TxRefAmp", "N/A")
                if ref_voltage == "N/A":
                    ref_token = "N/A"
                    for token in metadata.get("SeriesDescription", "N/A").split("_"):
                        if token.startswith("RefV"):
                            ref_token = token
                    ref_voltage = float(ref_token[4:-1])

            nii = nib.load(f"{subject}_acq-famp-{voltage}_TB1DREAM.nii.gz")
            meas_fa = nii.get_fdata()
            meas_fa[meas_fa < 200] = np.nan
            meas_fa[meas_fa > 500] = np.nan
            with open(f"{subject}_acq-famp-{voltage}_TB1DREAM.json", "r") as f:
                metadata = json.load(f)
            requested_fa = metadata.get("FlipAngle", "N/A")
            meas_fa = (meas_fa / 10) / requested_fa
        else:
            # No acquisition at this voltage: use an all-NaN map so nanmean ignores it
            meas_fa = np.full(nii.header.get_data_shape(), np.nan)

        voltage_at_socket = ref_voltage * 10 ** siemens_loss_factor
        b1_map = meas_fa * (np.pi / (GAMMA * 1e-3 * voltage_at_socket))
        b1_map = b1_map * 1e9
        b1_maps.append(b1_map)

    avgB1 = np.nanmean(b1_maps, axis=0)
    nii_avgB1 = nib.Nifti1Image(avgB1, nii.affine, nii.header)
    nib.save(nii_avgB1, f"{subject}_DREAMTB1avgB1map.nii.gz")
```
In each Jupyter notebook:
Import the shared function and call it with the required parameters. This allows you to execute shared code without duplicating it across notebooks.
```python
# notebook1.ipynb
from shared_code import process_subject

# Define the necessary variables
GAMMA = 2.675e8  # [rad / (s T)]
voltages = ["1.5", "0.66"]
path_data = "/path/to/data"
subjects = ["sub-MSSM1", "sub-MSSM2", "sub-MSSM3"]

for subject in subjects:
    process_subject(subject, path_data, GAMMA, voltages)
```
`%run` Magic with Pre-defined Variables:
If you prefer to keep using variables without refactoring the code into a function, you can move the repetitive code block into another Jupyter notebook (e.g., `shared_code.ipynb`) and use the `%run` magic command:
Create a notebook called `shared_code.ipynb` that contains only the shared code (no variable definitions). In the main notebook (e.g., `notebook1.ipynb`), define the variables first and then execute `shared_code.ipynb`:
```python
# notebook1.ipynb
GAMMA = 2.675e8  # [rad / (s T)]
voltages = ["1.5", "0.66"]
path_data = "/path/to/data"
subjects = ["sub-MSSM1", "sub-MSSM2", "sub-MSSM3"]

%run ./shared_code.ipynb
```
In `shared_code.ipynb`, the code will use the variables defined in the main notebook. This keeps execution within the notebook context, but it can be less clear if you have complex dependencies or want to share code outside Jupyter notebooks.
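For illustration, `shared_code.ipynb` might then contain only the processing loop, relying on the caller's namespace (a sketch; the loop body would be the B1-map processing block shown above):

```python
# shared_code.ipynb — no variable definitions here; GAMMA, voltages,
# path_data, and subjects must already exist in the notebook that calls %run.
for subject in subjects:
    print(f"Processing {subject} (GAMMA={GAMMA}, voltages={voltages})")
    # ... B1-map processing block from above goes here ...
```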
This issue tracks the items that need to be addressed to unify the code between the phantom and human notebooks.
Implement in phantom notebook: data dictionary:
https://github.com/spinal-cord-7t/coil-qc-code/blob/453b88897fa66d5c8e902fbe24ab20026aed7b4a/data_processing-human.ipynb#L583-L594
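For context, a data dictionary in this sense would replace hard-coded per-subject if/elif branches (like those in `process_subject` above) with a single lookup table. A hypothetical sketch — the structure and helper name are illustrative, and the actual dictionary lives in the linked human notebook:

```python
# Hypothetical lookup replacing the if/elif reference-voltage branches;
# the values mirror the hard-coded ones above, keyed by (subject, voltage).
REF_VOLTAGE_OVERRIDES = {
    ("sub-MSSM1", None): 450,
    ("sub-MSSM2", None): 350,
    ("sub-MSSM2", "1.5"): 450,
    ("sub-MSSM2", "0.66"): 234,
    ("sub-MSSM3", None): 450,
    ("sub-MSSM3", "0.66"): 328,
}

def lookup_ref_voltage(subject, voltage=None):
    """Return the hard-coded reference voltage, or None to fall back to JSON metadata."""
    return REF_VOLTAGE_OVERRIDES.get((subject, voltage))
```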