Open fraimondo opened 1 year ago
Actually, maybe it makes sense to provide a YAML interface for the register_*
functions, such that one puts in the YAML sth like:
register_parcellations:
name: CustomParcellation
path: path/to/parcellation.nii.gz
parcels_labels: path/to/labels.txt
Then one can treat this as a relative path from YAML similar to other things, but not sure if this will then end up cluttering the YAML and will also not solve the problem for other extensions that still may or may not rely on relative paths to some degree. Perhaps something like allowing user made script-based extensions to use command line arguments. For example one could make a python script to register the parcellation like:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("parc_path")
# just pseudocodeish
register_parcellation(parc_path)
The arguments could then be parametrised in the YAML, so that paths can be treated as relative from there:
with:
- my_parc_extension.py: [path/to/parc, potential_other_cmd_arg]
Or one could impose some constraint, like, the argument could be a path from yaml to a directory that contains all of the extension data ressources. Just a thought, but not sure how good this idea is, both from a technical point of view, and from a user-friendliness point of view.
I think the other option, that is also a good option, just stick with the implementation and just document it really well and explictly, to prevent confusion early on.
These seems nice concepts that I have not considered. I will just think while I type.
The main goals of YAML are: 1) To provide code-less configurations. 2) Easy to use 3) Reproduciblity (the yaml is embedded in the metadata of the feature as a json).
Option 1) from your suggestion (adding a register_parcellations
section), for me, goes against 3. Reproduciblity is limited as long as you use core junifer features. The moment that there's a with
statement with a non-junifer module (e.g.: external .py or package), we can't guarantee what's in there. This also applies with masks/parcellation. The mask/parcellation is not known to junifer, so we can't reproduce this result: if you send me this yaml file, I can't run it.
Additionally, users that want their specific masks/parcellations should be able to create them. At this point, we consider them power users and assume that they have enough coding knowledge to write the .py file that registers the mask.
Option 2 is a bit more complicated. First, it allows the user to "code" within the yaml. Second, the items in the with
statement are imported, not run.
Overall, we need to consider what the user is doing, and not only the feature we want to support.
My take is:
1) The parcellation is a common parcellation -> create an issue in github and we will add it. Adding common/published parcellations is not complicated. 2) The parcellation is made by a user -> the user is a power user and can write the extension.
1 and 2 also applies for masks.
Let's also keep in mind that we have pending a nice feature regarding masks: https://www.templateflow.org/
from templateflow import api as tflow
tflow.get('MNI152NLin6Asym', desc=None, resolution=1,
suffix='T1w', extension='nii.gz')
PosixPath('/templateflow_home/tpl-MNI152NLin6Asym/tpl-MNI152NLin6Asym_res-01_T1w.nii.gz')
I agree with all the points above. I think if users that make extensions are considered power users they can be expected to understand the subtleties of the relative paths issue, so my leaning would be towards keeping the implementation as is (paths in the YAML relative from the YAML, paths in extensions relative from the junifer working directory), and simply summarising in the docs quite well how the paths are intended to be used in each situation.
From another POV:
.juniferignore
file which lists the file not of concern to junifer..yaml
and .py
) is copied along with the .yaml
and .py
..py
i.e., relative to .py
..yaml
, the relative paths there are relative to it so when we run we are sure everything works as expected.I don't completely understand your approach @synchon
What we want to address here is relative paths in .py
files that are included in a with
statement. So lets forget about the rest.
Imagine this structure of a github repo in which juni-farm is added as a submodule in the external
directory.
project_dir/
junifer_yamls/
my_yaml1.yaml // adds ../external/masks/custom_mask_extension.py in the "with" section
my_yaml2.yaml
external/
juni-farm/
data/
masks/
this_custom_mask.nii.gz
masks/
custom_mask_extension.py // registers ../data/masks/this_custom_mask.nii.gz
My take:
1) The user needs to user relative paths by calling a junifer function: junifer_path('../relative/path/to/data.nii.gz', __FILE__)
in custom_mask_extension.py
2) The queue
command then:
2.1) parses the .py file
2.2) look for lines with junifer_path
2.3) computes the absolute path for each resource
2.4) copies each resource to a data
directory inside the corresponding jobdir
3) the junifer_path
function needs to first check inside the data
dir relative to the yaml location for the file. Otherwise, looks in Path(__FILE__).parent / '../relative/path/to/data.nii.gz'
I don't completely understand your approach @synchon
What we want to address here is relative paths in
.py
files that are included in awith
statement. So lets forget about the rest.Imagine this structure of a github repo in which juni-farm is added as a submodule in the
external
directory.project_dir/ junifer_yamls/ my_yaml1.yaml // adds ../external/masks/custom_mask_extension.py in the "with" section my_yaml2.yaml external/ juni-farm/ data/ masks/ this_custom_mask.nii.gz masks/ custom_mask_extension.py // registers ../data/masks/this_custom_mask.nii.gz
My take:
- The user needs to user relative paths by calling a junifer function:
junifer_path('../relative/path/to/data.nii.gz', __FILE__)
incustom_mask_extension.py
- The
queue
command then: 2.1) parses the .py file 2.2) look for lines withjunifer_path
2.3) computes the absolute path for each resource 2.4) copies each resource to adata
directory inside the correspondingjobdir
- the
junifer_path
function needs to first check inside thedata
dir relative to the yaml location for the file. Otherwise, looks inPath(__FILE__).parent / '../relative/path/to/data.nii.gz'
I got it correct the previous time and I see what you mean. My proposal was a bit holistic in a way:
project_dir/
my_yaml1.yaml // adds ./custom_mask_extension.py in the "with"
custom_mask_extension.py // registers ./this_custom_mask.nii.gz
this_custom_mask.nii.gz
README.md
.juniferignore
.juniferignore:
README.md
project_dir/
to junfier_jobs/my_job/
except README.md
.project_dir/
my_yaml1.yaml // adds ./custom_mask_extension.py in the "with"
my_yaml2.yaml
custom_mask_extension.py // registers ./this_custom_mask.nii.gz
this_custom_mask.nii.gz
README.md
.juniferignore
.juniferignore:
README.md
project_dir/
to junifer_jobs/my_job/
except README.md
and my_yaml2.yaml
.project_dir/
README.md
junifer_yamls/
my_yaml1.yaml // adds ../external/juni-farm/masks/custom_mask_extension.py in the "with" section
external/
juni-farm/
data/
masks/
this_custom_mask.nii.gz
masks/
custom_mask_extension.py // registers ../data/masks/this_custom_mask.nii.gz
.juniferignore:
README.md
project_dir/
to junifer_jobs/my_job/
except README.md
maintaining the path structure.project_dir/
README.md
junifer_yamls/
my_yaml1.yaml // adds ../external/juni-farm/masks/custom_mask_extension.py in the "with" section
my_yaml2.yaml
external/
juni-farm/
data/
masks/
this_custom_mask.nii.gz
masks/
custom_mask_extension.py // registers ../data/masks/this_custom_mask.nii.gz
.juniferignore:
README.md
project_dir/
to junifer_jobs/my_job/
except README.md
and my_yaml2.yaml
maintaining the path structure.But with your approach a junifer config goes from a yaml file to a directory structure. This will mean that the queue
command will end up copying the ENTIRE juni-farm. What if the relative path is even outside project_dir
? Like from another project?
For me, it makes things too complicated. We need to consider the logic from our side, modifying the user's behaviour as less as possible.
But with your approach a junifer config goes from a yaml file to a directory structure. This will mean that the
queue
command will end up copying the ENTIRE juni-farm. What if the relative path is even outsideproject_dir
? Like from another project?For me, it makes things too complicated. We need to consider the logic from our side, modifying the user's behaviour as less as possible.
I see your point and have no arguments against it. For me, a documented directory structure brings order to chaos and keeps the user and devs on same page, else I have usually found it to not go down well. Those were my two cents.
Are you requiring a new dataset or marker?
Which feature do you want to include?
Following the discussion in #127, we need a way of allowing the user to create an extension that registers a mask/parcel using a relative path to the mask (from the location of the .py extensions file).
Asking for absolute paths directly will not work in cases that extensions are stored in a repository, like juni-farm.
The issues comes with the
queue
command, as the .py file will be copied, but the data/parcellation/etc will not.@LeSasse @synchon
How do you imagine this integrated in junifer?
I don't have a clear idea on how to do it, but I can imagine something like this.
.py
file with the extension has a "list of resources" that need to be copied (maybe in a variable, or comment)queue
command: 2.1 parses the file 2.2 finds the list of resources 2.3 computes the absolute path 2.4 copies them to adata
directory in thejunifer_jobs
directory.One problem that this issue has is that the
.py
file needs to somehow be modified so the path of the resources, after thequeue
command, are changed to the newdata
directory in thejunifer_jobs
directory.Maybe some function
junifer_path('../relative/path/to/data.nii.gz', __FILE__)
can take care of solving the relative path. This function will first check in the./data
directory before usingPath(__FILE__).parent
.Other more strict but easier to implement option is that each extension has a
data
directory next to it that can be copied in the junifer jobs. The problem with this approach is that users might end up repeating data in repositories for each extension they create.Do you have a sample code that implements this outside of junifer?
No response
Anything else to say?
No response