nipreps / mriqc-learn

Learning on MRIQC-generated image quality metrics (IQMs).
Apache License 2.0

How to choose the cut-off value for deleting images with bad quality? #20

Open EdithGaspar opened 1 year ago

EdithGaspar commented 1 year ago

I have the MRIQC results from 1000 subjects, but I don't understand how to choose the cut-off value to select my best images or the ones to delete.

celprov commented 1 year ago

Hi @EdithGaspar, could you specify which value you are referring to?

oesteban commented 1 year ago

I have the MRIQC results from 1000 subjects, but I don't understand how to choose the cut-off value to select my best images or the ones to delete.

There's no rule of thumb for this. As we introduced in our MRIQC paper (https://doi.org/10.1371/journal.pone.0184661), you can train a classifier on a subset of your data (that you manually annotate) and then apply it to the remainder of the dataset. The original code for the classifier was moved into the nipreps/mriqc-learn repo.

Perhaps @jaimebarran or @t-sanchez, who have recently worked with mriqc-learn, can give you some insights or share their experience.

jaimebarran commented 1 year ago

Hi @EdithGaspar,

You can use the baseline model from https://github.com/nipreps/mriqc-learn as follows. First, load it:

from joblib import load
# Load the trained model
model = load("/mriqc_learn/mriqc_learn/data/classifier.joblib") # check your path

Then you can use y_pred = model.predict(your_loaded_dataset), which returns binary labels (cutoff is 0.5). Alternatively, y_scores = model.predict_proba(your_loaded_dataset)[:, 0] returns, for each image, the probability of belonging to class '0' (the negative class, i.e. excluded quality). You can then decide on a threshold and get the indices of the values over or under it, for example:

threshold = 0.7
y_pred_idx = (y_scores > threshold).nonzero()[0]
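
To get a feel for how the threshold changes the number of flagged images, one option is to sweep a few candidate values. This is only a sketch in plain Python: the scores below are made-up placeholders standing in for model.predict_proba(...)[:, 0], and flag_for_exclusion is a hypothetical helper, not part of mriqc-learn.

```python
def flag_for_exclusion(scores, threshold):
    """Return the indices of images whose class-'0' probability exceeds threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]


# Placeholder probabilities of class '0' (exclude); replace with real y_scores.
y_scores = [0.05, 0.92, 0.40, 0.71, 0.66, 0.15]

# Sweep a few candidate thresholds and report how many images each one flags.
for threshold in (0.5, 0.7, 0.9):
    idx = flag_for_exclusion(y_scores, threshold)
    print(f"threshold={threshold}: {len(idx)} flagged -> {idx}")
```

A higher threshold flags fewer images, so the value chosen is a trade-off between discarding usable scans and keeping poor ones; checking a handful of flagged images by eye at each candidate value is one way to settle on it.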

I would recommend retraining the model with up-to-date Python libraries (numpy, sklearn, etc.) rather than using the model from the repo directly. You can do that by following the tutorial at https://github.com/nipreps/mriqc-learn/blob/main/docs/notebooks/Tutorial.ipynb, then saving the trained model with:

from joblib import dump
dump(model, "/mriqc-learn/mriqc_learn/data/your_new_classifier.joblib")

In addition, you could train the model on your own data, as long as you have subjective ratings, by loading your prepared data with the load_dataset function.

Let me know if you need additional help!

Cheers!

andrew-yian-sun commented 7 months ago

@jaimebarran I was trying to see if I could run the baseline model. A couple of issues:

  1. When I install mriqc-learn (pip install mriqc-learn), the classifier.joblib file didn't come with the install (neither did production.py).

So I downloaded the raw classifier.joblib file from this repo and added it to where I thought it should be: C:\Users\Andrew\anaconda3\Lib\site-packages\mriqc_learn\data

  2. However, when I try running your first code block to load the model:
    from joblib import load
    # Load the trained model
    model = load(r"C:\Users\Andrew\anaconda3\Lib\site-packages\mriqc_learn\data\classifier.joblib") # check your path

I get the following error. Any ideas?

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 3
      1 from joblib import load
      2 # Load the trained model
----> 3 model = load(r"C:\Users\Andrew\anaconda3\Lib\site-packages\mriqc_learn\data\classifier.joblib")

File c:\Users\Andrew\anaconda3\Lib\site-packages\joblib\numpy_pickle.py:658, in load(filename, mmap_mode)
    652             if isinstance(fobj, str):
    653                 # if the returned file object is a string, this means we
    654                 # try to load a pickle file generated with an version of
    655                 # Joblib so we load it with joblib compatibility function.
    656                 return load_compatibility(fobj)
--> 658             obj = _unpickle(fobj, filename, mmap_mode)
    659 return obj

File c:\Users\Andrew\anaconda3\Lib\site-packages\joblib\numpy_pickle.py:577, in _unpickle(fobj, filename, mmap_mode)
    575 obj = None
    576 try:
--> 577     obj = unpickler.load()
    578     if unpickler.compat_mode:
    579         warnings.warn("The file '%s' has been generated with a "
    580                       "joblib version less than 0.10. "
    581                       "Please regenerate this pickle file."
    582                       % filename,
...
File sklearn\tree\_tree.pyx:1418, in sklearn.tree._tree._check_node_ndarray()

ValueError: node array from the pickle has an incompatible dtype:
- expected: {'names': ['left_child', 'right_child', 'feature', 'threshold', 'impurity', 'n_node_samples', 'weighted_n_node_samples', 'missing_go_to_left'], 'formats': ['<i8', '<i8', '<i8', '<f8', '<f8', '<i8', '<f8', 'u1'], 'offsets': [0, 8, 16, 24, 32, 40, 48, 56], 'itemsize': 64}
- got     : [('left_child', '<i8'), ('right_child', '<i8'), ('feature', '<i8'), ('threshold', '<f8'), ('impurity', '<f8'), ('n_node_samples', '<i8'), ('weighted_n_node_samples', '<f8')]

jaimebarran commented 7 months ago

Hi @andrew-yian-sun !

When I install mriqc-learn (pip install mriqc-learn), the classifier.joblib file didn't come with the install (neither did production.py).

Is this how it should be? @oesteban @celprov

when I try running your first code block to load the model... I get the following error

I see you are using model = load(r"C:\Users\Andrew\anaconda3\Lib\site-packages\mriqc_learn\data\classifier.joblib") # check your path. The mmap_mode parameter of joblib's load controls the memory-mapping behavior of the loaded object. Memory-mapping is a method used to load data into memory more efficiently, which can be useful when working with large datasets. Here's what each option means:

PS: I didn't install mriqc-learn; I forked the repo and modified it my own way.

andrew-yian-sun commented 6 months ago

Hi @jaimebarran, thanks for the tip, but it seems that either way (forking the repo, trying different options for mmap_mode) results in the same error message. I wonder if it's because the model was created with an older version of joblib and my version is too recent? My version is 1.4.0.

jaimebarran commented 3 months ago

Hi @andrew-yian-sun,

Yes, it seems so from your traceback, which includes:

The file '%s' has been generated with a joblib version less than 0.10. Please regenerate this pickle file. % filename

You can try to regenerate the .joblib file with your up-to-date Python libraries by running /scripts/train_model.py. You can change which columns (= IQMs) are dropped in /models/production.py/init_pipeline()/pp.DropColumns(...). This will regenerate classifier.joblib with your libraries. Then you can try loading it to see if it works.

I was using joblib v1.2.0 and it worked with some warnings. I updated it to v1.4.0 and it worked without warnings.
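
Since the ValueError above is raised from sklearn\tree\_tree.pyx and the expected dtype has a missing_go_to_left field that the pickled node array lacks, the installed scikit-learn version may matter as well as joblib. A small sketch for reporting what is installed, using only the standard library; installed_versions is a hypothetical helper and the package list is just an assumption about what is relevant here:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("joblib", "scikit-learn", "numpy")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Comparing this output between the environment where the model was trained and the one where it is loaded should show whether the two differ.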

Cheers!