jamesdolezal / slideflow

Deep learning library for digital pathology, with both Tensorflow and PyTorch support.
https://slideflow.dev
GNU General Public License v3.0

saving models #290

Closed sreens closed 1 year ago

sreens commented 1 year ago

Hi James,

I am trying to understand where the trained model is stored when running a Slideflow CLAM model. In a single run that includes the train_clam and evaluate_clam steps, the output probabilities are saved, but I do not see any models under the /models folder in the project. Is there somewhere else I should be looking, or a parameter that needs to be set to save the trained model?

Thanks, Sameet

jamesdolezal commented 1 year ago

Hi Sameet - MIL models (including CLAM) should be saved in the 'mil' subfolder in the project directory. If you don't see the MIL subfolder, can you paste here the code you're running so I can investigate further?
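
As a quick way to check for that subfolder, here is a minimal sketch using only the Python standard library. The helper name `list_mil_runs` and the placeholder project path are hypothetical, and the only thing it assumes about Slideflow is that trained MIL runs land in folders under 'mil' in the project directory, as described above.

```python
from pathlib import Path


def list_mil_runs(project_dir):
    """Return the folders found under <project_dir>/mil, sorted by path.

    An empty list means the 'mil' subfolder is missing or contains no folders.
    """
    mil_dir = Path(project_dir) / "mil"
    if not mil_dir.is_dir():
        return []
    # Keep only directories; stray files in 'mil' are ignored.
    return sorted(p for p in mil_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    # "path/to/project" is a placeholder; substitute your own project root.
    runs = list_mil_runs("path/to/project")
    if not runs:
        print("No 'mil' subfolder (or no runs inside it) was found.")
    for run in runs:
        print(run)
```

If this prints nothing but the "not found" message after training, no MIL model was written for that project.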

sreens commented 1 year ago

Thanks James. I do not see the MIL folder in the project directory. The Slideflow version I am running is '1.4.0.post1', in which the training block is in slideflow/clam/__init__.py: main(). There I see lines that save the performance results for the different folds, but none that save any of the models. I just wanted to check whether the version might have something to do with it before I prep the code.

sreens commented 1 year ago

Here is the code block used to train CLAM after the project has been initialized (exp_name, output_dir, and magnification are parameters set ahead of the run). Let me know if you need any more information.

    # Train a CLAM model from the features generated.
    P.train_clam(
        exp_name=exp_name,
        pt_files=os.path.join(output_dir, "clamfeatures"),
        outcomes="Class_Label",  # column in annotations which specifies outcome
        dataset=dataset,
    )
    print("trained CLAM")

    # Evaluate the CLAM model.
    P.evaluate_clam(
        exp_name=exp_name,
        pt_files=os.path.join(output_dir, "clamfeatures"),
        outcomes="Class_Label",
        filters={fold_col: ["Test"]},
        tile_px=256,
        tile_um=magnification,
    )
    print("Evaluated CLAM")

jamesdolezal commented 1 year ago

Hi sreens,

The Project.train_clam() function trains CLAM models using a direct port of the originally published, official code. At the time this was originally written, the official repository did not include functionality to save and reload models, which is why models were not being saved.

Since then, Slideflow has expanded MIL support beyond CLAM, including additional architectures and functionality. As of version 2.0, Slideflow now uses FastAI for training these models, utilizing the new Project.train_mil() interface. The new FastAI trainer does save models that can be later reloaded for evaluation and predictions.

The legacy trainer (Project.train_clam()) has been retained for backwards compatibility, but future projects should utilize Project.train_mil().

I don't really want to rework the legacy Project.train_clam() interface to add saving/loading model support, since we have a new preferred interface that works.

I apologize for any inconvenience. If you'd like the ability to save and reload MIL models, I would recommend upgrading to version 2.0. Version 2.0 is backwards compatible, so this shouldn't require too much work on your end, apart from learning the new Project.train_mil() interface.
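
For readers making that switch, a rough sketch of what a call to the new interface might look like is below. The wrapper name `train_mil_model` is hypothetical, and the keyword names passed to `Project.train_mil()` (`config`, `outcomes`, `train_dataset`, `val_dataset`, `bags`) are assumptions that should be checked against the documentation for your installed version.

```python
def train_mil_model(project, config, train_dataset, val_dataset, bags, outcomes):
    """Train a MIL model through the Slideflow 2.0 interface.

    `project` is expected to be a slideflow Project. The keyword names
    below are assumptions about Project.train_mil(); verify them against
    the documentation before relying on this.
    """
    return project.train_mil(
        config=config,                # MIL trainer configuration object
        outcomes=outcomes,            # annotation column, e.g. "Class_Label"
        train_dataset=train_dataset,  # dataset used for training
        val_dataset=val_dataset,      # dataset used for validation
        bags=bags,                    # directory of per-slide feature bags
    )
```

The configuration object would come from Slideflow's MIL configuration helper (such as `slideflow.mil.mil_config`, if your installed version provides it), and `bags` replaces the `pt_files` argument of the legacy trainer only by analogy, so confirm the expected bag format as well.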

sreens commented 1 year ago

Thank you for the detailed clarification, @jamesdolezal. I will upgrade to v2.0 for future work.