rictoo closed this issue 1 month ago.
Our original risk factor pipeline was tightly integrated with the internals of MGH's radiology database (Magview SQL), and that integration isn't part of our code release because it's specific to MGH.
The CSV-based dataset class currently does not support risk factors, but you could extend it to do so in a fork. The MGH dataset, in datasets, shows how we did this for reference, but the logic is a little complicated. It might be easier to do this from first principles: load your JSON and try to fit it into the RiskFactorVectorizer.
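As one possible starting point for the "first principles" route, here is a minimal, hypothetical sketch. It loads a JSON file of per-exam risk factors, joins it onto the rows of the CSV metadata by an exam identifier, and encodes each exam as a fixed-length feature vector plus a mask marking which values were actually provided. The field names (`exam_id`, `age`, `prior_biopsy`, `family_history`), the scaling, and the helper functions are all assumptions for illustration. This is not Mirai's `RiskFactorVectorizer` API, and a real fork would need to reproduce the exact features the released transformer snapshot was trained on.

```python
from __future__ import annotations

import csv
import io
import json

# HYPOTHETICAL schema: these keys and encodings are illustrative only,
# not the fields used by Mirai's RiskFactorVectorizer.
BINARY_KEYS = ["prior_biopsy", "family_history"]
NUMERIC_KEYS = {"age": 100.0}  # encoded as value / scale


def vectorize(rf: dict) -> tuple[list[float], list[float]]:
    """Return (features, known_mask); a missing value becomes 0.0 with mask 0.0."""
    features, mask = [], []
    for key in BINARY_KEYS:
        value = rf.get(key)
        features.append(float(bool(value)) if value is not None else 0.0)
        mask.append(1.0 if value is not None else 0.0)
    for key, scale in NUMERIC_KEYS.items():
        value = rf.get(key)
        features.append(float(value) / scale if value is not None else 0.0)
        mask.append(1.0 if value is not None else 0.0)
    return features, mask


def attach_risk_factors(csv_text: str, rf_json_text: str, key: str = "exam_id") -> list[dict]:
    """Join per-exam risk factors (a JSON object keyed by `key`) onto CSV metadata rows."""
    risk_factors = json.loads(rf_json_text)
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rf = risk_factors.get(row[key], {})  # exams absent from the JSON get no risk factors
        row["rf_vector"], row["rf_mask"] = vectorize(rf)
        rows.append(row)
    return rows


if __name__ == "__main__":
    csv_text = "exam_id,file_path\nE1,a.png\nE2,b.png\n"
    rf_json = '{"E1": {"age": 55, "family_history": true}}'
    for row in attach_risk_factors(csv_text, rf_json):
        print(row["exam_id"], row["rf_vector"], row["rf_mask"])
```

Keeping a separate mask lets downstream code tell "unknown" apart from a genuine zero; unknown entries could then be filled from image-based predictions, which appears to be the intent of `--use_pred_risk_factors_if_unk` in the full pipeline (unverified).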
In general, because the performance difference with and without risk factors is marginal, we only support image-only versions for deployment; they are much easier to manage.
Thanks for the excellent project!
How would one provide risk factor information to the model directly, rather than having the model infer it from the image? I could not find documentation on your GitHub page explaining how to do this (perhaps I'm missing something).
I am running main.py with the following arguments (the risk factor-related ones are --use_risk_factors, --use_pred_risk_factors_if_unk, and --risk_factor_metadata_path):
python scripts/main.py --model_name mirai_full --img_encoder_snapshot ~/scratch/dadams/mirai/snapshots/mgh_mammo_MIRAI_Base_May20_2019.p --transformer_snapshot ~/scratch/dadams/mirai/snapshots/mgh_mammo_cancer_MIRAI_Transformer_Jan13_2020.p --callibrator_snapshot ~/scratch/dadams/mirai/snapshots/callibrators/MIRAI_FULL_PRED_RF.callibrator.p --batch_size 4 --dataset csv_mammo_risk_all_full_future --use_risk_factors --use_pred_risk_factors_if_unk --risk_factor_metadata_path ~/scratch/dadams/mirai/rf_metadata.json --metadata_path ~/scratch/dadams/mirai/metadata_subset.csv --test --prediction_save_path ~/scratch/dadams/mirai/genbcpred_output_rftest.csv
I wasn't able to find a sample risk factor metadata file, so I tried (seemingly unsuccessfully) to infer its structure from the Mirai source code as follows:
which corresponds to this in metadata_subset.csv:
However, when I run main.py with the aforementioned arguments, I get this error message:
Oddly, when I run main.py with the risk factor-related arguments, it suddenly seems to try to parse the *.csv file as a JSON file. When I exclude the risk factor-related arguments, the whole pipeline runs successfully.
Could anyone kindly help me format risk factor data and provide it to Mirai successfully? :)
Thank you!