Data-efficient and weakly supervised computational pathology on whole slide images - Nature Biomedical Engineering
GNU General Public License v3.0
Fix: Slide ids turned into floats in split csv when names consist of only numbers #228
Open
ff98li opened 7 months ago
Summary of the Issue
- When the `train`, `val`, and `test` splits differ in length, `NaN` values are introduced when these splits are concatenated into a dataframe by `save_splits()`.
- Pandas converts integer columns containing `NaN` values to floats because integer columns have no `NaN` representation in Pandas.
- A `ValueError`, as shown in the screenshot, then occurs at https://github.com/mahmoodlab/CLAM/blob/3f875f77465b410d260f2afcfaea608a9d6ddbca/datasets/dataset_generic.py#L247
Proposed fix
- In `save_splits`, prevent the unintended type conversion.
- With `dtype=object`, in `Generic_WSI_Classification_Dataset.get_split_from_df()`, cast the dtype of the corresponding split column to match that of `self.slide_data['slide_id']`.
This happened when I was working with my own task's dataset csv. I can provide the csv file to reproduce this bug if needs be.
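For anyone who wants to see the behaviour without the csv, here is a minimal sketch in plain pandas. It is not the CLAM code itself: the `slide_data` table, the ids, and the split sizes are made up for illustration, and the two-step workaround (save as objects, then read as objects and cast) is only one possible reading of the proposed fix.

```python
import io

import pandas as pd

# Hypothetical dataset table whose slide ids are numbers only (parsed as int64).
slide_data = pd.DataFrame({"slide_id": [101, 102, 103, 104, 105, 106]})

# Hypothetical splits of unequal length.
train = pd.Series([101, 102, 103])
val = pd.Series([104])
test = pd.Series([105, 106])

# Problem: column-wise concatenation pads the shorter splits with NaN,
# and an integer column that gains NaN is upcast to float64.
naive = pd.concat([train, val, test], ignore_index=True, axis=1)
naive.columns = ["train", "val", "test"]
naive_csv = naive.to_csv()  # padded columns are written as floats, e.g. 104.0

# Sketch of a workaround, step 1: keep ids as Python objects when saving,
# so the NaN padding no longer forces a float conversion.
safe = pd.concat(
    [s.astype(object) for s in (train, val, test)],
    ignore_index=True,
    axis=1,
)
safe.columns = ["train", "val", "test"]
safe_csv = safe.to_csv()

# Step 2: read the split csv back as objects, drop the NaN padding, and
# cast the split column to the dtype of the slide_id column before matching.
loaded = pd.read_csv(io.StringIO(safe_csv), dtype=object, index_col=0)
val_ids = loaded["val"].dropna().astype(slide_data["slide_id"].dtype)
val_rows = slide_data[slide_data["slide_id"].isin(val_ids)]

print(naive.dtypes)
print(val_rows)
```

Casting only after dropping the padding matters here, since an int64 column cannot hold the `NaN` entries.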