mahmoodlab / CLAM

Data-efficient and weakly supervised computational pathology on whole slide images - Nature Biomedical Engineering
http://clam.mahmoodlab.org
GNU General Public License v3.0
975 stars 329 forks source link

The code really needs fixing #232

Closed onbigion13 closed 4 months ago

onbigion13 commented 4 months ago

You code, especially the 3 dataset classes in dataset_generic.py, they never work when one want to get dataset splits from ids. Mayby the reason is the messy nested inheritance of the classes. How could one lets a class inherits from another class and the later class neet the instantiation of the former class?

I guess you just only run your code with specified .csv file of each split. I tried longer than half year to reproduce your code, however, you really just abandon the reproduction of your code by other researchers. If you welcome researchers to reimplement your code, how could you design you datasets classes like such a chaos?

Why must you split train, val, test splits with function among dataset class??? why not just randomly split them with pytorch or random module???

And if you want to resplit multiple whole splits, then why must you generate a iterative generator in class , instead of resplit folds upn the whole object of the class??

I can believe that I waste more than half year to study an reimplement you code and want to generate some innovation upon you jobs and want to cite you jobs in my research. While you just write your dataset class in a way that prevents anyone to reproduce or improve upon it, it's so sad and annoying.

fedshyvana commented 4 months ago

I agree with you that the design and implementation of the dataset class can be improved. Please note however:

onbigion13 commented 4 months ago

Thanks for your intime reply. I understand that your implemented is done years ago, and appreciate your efforts of letting your code be open to community. Your code about preprocessing the .svs slides really help a lot.

While the class about datasets for pack extracted features into Dataset really did not work when one chooses to make folds by ids, instead of .cvs files, which you adopted in your implementation. The main trouble is from the nested inheritance of multiple classes, especially when one wants to create datasets splits from the Generic_MIL_Dataset. It shows error that the arguments assigned to child class don't exist in parent class, maybe the reason lies in that you nested the inheritance of classed and did not declare the super method in Generic_Split. Even I added the super method in the constructor method of Generic_Split, more errors about member properties arised. So I believe that maybe it's your decision after consideration, and so you did not declare the super method.

And I have implemented the dataset with common ways with my own codes, and conduct the multiple folds' creation outside the class, with Random and Sub-Dataset class. Any way, thanks for your efforts for creating and sharing the repos.