sanderlab / CellBox

CellBox: Interpretable Machine Learning for Perturbation Biology
MIT License

Usage discussion #17

Status: Open. abhinavgudipati opened this issue 3 years ago.

abhinavgudipati commented 3 years ago
[Screenshot from 2021-02-18, 5:01:59 PM]

I tried running the command on Binder and got the output shown in the screenshot above.

Sincere apologies if this is a basic question, but I don't quite understand how we are able to train a model without giving it any explicit outputs.

All inputs are welcome.

Thanking you :)

@cannin @DesmondYuan @judyueshen

abhinavgudipati commented 3 years ago

Looking forward to your prompt response! Thanking you!

DesmondYuan commented 3 years ago

Hi @abhinavgudipati, thanks for testing it out. Can you elaborate a bit on what you mean by "without giving outputs"? The Binder example uses the sample data under the ./data folder, if that helps.
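To make the point about "outputs" concrete: as we understand the CellBox paper, each training example pairs a perturbation condition (which drugs were applied) with the measured response of the cells, and the model is fitted so that its simulated steady state matches those measurements. The measured responses are the outputs. The sketch below is a toy, pure-Python illustration under that assumption, not CellBox's own TensorFlow implementation: it integrates an ODE of roughly the form dx/dt = eps * tanh(W x + u) - alpha * x with forward Euler and scores a candidate interaction matrix W by mean squared error against the responses. The function names, the tanh envelope, the step size, and the toy data are all assumptions for illustration; real training would also adjust W (and eps, alpha) by gradient descent, which this sketch does not do.

```python
import math


def simulate_steady_state(W, eps, alpha, u, dt=0.1, n_steps=400):
    """Forward-Euler integration of dx/dt = eps * tanh(W x + u) - alpha * x from x(0) = 0.

    Returns x after n_steps, used here as an approximation of the steady state.
    """
    n = len(u)
    x = [0.0] * n
    for _ in range(n_steps):
        # Network input to each node plus its external perturbation u[i].
        drive = [sum(W[i][j] * x[j] for j in range(n)) + u[i] for i in range(n)]
        x = [x[i] + dt * (eps[i] * math.tanh(drive[i]) - alpha[i] * x[i]) for i in range(n)]
    return x


def mse_loss(W, eps, alpha, perturbations, responses):
    """Mean squared error between simulated and measured responses over all conditions."""
    total, count = 0.0, 0
    for u, y in zip(perturbations, responses):
        x = simulate_steady_state(W, eps, alpha, u)
        total += sum((xi - yi) ** 2 for xi, yi in zip(x, y))
        count += len(y)
    return total / count


# Toy setup (hypothetical): 2 nodes, 2 perturbation conditions.
W_true = [[0.0, 0.5], [-0.5, 0.0]]
eps, alpha = [1.0, 1.0], [1.0, 1.0]
perturbations = [[1.0, 0.0], [0.0, 1.0]]
# Stand-in "measurements": simulated from W_true, NOT real data.
responses = [simulate_steady_state(W_true, eps, alpha, u) for u in perturbations]

W_empty = [[0.0, 0.0], [0.0, 0.0]]
print(mse_loss(W_true, eps, alpha, perturbations, responses))   # 0.0
print(mse_loss(W_empty, eps, alpha, perturbations, responses))  # a positive number
```

A training loop would search for the W that drives this loss toward zero; the learned W is the interpretable network that the model produces in the end.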

abhinavgudipati commented 3 years ago

Hello! I am revisiting this issue. I am finding it hard to understand why the model takes almost forever to finish running on Binder.

[Screenshot from 2021-04-09, 11:33:03 PM]

I would like to know of a possible alternative way to run it.

Also, I am struggling to understand the variables at play here.

I have read the research paper referenced in this repo, but would like a brief explanation of what we are trying to obtain in the end, since the program takes forever to terminate on Binder.

DesmondYuan commented 3 years ago

We will get this fixed. Meanwhile, please note that Binder only provides an interface for test runs; please download the code from our latest release to run it with full functionality.

@cannin and @judyueshen, can you please take a look at the Binder issue?

yumengyang commented 2 years ago

I want to ask how I can test the package on my own data. I have fixed training and testing datasets; instead of splitting the data by a ratio, where should I change the code?

DesmondYuan commented 2 years ago


@restiso7788 Thanks for the question! Given the current implementation, the easiest way is to adapt the leave-one-out (LOO) structure. The idea is to pretend you have two drugs, i.e., drug_train and drug_test; leaving one drug out would then give you a clean train/test separation. You can refer to this factory function https://github.com/sanderlab/CellBox/blob/93df2d094a04ec75197bf10099e8198f7de79185/cellbox/cellbox/dataset.py#L127-L150

and this demo input file data/loo_label.csv for LOO runs.
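The LOO workaround can be sketched in plain Python as follows. This is an illustration under assumptions: the helper names pseudo_drug_labels and leave_one_drug_out are hypothetical, and the sketch does not reproduce the actual layout of data/loo_label.csv or the logic of the factory function in dataset.py, so check those before writing a real label file. It shows only the core idea: once every condition is tagged with one of two pseudo-drugs, leaving out the "drug_test" pseudo-drug yields exactly your fixed test set.

```python
def pseudo_drug_labels(n_conditions, test_indices):
    """Tag every condition with a pseudo-drug: 0 = "drug_train", 1 = "drug_test"."""
    test_set = set(test_indices)
    return [1 if i in test_set else 0 for i in range(n_conditions)]


def leave_one_drug_out(labels, left_out_drug):
    """Split condition indices so that the left-out drug's conditions form the test set."""
    train_idx = [i for i, drug in enumerate(labels) if drug != left_out_drug]
    test_idx = [i for i, drug in enumerate(labels) if drug == left_out_drug]
    return train_idx, test_idx


# Toy example (hypothetical): 6 conditions, the last two are your fixed test set.
labels = pseudo_drug_labels(6, test_indices=[4, 5])
train_idx, test_idx = leave_one_drug_out(labels, left_out_drug=1)
print(labels)               # [0, 0, 0, 0, 1, 1]
print(train_idx, test_idx)  # [0, 1, 2, 3] [4, 5]
```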

However, the best way is to add a new function to the factory that reads the train and test indices directly from an input file. You are more than welcome to create a pull request if you want to contribute that feature!
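A standalone sketch of that kind of function might look like the following. Everything here is hypothetical: the name split_from_index_file, the two-column `index,split` CSV format, and the select_rows helper are assumptions, not part of CellBox, and a real contribution would still have to be wired into the factory in cellbox/dataset.py and its config handling. The function validates the file (out-of-range indices, unknown labels, indices listed in both splits) because a silent mistake in a hand-written split file would leak test conditions into training.

```python
import csv
import io


def split_from_index_file(handle, n_conditions):
    """Read `index,split` rows (split is "train" or "test") and return sorted index lists.

    `handle` is any iterable of CSV lines, e.g. an open file or io.StringIO.
    Blank lines and a header row whose first field is "index" are skipped.
    """
    train_idx, test_idx = [], []
    for row in csv.reader(handle):
        if not row or row[0].strip().lower() == "index":
            continue
        idx, split = int(row[0]), row[1].strip().lower()
        if not 0 <= idx < n_conditions:
            raise ValueError(f"index {idx} is out of range for {n_conditions} conditions")
        if split == "train":
            train_idx.append(idx)
        elif split == "test":
            test_idx.append(idx)
        else:
            raise ValueError(f"unknown split label {split!r} for index {idx}")
    overlap = set(train_idx) & set(test_idx)
    if overlap:
        raise ValueError(f"indices in both train and test: {sorted(overlap)}")
    return sorted(train_idx), sorted(test_idx)


def select_rows(rows, indices):
    """Pick the rows (e.g. perturbation or response vectors) at the given condition indices."""
    return [rows[i] for i in indices]


# Usage with an in-memory file; for a real file, pass open("split.csv", newline="").
split_file = io.StringIO("index,split\n0,train\n2,test\n1,train\n3,test\n")
train_idx, test_idx = split_from_index_file(split_file, n_conditions=4)
pert = [[1, 0], [0, 1], [1, 1], [0, 0]]  # toy perturbation rows, one per condition
print(train_idx, test_idx)          # [0, 1] [2, 3]
print(select_rows(pert, test_idx))  # [[1, 1], [0, 0]]
```

Keeping the split in a separate file, rather than encoding it as pseudo-drugs, means the same perturbation and response files can be reused with different splits.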