jozhang97 / MutateEverything

Unbalanced validation set data #2

Closed jiaweiguan closed 10 months ago

jiaweiguan commented 10 months ago

While examining the datasets, I found that the cDNA test dataset contains only double mutations. Is there a test dataset for single mutations?

jozhang97 commented 10 months ago

Right, we test on the cDNA double mutations and single mutations separately.

We do have a cDNA test set for single mutations but did not prioritize it in this work... is it something of interest?
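
For readers who want to separate single and double mutations in their own data, here is a minimal sketch. It assumes each record stores its mutations as a colon-separated string such as `A12G:L45P`. That encoding, the `mutation` and `ddg` field names, and the toy values are all hypothetical, not the repository's actual format.

```python
from collections import defaultdict


def count_mutations(mutation: str, sep: str = ":") -> int:
    """Count point mutations in a string like 'A12G:L45P' (hypothetical encoding)."""
    return len([m for m in mutation.split(sep) if m])


def split_by_mutation_count(records, key="mutation"):
    """Group records by the number of point mutations each one carries."""
    groups = defaultdict(list)
    for rec in records:
        groups[count_mutations(rec[key])].append(rec)
    return dict(groups)


# Toy records for illustration only; the values are made up.
records = [
    {"mutation": "A12G", "ddg": 0.5},
    {"mutation": "A12G:L45P", "ddg": -1.2},
    {"mutation": "K7R", "ddg": 0.1},
]

groups = split_by_mutation_count(records)
singles = groups.get(1, [])  # records with exactly one mutation
doubles = groups.get(2, [])  # records with exactly two mutations
```

Each group can then be evaluated on its own, so that a metric on double mutations is not mistaken for performance on single mutations.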

jiaweiguan commented 10 months ago

In my understanding, this test dataset is used for model selection and evaluation. To have a more comprehensive evaluation and ensure good performance on the actual test set, S669, it is important to have this specific dataset with single mutations. If you could provide this portion of the dataset, I would be extremely grateful. Your work is excellent, and I hope to engage in more discussions and collaborations on GitHub in the future.

jozhang97 commented 10 months ago

Thanks for your interest. I added this test set to the other datasets.

You can find it at https://drive.google.com/drive/folders/1psp5LBnAWpwkzGtsWD9SJgauo8xX_eaK?usp=sharing

jiaweiguan commented 10 months ago

Wow, thank you for your help.