molgenis / capice

GNU Lesser General Public License v3.0

Request for Testing and Training Datasets for Capice Method #180

Closed by GriffithLin 1 week ago

GriffithLin commented 5 months ago

Hello,

I hope this message finds you well. I am currently working on a project where we are exploring different methods, including Capice. To ensure a fair and thorough comparison of our method with Capice, I kindly request access to the testing and training datasets used in the Capice method.

Having access to these datasets would greatly assist us in accurately assessing the performance and capabilities of both methods. Any assistance you can provide in this matter would be highly appreciated.

Thank you for your attention to this request.

Warm regards, LinMing

dennishendriksen commented 5 months ago

Dear LinMing,

You can find the datasets attached as assets to the GitHub releases, e.g. https://github.com/molgenis/capice/releases/tag/v5.1.2. The capice-resources repository (https://github.com/molgenis/capice-resources) describes the procedure for training models on our compute clusters. You might also want to look at the validation results of our models: https://github.com/molgenis/capice-resources/tree/main/validation/5.1.2/5.1.2-v1.

Best regards, @dennishendriksen

GriffithLin commented 5 months ago

Hi Dennis,

Could you kindly confirm whether train_test.vcf.gz serves as the training set and validation.vcf.gz as the testing set for capice? Thank you in advance for your assistance.

Best regards, Lin Ming

dennishendriksen commented 5 months ago

Hello Lin Ming,

train_test.vcf.gz is used during model creation. validation.vcf.gz is used as a holdout dataset to create the plots at https://github.com/molgenis/capice-resources/tree/main/validation/5.1.2/5.1.2-v1, which detail how the model performs. Please see https://github.com/molgenis/capice-resources?tab=readme-ov-file#usage for details.

Greetings, @dennishendriksen
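
For anyone reusing the two files for a method comparison, it can be worth checking that no variant in the holdout file also appears in the file used for model creation. The following is a minimal sketch of such a check and is not part of capice or capice-resources. It assumes both files are gzip-compressed, tab-separated VCFs with the standard CHROM, POS, ID, REF, ALT leading columns, and that a variant is identified by (CHROM, POS, REF, ALT). The function names `iter_variant_keys` and `shared_variants` are hypothetical.

```python
import gzip
from typing import Iterator, Set, Tuple

# A variant is identified here by chromosome, position, reference allele and
# alternate allele. This key is an assumption made for this sketch.
VariantKey = Tuple[str, int, str, str]


def iter_variant_keys(path: str) -> Iterator[VariantKey]:
    """Yield one key per alternate allele for every record in a gzipped VCF."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            # Skip meta-information lines, the column header and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            chrom, pos, _record_id, ref, alts = fields[:5]
            # Multi-allelic records list their ALT alleles comma-separated.
            for alt in alts.split(","):
                yield (chrom, int(pos), ref, alt)


def shared_variants(train_path: str, holdout_path: str) -> Set[VariantKey]:
    """Return the variant keys that occur in both files."""
    train_keys = set(iter_variant_keys(train_path))
    return {key for key in iter_variant_keys(holdout_path) if key in train_keys}
```

With the release assets downloaded locally, calling `shared_variants("train_test.vcf.gz", "validation.vcf.gz")` returns the set of variants present in both files. An empty set would indicate that the two files do not overlap under the key assumed here.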