Clarification on the Selection of Target Tissues in the Training Script

rvinas / HYFA

Hypergraph Factorisation

MIT License


Clarification on the Selection of Target Tissues in the Training Script #9

Closed Shorzinator closed 5 months ago

Shorzinator commented 5 months ago

Hello, I have been working with the HYFA (Hypergraph Factorisation for Multi-Tissue Gene Expression Imputation) model and analyzing the provided scripts. I noticed that the original training script includes only four target tissues ('lung', 'pancreas', 'heart_atrial', 'esophagus_muscularis') for validation.

Could you please provide clarification on why only these four target tissues were selected for the original script? Was there a specific rationale behind this choice, such as computational constraints, data quality considerations, or a particular focus of the study?

Thank you!

rvinas commented 5 months ago

Hi @Shorzinator, thank you for your interest in our work! That is correct: the original training script uses 4 target tissues for validation. This was an arbitrary decision. We chose these tissues because they have good coverage in GTEx, but we could have chosen others. Best wishes

Shorzinator commented 5 months ago

Thank you for your response. Following up on that: from the paper, I gathered that the training principle is LOTO-CV (leave-one-tissue-out cross-validation); correct me if I am wrong. If that is the case, shouldn't there be a loop in the training workflow that takes one tissue as the target at a time and trains iteratively, updating the model's weights along the way? Or one could go as far as saving separate models for separate tissues, which would obviously have its drawbacks.

rvinas commented 5 months ago

Hi @Shorzinator, the idea of doing leave one tissue out is certainly interesting, but generalising to unseen tissues is hard and this is not what we did. Instead, we split data into train, validation, and test sets by donor. All tissues were seen at train time.
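For readers comparing the two schemes, here is a minimal sketch of a donor-level split. It is not the HYFA repository's code: the function name `split_by_donor`, the 60/20/20 fractions, and the toy donor IDs are all assumptions made for illustration. The point it shows is that every sample from a given donor lands in exactly one set, while every tissue can still appear in the training set.

```python
import random


def split_by_donor(donor_ids, val_frac=0.2, test_frac=0.2, seed=0):
    """Return a 'train'/'val'/'test' label per sample, grouped by donor.

    Hypothetical helper: all samples sharing a donor ID receive the same
    label, so no donor is split across sets.
    """
    donors = sorted(set(donor_ids))
    random.Random(seed).shuffle(donors)

    n_test = round(test_frac * len(donors))
    n_val = round(val_frac * len(donors))

    assignment = {}
    for i, donor in enumerate(donors):
        if i < n_test:
            assignment[donor] = "test"
        elif i < n_test + n_val:
            assignment[donor] = "val"
        else:
            assignment[donor] = "train"

    return [assignment[donor] for donor in donor_ids]


# Toy data (made up): 10 donors, each with one sample from three tissues.
tissue_names = ["lung", "pancreas", "heart_atrial"]
samples = [(f"donor_{i}", t) for i in range(10) for t in tissue_names]

labels = split_by_donor([donor for donor, _ in samples])

# Donors per set, and the tissues that the training set contains.
donors_in = {
    name: {d for (d, _), lab in zip(samples, labels) if lab == name}
    for name in ("train", "val", "test")
}
train_tissues = {t for (_, t), lab in zip(samples, labels) if lab == "train"}
```

In a leave-one-tissue-out scheme the held-out unit would be a tissue, so the model would never see that tissue during training. Here the held-out unit is the donor, so `train_tissues` still contains every tissue in the toy data.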