kundajelab / chrombpnet

Bias factorized, base-resolution deep learning models of chromatin accessibility (chromBPNet)
https://github.com/kundajelab/chrombpnet/wiki
MIT License

Question about utilizing multiple chromosome folds #114

Closed: KyleFerchen closed this issue 1 year ago

KyleFerchen commented 1 year ago

In the tutorial, there is mention of creating multiple folds for splitting the chromosomes into fitting, validation, and testing subsets. All of the functions in the tutorial use fold_0; however, in general practice, would you build multiple models across multiple folds?

If so, how would you integrate the results from different models? Would you recommend averaging the scores from different models built from different folds?

If integrating multiple folds is not general practice, how should we account for biases across different chromosomes? If a particular chromosome is not used for fitting the data, yet a major transcription factor acts only on that chromosome and not on the others, wouldn't using the outputs from a single fold miss the activity of that factor?
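If per-fold outputs are combined by averaging, as asked above, it could look roughly like the sketch below. It assumes each fold model has already produced, for the same set of regions, a profile-logit array of shape (n_regions, profile_len) and a log-count array of shape (n_regions,). These shapes and the helper names `softmax` and `average_fold_predictions` are hypothetical and are not taken from chromBPNet's own API. Profiles are converted to probabilities before averaging so that each fold contributes a normalized shape.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis (positions within a region)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def average_fold_predictions(profile_logits_per_fold, log_counts_per_fold):
    """Hypothetical helper: average predictions from several fold models.

    profile_logits_per_fold: list of arrays, each (n_regions, profile_len)
    log_counts_per_fold:     list of arrays, each (n_regions,)
    Returns (mean profile probabilities, mean log counts).
    """
    # Normalize each fold's profile to probabilities, then average across folds.
    probs = np.stack([softmax(logits) for logits in profile_logits_per_fold])
    mean_probs = probs.mean(axis=0)
    # Average the predicted log counts across folds.
    mean_log_counts = np.stack(log_counts_per_fold).mean(axis=0)
    return mean_probs, mean_log_counts


# Toy usage with random stand-ins for five fold models' outputs.
rng = np.random.default_rng(0)
n_folds, n_regions, profile_len = 5, 3, 1000
logits = [rng.normal(size=(n_regions, profile_len)) for _ in range(n_folds)]
counts = [rng.normal(loc=5.0, size=n_regions) for _ in range(n_folds)]
mean_probs, mean_log_counts = average_fold_predictions(logits, counts)
print(mean_probs.shape, mean_log_counts.shape)  # (3, 1000) (3,)
```

Averaging probabilities keeps each averaged profile summing to 1; averaging the logits first and applying softmax once is another reasonable choice.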

panushri25 commented 1 year ago

Hello @KyleFerchen

Good question. I have made a note in the FAQ to address this. Let me know if it clarifies your question: https://github.com/kundajelab/chrombpnet/wiki/FAQ

KyleFerchen commented 1 year ago

Yes, that answers my question. Thank you!

yyoshiaki commented 1 year ago

Hi,

Just to confirm: you said that the results for each fold should be averaged. Is it correct that predictions are also made for the chromosomes used for training and validation?

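For example, I think it would be possible to use the fold_0 model to predict chr1, 3, and 6, the fold_1 model (split shown below) to predict chr2, 8, 9, and 16, and so on, so that each chromosome is predicted only by a model for which it was a test chromosome. This would avoid leakage from the training and validation sets.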

fold_1.json

{
    "test": [
        "chr2",
        "chr8",
        "chr9",
        "chr16"
    ],
    "valid": [
        "chr12",
        "chr17"
    ],
    "train": [
        "chr1",
        "chr3",
        "chr4",
        "chr5",
        "chr6",
        "chr7",
        "chr10",
        "chr11",
        "chr13",
        "chr14",
        "chr15",
        "chr18",
        "chr19",
        "chr20",
        "chr21",
        "chr22",
        "chrX",
        "chrY"
    ]
}

Best, Yoshi