wawpaopao closed this issue 7 months ago
Hi, the Zenodo dataset is a ZIP file. You can download it and unzip it to reveal the directory structure. In the directory train_gru/ there are subdirectories for every fold. Within each of those, there are two directories, data0/ and data1/, corresponding to human and mouse respectively. Each of them contains a table genes.tsv with the gene names and half-life measurements, plus tfrecords with the exact RNA sequences that we used and their half-life measurements.
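For anyone who only wants the gene names and labels, here is a minimal sketch of walking that layout with the Python standard library. It assumes the ZIP has already been extracted and that the fold subdirectories sit directly under train_gru/. The function names and the extraction path are hypothetical, and the columns of genes.tsv are not spelled out in this thread, so the rows are returned as raw strings without assuming a header. It does not parse the tfrecords.

```python
import csv
from pathlib import Path


def find_gene_tables(root):
    """Return every genes.tsv under <root>/train_gru/<fold>/data0 or data1."""
    train_dir = Path(root) / "train_gru"
    # "*" matches each fold directory; "data[01]" matches data0 and data1.
    return sorted(train_dir.glob("*/data[01]/genes.tsv"))


def species_of(table_path):
    """Map the parent directory name to the species named in this thread."""
    return {"data0": "human", "data1": "mouse"}[Path(table_path).parent.name]


def read_tsv_rows(table_path):
    """Read a tab-separated file into a list of rows (lists of strings)."""
    with open(table_path, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))
```

Usage would look like `for table in find_gene_tables("path/to/unzipped"): print(table, species_of(table), len(read_tsv_rows(table)))`, where `path/to/unzipped` is a placeholder for wherever you extracted the archive.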
Thanks! So what's the difference between f0_c0 and f0_c1? Is it the training process?
And the mapping is {"A": 0, "C": 1, "G": 2, "T": 3}, right?
The 'c' numbers are just technical replicates using the same train/test split. So f0_c0 and f0_c1 use the exact same train/test split, but are different stochastic training runs from different random initializations.
Yes, the nucleotide-integer mappings are correct.
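To make the confirmed mapping concrete, here is a small sketch that converts between nucleotide strings and integer indices. The helper names are hypothetical, and raising an error on any character outside A/C/G/T is an assumption of this sketch, not something stated about the dataset.

```python
# Nucleotide-to-integer mapping as confirmed in this thread.
MAPPING = {"A": 0, "C": 1, "G": 2, "T": 3}
INVERSE = {index: base for base, index in MAPPING.items()}


def encode(sequence):
    """Convert a nucleotide string to a list of integer indices."""
    try:
        return [MAPPING[base] for base in sequence.upper()]
    except KeyError as err:
        # Assumption: anything other than A/C/G/T is rejected here.
        raise ValueError(f"unsupported nucleotide: {err.args[0]!r}") from None


def decode(indices):
    """Convert integer indices back to a nucleotide string."""
    return "".join(INVERSE[int(i)] for i in indices)
```

For example, `encode("ACGT")` gives `[0, 1, 2, 3]`, and `decode` inverts it.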
I have seen the Zenodo dataset, but I don't know how to use it. I just want the RNA sequence data and the labels.