theislab / bindome

Assembling of biomolecule binding data (TF/RNA) from genomics databases for ML-downstream.
MIT License

RNA datasets - list and priority. #7

Open ilibarra opened 2 years ago

ilibarra commented 2 years ago

@mhorlacher Following up on our previous discussion: adding RNA datasets should make this repository more useful, both for exploratory downstream analyses and for connecting it with other repositories for modeling.

Some examples of RNA datasets, in order of priority IMO:

This one is potentially great, but the raw data does not seem to be available, so one would have to collect it study by study:

Please feel free to list additional ones. The idea is to get 3-6 of them wrapped into loader functions and h5ad files, following the general conventions (sequence data + counts available). Examples of existing loaders: https://github.com/theislab/bindome/blob/main/bindome/datasets/selex.py#L142 https://github.com/theislab/bindome/blob/main/bindome/datasets/probound.py#L16
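
To make the "sequence data + counts" convention concrete, here is a minimal sketch of how raw reads could be turned into a count table and then into an h5ad file. It is not the existing bindome loader code: the function names `count_sequences` and `to_anndata`, the input format (a dict from sample name to a list of reads), and the orientation (sequences as `obs`, samples as `var`) are all assumptions for illustration.

```python
from collections import Counter


def count_sequences(reads_by_sample):
    """Build a sequences x samples count table from raw reads.

    reads_by_sample: dict mapping a sample name (e.g. a selection round)
    to a list of read sequences. This input format is hypothetical.
    """
    per_sample = {s: Counter(reads) for s, reads in reads_by_sample.items()}
    samples = sorted(per_sample)
    # Union of all sequences observed in any sample, in a stable order.
    sequences = sorted(set().union(*per_sample.values()))
    # Counter returns 0 for sequences absent from a sample.
    matrix = [[per_sample[s][seq] for s in samples] for seq in sequences]
    return sequences, samples, matrix


def to_anndata(sequences, samples, matrix):
    """Wrap the count table in an AnnData object (requires anndata).

    Assumed layout: one row (obs) per sequence, one column (var) per sample.
    """
    import anndata as ad
    import numpy as np
    import pandas as pd

    return ad.AnnData(
        X=np.array(matrix, dtype=np.float32),
        obs=pd.DataFrame(index=sequences),
        var=pd.DataFrame(index=samples),
    )


# Toy reads, made up for illustration only.
reads = {"r0": ["ACGU", "ACGU", "GGCA"], "r1": ["ACGU", "UUUU"]}
sequences, samples, matrix = count_sequences(reads)
```

With `anndata` installed, `to_anndata(sequences, samples, matrix).write_h5ad("toy_rna.h5ad")` would write the file; the file name is a placeholder.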

Looking forward to continuing the discussion on this. Thanks!