As a developer I want to process credible sets derived from single cell QTLs because this will enable more precise identification of eQTLs specific to particular cell types, improving our understanding of genetic regulation and its implications in disease and development.
## Some context
scRNA-seq offers significant advantages over bulk RNA sequencing by allowing the study of gene expression at the resolution of individual cells. This helps in identifying eQTLs specific to particular cell types, capturing the diversity of gene expression patterns within and between cell types, and observing temporal dynamics of gene expression changes in specific cell types.
## Data availability
We have credible sets from 11 studies: Aygun_2021, PISA, Walker_2019, Sun_2018, Randolph_2021, Perez_2022, OneK1K, Jerber_2021, Nathan_2022, Cytoimmgen, Kim-Hellmuth_2017.
Although they were initially available only through the Sanger farm, the latest credible sets have been public on the FTP since March 31st. This is convenient because we can pull from a single location to ingest all results.
The results are served as compressed files containing summary statistics and SuSiE results, split by datasets that represent different quantification methods.
## Data inspection
Each dataset includes:
- Credible sets: each row represents a variant in a credible set and its statistics.
- Bayes Factors (log10): each row represents a variant present in any study and its Bayes Factors per credible set.
The study metadata is maintained in this [metadata table](https://github.com/eQTL-Catalogue/eQTL-Catalogue-resources/blob/1994b5ba66b2e88e89ba91f5638e40ccd3985306/data_tables/dataset_metadata_upcoming.tsv#L720).
## Tasks
- [x] Rerun the job that syncs the FTP folder containing all results with our Google Cloud bucket
- [x] Identify particularities between the single cell and the bulk derived results -> None in terms of input data
- [x] Ensure the current pipeline is adaptable to the new data -> In terms of schema, I have just renamed the column that referred to the tissue to `biosample`, a more generic name that works for both levels
- [ ] Add a literature reference per study. I've opened a [PR](https://github.com/eQTL-Catalogue/eQTL-Catalogue-resources/pull/40) to the eQTL Catalogue data so that it's easier to compare between releases by looking at the respective PMIDs. It is still pending approval
- [x] Rerun the QC to check that provided PIPs are correctly calculated
More details on the results are in the [PR](https://github.com/opentargets/gentropy/pull/630).
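
For reference, the PIP check in the task list could look roughly like the sketch below. It is not the gentropy implementation: it assumes the standard SuSiE relationship, where each credible set's posterior weights are a softmax of its log Bayes factors under a uniform prior over variants, and where a variant's PIP is `1 - prod(1 - alpha)` across credible sets. It also ignores any pruning of null effects that SuSiE may apply. The function names, the `log_base` parameter and the tolerance are hypothetical. The issue describes the Bayes factors as log10, so that is the default here; pass `math.e` if the files turn out to store natural logs.

```python
import math


def posterior_weights(lbfs, log_base=10.0):
    """Softmax of one credible set's log Bayes factors (uniform prior assumed)."""
    # Convert to natural logs, then subtract the maximum for numerical stability.
    natural = [value * math.log(log_base) for value in lbfs]
    peak = max(natural)
    exps = [math.exp(value - peak) for value in natural]
    total = sum(exps)
    return [value / total for value in exps]


def pips_from_lbf(lbf_matrix, log_base=10.0):
    """Recompute PIPs; lbf_matrix[j][l] is the log BF of variant j in credible set l."""
    n_sets = len(lbf_matrix[0])
    # alphas[l][j]: posterior weight of variant j within credible set l.
    alphas = [
        posterior_weights([row[l] for row in lbf_matrix], log_base)
        for l in range(n_sets)
    ]
    pips = []
    for j in range(len(lbf_matrix)):
        not_causal = 1.0
        for l in range(n_sets):
            not_causal *= 1.0 - alphas[l][j]
        pips.append(1.0 - not_causal)
    return pips


def mismatches(provided, recomputed, tol=1e-6):
    """Indices of variants whose provided PIP differs from the recomputed one."""
    return [
        index
        for index, (given, expected) in enumerate(zip(provided, recomputed))
        if abs(given - expected) > tol
    ]
```

An empty list from `mismatches` would mean the provided PIPs agree with the recomputed ones within the chosen tolerance.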
Example record from `QTD000564.lbf_variable.txt.gz`:

| Field | Value |
| --- | --- |
| `molecular_trait_id` | `ENSG00000182362` |
| `region` | `chr21:45286342-47286342` |
| `variant` | `chr21_45287004_G_A` |
| `chromosome` | `21` |
| `position` | `45287004` |
| `lbf_variable1` | `-1.9982103201721` |
| `lbf_variable2` | `-0.0529486746200503` |
| `lbf_variable3` | `-0.0497269339290685` |
| `lbf_variable4` | `-0.0248066525814772` |
| `lbf_variable5` | `-0.00783420001459678` |
| `lbf_variable6` | `-0.0017594415663138` |
| `lbf_variable7` | `-0.000282958881470341` |
| `lbf_variable8` | `-5.33928563872799e-06` |
| `lbf_variable9` | `2.34509269450012e-05` |
| `lbf_variable10` | `1.49791941113087e-05` |
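
Since each `lbf_variableN` column corresponds to one credible set, it can help to reshape a record like this into one row per variant and credible set index before joining it to the credible set table. The sketch below uses only the Python standard library and is not how the pipeline does it. It assumes the files are tab-delimited with a header row; `wide_to_long` and `read_lbf_file` are hypothetical helper names. The sample uses the first two Bayes factor columns of the record above.

```python
import csv
import gzip
import io

LBF_PREFIX = "lbf_variable"


def wide_to_long(rows):
    """Yield (molecular_trait_id, variant, credible_set_index, lbf) per lbf column."""
    for row in rows:
        for column, value in row.items():
            suffix = column[len(LBF_PREFIX):]
            # Keep only columns named lbf_variable<N>.
            if column.startswith(LBF_PREFIX) and suffix.isdigit():
                yield (
                    row["molecular_trait_id"],
                    row["variant"],
                    int(suffix),
                    float(value),
                )


def read_lbf_file(path):
    """Stream long-format records from a gzipped file (tab-delimited is assumed)."""
    with gzip.open(path, "rt", newline="") as handle:
        yield from wide_to_long(csv.DictReader(handle, delimiter="\t"))


# In-memory sample built from the first two Bayes factor columns of the record.
sample = (
    "molecular_trait_id\tregion\tvariant\tchromosome\tposition\t"
    "lbf_variable1\tlbf_variable2\n"
    "ENSG00000182362\tchr21:45286342-47286342\tchr21_45287004_G_A\t21\t45287004\t"
    "-1.9982103201721\t-0.0529486746200503\n"
)
records = list(wide_to_long(csv.DictReader(io.StringIO(sample), delimiter="\t")))
for record in records:
    print(record)
```

The long format keeps the credible set index as data rather than as part of a column name, so the same code works if a dataset has a different number of `lbf_variable` columns.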