neurogenomics / rare_disease_celltyping

Code, data and results associated with the "Rare diseases cell-typing" project.

Adjust 'Recurrent Neisserial infections' network plot #73

Open bschilder opened 3 months ago

bschilder commented 3 months ago

Tasks

Difficulty: low

Screenshot 2024-06-15 at 13 21 54

bschilder commented 3 months ago

Determine why certain genes / diseases are missing

This was actually due to a mapping issue between Disease IDs and the Disease Names. The HPO does not currently provide all of the disease names in their annotation files, so I was previously trying to hack together a solution to this.

Fortunately, HPO just released a new API to access this sort of info. I wrote a new function to query their API thousands of times to collect this information (parallelised to speed this up). I've cached this ID-name map and have included a function that uses it to better map the disease names in the gene association data.
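For anyone wanting to reproduce the general pattern (parallel lookups plus a cached ID-name map), here is a minimal Python sketch. It is not the function used in this repo: `build_id_name_map`, `add_disease_names`, the `disease_id` / `disease_name` field names and the JSON cache file are all hypothetical, and the actual HPO API call is deliberately left as a user-supplied `fetch_name` callable since the endpoint details are not given in this thread.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_id_name_map(disease_ids, fetch_name, cache_path, max_workers=8):
    """Return {disease_id: disease_name}, querying only IDs missing from the cache.

    `fetch_name` is a hypothetical callable (disease_id -> name or None)
    that wraps whatever API request is actually needed.
    """
    cache_path = Path(cache_path)
    # Reuse earlier results so repeated runs do not hit the API again.
    id_to_name = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    missing = sorted(set(disease_ids) - set(id_to_name))
    if missing:
        # Threads are used because each lookup is I/O-bound (one request per ID).
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            names = list(pool.map(fetch_name, missing))
        id_to_name.update(zip(missing, names))
        cache_path.write_text(json.dumps(id_to_name, indent=2, sort_keys=True))
    return id_to_name


def add_disease_names(records, id_to_name):
    """Attach a disease_name to each record, falling back to the raw ID."""
    return [
        {**rec, "disease_name": id_to_name.get(rec["disease_id"]) or rec["disease_id"]}
        for rec in records
    ]
```

Note that in this sketch an ID whose lookup returns nothing is cached as missing and will not be retried until the cache file is deleted.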

Omit 'Recurrent Neisserial infections' as a node in the network plot

I think this is justified in this particular example, as the addition of the previously missing diseases and genes makes the network plot more complex. Removing 'Recurrent Neisserial infections' (RNI) as a node reduces the number of plotted connections and makes the plot a bit easier to read. I also added a connection from the disease to the gene (in addition to the one from the disease to the cell type) to make the link between "C7 deficiency" and the C7 gene more obvious.
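As a rough illustration of that edge manipulation (not the plotting code used here), the change can be sketched on a plain edge list. The function name `simplify_network`, the tuple-based edge format and the "cell type A" label are placeholders assumed for the example.

```python
def simplify_network(edges, drop_node, disease_genes):
    """Drop every edge touching `drop_node`, then add disease -> gene edges.

    `edges` is a list of (source, target) tuples and `disease_genes` maps each
    disease to its associated genes; both formats are assumptions for this sketch.
    """
    kept = [(src, dst) for src, dst in edges if drop_node not in (src, dst)]
    extra = [(disease, gene) for disease, genes in disease_genes.items() for gene in genes]
    # dict.fromkeys removes duplicate edges while preserving their order.
    return list(dict.fromkeys(kept + extra))


edges = [
    ("Recurrent Neisserial infections", "C7 deficiency"),
    ("C7 deficiency", "cell type A"),  # placeholder cell type label
    ("cell type A", "C7"),
]
simplified = simplify_network(
    edges,
    drop_node="Recurrent Neisserial infections",
    disease_genes={"C7 deficiency": ["C7"]},
)
```

Here `simplified` keeps the disease-to-cell-type and cell-type-to-gene edges and gains a direct "C7 deficiency" to C7 edge, while every edge involving the phenotype node is gone.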

Preview:

Screenshot 2024-06-15 at 13 32 25

bschilder commented 3 months ago

Plot specificity of each gene as bar plot

I think this might work better as a heatmap underneath the network plot image

bschilder commented 3 months ago

include variant-level information

"We should be using this figure as a place to explain in more detail how and why all this works. We should show a gnomAD-style plot for one of the genes showing where pathogenic variants with known clinical effects associated with the phenotype are located. We should also show the confidence level with which the gene is associated with the trait."

While I agree this could be helpful, gathering all of the information requested here is non-trivial. Will circle back to this later.

bschilder commented 1 month ago

include variant-level information

This is unfortunately beyond the scope of this study. Incorporating variant-level information is not at all a trivial task. When I did this for a single gene in a single phenotype, it took me weeks to extract the information, and the process has not yet been automated.

Including variants in the figure also gives the false impression that we performed variant-level analyses.

bschilder commented 6 days ago

Screenshot 2024-09-24 at 13 31 55

Also: