rajewsky-lab / novosparc

BSD 3-Clause "New" or "Revised" License

Reference atlas creation #63

Open liliay opened 2 years ago

liliay commented 2 years ago

Hi guys,

I am trying to generate the proper files to create a reference atlas from antibody staining of genes/TFs in the Drosophila optic lobe. In the paper I did not find any indication of how dge.txt and geometry.txt are obtained. From my understanding I need to generate both, since the atlas is the combination of the expression of a subset of genes and their locations in the target space. How are we supposed to generate these files?

Thank you for your help, and for developing this very useful tool,

Best,

Lilia

MalteMederacke commented 2 years ago

In the paper, for Drosophila, they use a very extensive reference atlas from the Berkeley lab. Sadly, their data is not available anymore, since they took down their website last year or so. But that wouldn't be helpful anyway, as they put in a lot of effort to manually curate and image thousands of embryos for the marker genes used to get an average spatial expression profile.

First of all, I don't know if antibody stainings are particularly useful, as you are probably looking at single-cell expression data. I don't know your system, but the location of expression isn't the same as the location of the protein if you have shuttling or diffusion. Quantifications are not useful either, as protein abundance does not need to correlate with mRNA abundance. You should use image data based on expression (e.g. FISH, smFISH or ISH). Note that ISH is not quantitative!

For the geometry, you need to average the shape of your system into one target space. Maybe just use the outline of a representative lobe, or create a shape based on the average of all of them. Then you define a grid that covers your geometry, for example with a binary image.
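As a rough illustration of the geometry step, here is a minimal sketch that turns a binary image into a list of grid locations. It is not taken from the paper or from novoSpaRc itself: the ellipse is a hypothetical stand-in for a real segmented outline, `mask_to_locations` and `step` are made-up names, and the file name and column layout in the commented-out save call are assumptions to check against the novoSpaRc tutorials.

```python
import numpy as np

def mask_to_locations(mask, step=1):
    """Return an (n_locations, 2) array of (x, y) grid points inside a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    # Keep every `step`-th pixel in each direction to form a regular grid.
    grid = np.zeros_like(mask)
    grid[::step, ::step] = True
    rows, cols = np.nonzero(mask & grid)
    # Image rows correspond to y, image columns to x.
    return np.column_stack([cols, rows]).astype(float)

# Hypothetical stand-in for a segmented outline: a filled ellipse.
yy, xx = np.mgrid[0:60, 0:100]
mask = ((xx - 50) / 45.0) ** 2 + ((yy - 30) / 25.0) ** 2 <= 1.0

locations = mask_to_locations(mask, step=4)

# Assumed output format: one location per line, x then y.
# np.savetxt("geometry.txt", locations, fmt="%.1f")
```

A larger `step` gives fewer target locations; a reasonable starting point might be a grid whose size is in the same range as the number of cells you want to map.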

Then you need to map your expression data onto this shape, e.g. measure the proportions and positions of a stripe or area and project them onto your previously defined geometry. I would start with binary information, i.e. assign each location in your target space a 1 if it expresses your gene and a 0 if not. If you have quantitative data, you can of course use a float between 0 and 1 after normalising for all genes.
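The mapping step could look roughly like the sketch below, which builds a locations-by-genes matrix with one binary gene and one quantitative gene. Everything here is assumed for illustration: the random locations stand in for a real geometry, `geneA` and `geneB` are invented, the stripe boundaries are made-up measurements, and per-gene min-max scaling is just one possible way to get values between 0 and 1.

```python
import numpy as np

# Hypothetical target space: 200 locations with (x, y) coordinates.
rng = np.random.default_rng(0)
locations = rng.random((200, 2))

def stripe(locations, start, end):
    """Binary pattern: 1 where the relative x position lies in [start, end], else 0."""
    x = locations[:, 0]
    rel = (x - x.min()) / (x.max() - x.min())  # relative position along x, 0..1
    return ((rel >= start) & (rel <= end)).astype(float)

def min_max(values):
    """Scale quantitative intensities to the range 0..1."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

atlas = {
    # Stripe measured (hypothetically) between 20% and 40% of the axis length.
    "geneA": stripe(locations, 0.2, 0.4),
    # Made-up quantitative signal: a band centred on y = 0.5.
    "geneB": min_max(np.exp(-((locations[:, 1] - 0.5) ** 2) / 0.02)),
}

genes = list(atlas)
dge = np.column_stack([atlas[g] for g in genes])  # shape: (n_locations, n_genes)
```

The rows of `dge` must be in the same order as the rows of the geometry, so that row i of the expression matrix describes location i.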

Maybe that helps?

Best,

Malte