maqin2001 / qubic-r-package

Step 3. marker gene assignment #9

Open PegasusAM opened 6 years ago

PegasusAM commented 6 years ago

In this step, two input files are required: the bicluster result and the cell type classification.

  1. For each bicluster, we have already created a dense graph in which every cell is linked to every other cell in the bicluster by an edge. We now store all genes covered by this bicluster on every edge. Doing the same for all biclusters gives a gene list for each edge. The weight of each gene is 1.
  2. Merge the genes stored on the same edge across different biclusters, and sum the weights.
  3. Split the cells by cell type. We create another dense graph for each cell type and pair the merged genes with its edges.
  4. Sum the weights:
     4.1 Merge the genes related to the same cell within the same cell type and sum the weights. These are the cell-specific marker genes.
     4.2 Merge all genes in the same cell type (the new dense graph) and sum the weights. These are the cell-type-specific marker genes.
  5. Generate the marker gene list, e.g.:

         [[1]]
         [[1]]$cell_type
         [1] 1

         [[1]]$cells
         [1] C1 C2...

         [[1]]$C1_gene
         [1] G1 G2 G3...

         [[1]]$C2_gene
         [1] G1 G3 G4...

         [[1]]$cell_type_specific gene
         [1] G1 G2 G3 G4...

         [[2]]
         [[2]]$cell_type
         [1] 2
         ...
         ...

     (The open problem is how to show the weight of each gene.)

  6. Restore the real gene and cell names (not sure whether putting step 6 before step 5 would be better?).
  7. Export the result.
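
The first two steps (store the genes of each bicluster on every cell-pair edge, then merge across biclusters) can be sketched in a few lines. This is only an illustration in Python, not the package's R code; the function name `edge_gene_weights`, the `(genes, cells)` input layout and the toy data are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def edge_gene_weights(biclusters):
    """For every cell pair (edge), count how many biclusters contain
    both cells together with each gene.

    `biclusters` is assumed to be an iterable of (genes, cells) pairs.
    """
    weights = defaultdict(Counter)
    for genes, cells in biclusters:
        # The dense graph links every pair of cells in the bicluster.
        for edge in combinations(sorted(cells), 2):
            # Each gene of the bicluster adds a weight of 1 to this edge;
            # repeating this over all biclusters sums the weights.
            weights[edge].update(genes)
    return weights


# Hypothetical toy input, not taken from the issue.
biclusters = [
    ({"G1", "G2"}, {"C1", "C2", "C3"}),
    ({"G1", "G3"}, {"C1", "C2"}),
]
w = edge_gene_weights(biclusters)
for edge in sorted(w):
    print(edge, sorted(w[edge].items()))
```

Sorting the cells before pairing them keeps each edge key in one canonical order, so (C1, C2) and (C2, C1) are never stored as two different edges.
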
zy26 commented 6 years ago

Hi @PegasusAM! Can you provide some example inputs and outputs for this function?

PegasusAM commented 6 years ago

@zy26 I attached a figure here to show the workflow. Let me know if any step is unclear.

[image: gene assignment]

PegasusAM commented 6 years ago

When merging gene weights, we consider cell types. C1 and C2 are in cell_type_1, so for them we only consider the gene weights on edge e_{1,2}. C3, C4 and C5 are in cell_type_2, so for C3 we consider the gene weights on e_{3,4} and e_{3,5}, and likewise for C4 and C5. Because e_{4,5} does not exist, the gene weight for C4 comes only from e_{3,4}, and the gene weight for C5 comes only from e_{3,5}.
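
The example above can be traced with a short Python sketch. The edges and cell types follow the comment, but the gene weights on each edge, the extra cross-type edge and the function name `marker_weights` are made up for illustration; this is not the package's implementation.

```python
from collections import Counter

# Hypothetical merged edge weights; e(4,5) is absent, as in the example.
edge_weights = {
    ("C1", "C2"): Counter({"G1": 2, "G2": 1}),
    ("C3", "C4"): Counter({"G3": 1}),
    ("C3", "C5"): Counter({"G3": 1, "G4": 2}),
    ("C1", "C3"): Counter({"G1": 1}),  # crosses cell types, so it is ignored
}
cell_type = {"C1": 1, "C2": 1, "C3": 2, "C4": 2, "C5": 2}


def marker_weights(edge_weights, cell_type):
    """Sum edge weights per cell and per cell type, using only edges
    whose two cells belong to the same cell type."""
    per_cell = {c: Counter() for c in cell_type}
    per_type = {t: Counter() for t in set(cell_type.values())}
    for (a, b), genes in edge_weights.items():
        if cell_type[a] != cell_type[b]:
            continue  # edges between different cell types do not count
        per_cell[a] += genes   # cell-specific weight for the first cell
        per_cell[b] += genes   # cell-specific weight for the second cell
        per_type[cell_type[a]] += genes  # cell-type-specific weight
    return per_cell, per_type


per_cell, per_type = marker_weights(edge_weights, cell_type)
for cell in sorted(per_cell):
    print(cell, sorted(per_cell[cell].items()))
for ct in sorted(per_type):
    print("cell type", ct, sorted(per_type[ct].items()))
```

With these toy weights, C3 collects weight from both e_{3,4} and e_{3,5}, while C4 and C5 each see only their single edge to C3.
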

zy26 commented 6 years ago

In a bicluster, assuming there are n cells belonging to one cell type, each cell in the bicluster has a weight of n(n-1)/2 and the total weight of the cells of that cell type is n^2(n-1)/2. So if there are many cells of the same cell type in a certain bicluster, that bicluster is more likely to dominate.
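
To make the growth concrete, here is a small Python sketch. It only counts the same-type cell pairs (edges) that a single bicluster contributes for each of its genes, n(n-1)/2; how that weight is then attributed to individual cells is not settled by this count. The function name is hypothetical.

```python
from math import comb


def same_type_edges(n):
    """Number of cell pairs among n cells of one cell type: n(n-1)/2."""
    return comb(n, 2)


# A bicluster with many same-type cells contributes far more edge weight
# per gene than a small one, which is why it can dominate the sums.
for n in (2, 5, 20):
    print(n, same_type_edges(n))
```
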

maqin2001 commented 6 years ago

I think this is a good case, isn’t it?

PegasusAM commented 6 years ago

The results are completely wrong. Maybe the figure I drew last time was not clear, so here I describe it with formulas.

image

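① When Ci, Cj and gm appear together in one bicluster, give this gene a weight of 1 for the cell pair (Ci, Cj); otherwise the weight is 0. That is, the same gk has different weights for different i, j.
② Sum gi,j,k over all K biclusters.
③ For any Ci, the weight of gm is the sum of the weights of all cell pairs connected to Ci within the same cell type. w(gi,m) is the cell-specific gene weight, and the resulting matrix is output_1.
④ For any cell type (CT), the weight of gm is the sum of the weights of all cell pairs within that cell type. w(gCT,m) is the cell-type-specific gene weight, and the resulting matrix is output_2.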
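
Since the formula image is not reproduced here, the four steps can be written out as follows. This is a reconstruction from the text only, not a copy of the image. The bicluster notation $B_k = (\mathcal{G}_k, \mathcal{C}_k)$, the cell-type map $T(\cdot)$, the merged weight $W$ and the superscript $m$ on $g_{i,j,k}$ are assumptions introduced for readability.

```latex
% (1) Indicator for gene g_m on the cell pair (C_i, C_j) in bicluster k
g^{m}_{i,j,k} =
\begin{cases}
  1, & C_i \in \mathcal{C}_k,\ C_j \in \mathcal{C}_k,\ g_m \in \mathcal{G}_k \\
  0, & \text{otherwise}
\end{cases}

% (2) Sum over all K biclusters
W^{m}_{i,j} = \sum_{k=1}^{K} g^{m}_{i,j,k}

% (3) Cell-specific gene weight (entries of output_1)
w(g_{i,m}) = \sum_{\substack{j \neq i \\ T(C_j) = T(C_i)}} W^{m}_{i,j}

% (4) Cell-type-specific gene weight (entries of output_2)
w(g_{CT,m}) = \sum_{\substack{i < j \\ T(C_i) = T(C_j) = CT}} W^{m}_{i,j}
```

Under this reading, each same-type pair is counted once in (4) and once for each of its two cells in (3), so summing (3) over the cells of one cell type gives twice the value of (4).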

PegasusAM commented 6 years ago

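In the test results, only 44 genes were assigned to 10 cells, which is clearly far too few. In addition, all genes within the same cell have the same value. Part of the result is shown in the table below (I removed the all-zero rows and columns): [image: image] https://user-images.githubusercontent.com/31615033/39673753-8847933c-5107-11e8-8301-f0c07b443ff0.png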

maqin2001 commented 6 years ago

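Zhang Yu, please look into this detail and revise the gene-related functionality accordingly.

Anjun, it would be best if you could provide the correct answer for one example, so that Zhang Yu can tell during debugging whether the fix is right.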

Qin Ma, Ph.D. Assistant Professor Department of Plant Science Department of Mathematics and Statistics 254D Northern Plains Biostress lab (SNP) South Dakota State University Brookings, SD, 57007 Lab: http://bmbl.sdstate.edu
