theislab / AutoGeneS

MIT License

The hierarchical AutoGeneS #8

Open Chuang1118 opened 3 years ago

Chuang1118 commented 3 years ago

Dear Author,

Thanks for this new API.

In your paper, the paragraph "Hierarchical optimization for highly correlated cell types" says "we ran AutoGeneS separated CD4+ and CD8+ T cells ......" and calls this AutoGeneS*. I would like to run it on my data, because some cell types in my reference seem highly correlated, e.g. the memory B vs. naive B cell subtypes. With the low-correlation Pareto-optimal solutions, I found very few markers. My initial reference has about 100,000 cells and over 30 cell types. I regrouped some cell types to make deconvolution easier, but it doesn't work very well.

Now I want to use AutoGeneS*. Would you share your code?

Very nice feature selection method using GA. Thanks in advance, Chuang

lila167 commented 3 years ago

Hi Chuang,

Thanks for your interest. 30 cell types is quite a lot. If there is a high correlation between the cell types, no method can handle it. I recommend grouping as many sub-cell types as you can. Then run autogenes with different numbers of genes (300, 400, 500) and compare the results. For AutoGeneS*, I ran autogenes on the correlated cell types (e.g. memory B and naive B) individually with very few genes (~10-20) and concatenated the selected genes with the results of autogenes applied to the whole dataset. Does that make sense?
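The concatenation step described above can be sketched as a union of two boolean gene selections. This is only an illustration under stated assumptions: the gene names and both masks are made up, and the code uses plain NumPy rather than any AutoGeneS call.

```python
import numpy as np

# Hypothetical gene names and selections, for illustration only.
genes = np.array(["CD3E", "CD4", "CD8A", "CD19", "CD79B",
                  "MS4A1", "NKG7", "LYZ", "GAPDH"])

# Assumed selection from a run on the whole reference (all cell types).
global_mask = np.array([True, True, True, False, False,
                        True, True, True, False])

# Assumed selection from a separate run restricted to the correlated
# subtypes (e.g. memory B vs. naive B), with only a few genes.
subtype_mask = np.array([False, False, False, True, True,
                         True, False, False, False])

# Concatenate the selections: keep a gene if either run selected it.
# A gene chosen by both runs (here MS4A1) is counted only once.
combined_mask = global_mask | subtype_mask
combined_genes = genes[combined_mask]

print(int(combined_mask.sum()), "genes:", list(combined_genes))
```

The combined mask can then be used to subset both the reference centroids and the bulk matrix to the same genes before running the regression.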

Best, Hana

Chuang1118 commented 3 years ago

Hello Hana,

Thank you for your suggestion, this makes sense to me. Here, I have broad cell types. In my opinion, every method can handle broad cell types, so they cannot show the power of AutoGeneS. I know AutoGeneS uses only 400 genes, which is a huge advantage, but right now I am interested in the quality of the deconvolution result, regardless of how many genes take part in the regression, 400 or 1000.

I have now observed that, among the Pareto solutions with low correlation, important biological markers whose mean expression differs strongly from that of the other cell types are lost when I increase the number of generations (e.g. from 5000 to 8000). I see this with the 20 cell types I want to predict.

I don't have sorted bulk or flow cytometry data to support my results, and my results are not robust. How can I validate the AutoGeneS predictions? Should I add synthetic bulk samples to the bulk dataset? Manually, or with a dedicated tool? Any suggestions?

For example, I am in the situation shown in the figure below as the starting point, moving toward finer B cell subtypes.
1/ Can I trust the AutoGeneS result at the starting point? I expected that changing the parameters would not change the result much, so that I could be sure it is robust, but I now observe that the nuSVR and nnls results are very different.
2/ If I trust the result at the starting point and continue ...., how can I validate each step?
3/ In which situation do I need to stop?
4/ Or should I just trust the AutoGeneS output because AutoGeneS won the benchmarking?

I am a beginner in deconvolution techniques. I don't know the maximal power of deconvolution tools: if we encounter cell subtypes, must we stop?

I'm looking forward to your reply to my naive questions. Best, Chuang

image

image

Chuang1118 commented 3 years ago

Hello Hana,

I have a question about AutoGeneS*. I don't know how to add additional genes. After:

ag.optimize(ngen=5000,nfeatures=400,seed=0,mode='fixed')

Each Pareto solution is a set of 400 genes. Then

ag.select(index=0)

For each Pareto index, the number of True values is 400. The result is stored in ag. I want to add 1 gene (e.g. CD79B) to get a set of 401 genes. How can I do this?

Best, Chuang

lila167 commented 3 years ago

Hi Chuang,

Unfortunately we don't support adding genes to the deconvolution; however, we will consider it for the future. For the moment, you can run the regressions individually after concatenating the genes selected by autogenes with your own genes. You can take a look at this code: https://github.com/theislab/AutoGeneS/blob/master/autogenes/interface.py

Just search for nusvr and nnls.
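As a rough sketch of running the regression individually, the example below applies SciPy's `scipy.optimize.nnls` to a signature matrix restricted to a concatenated gene set. It is not the code path in interface.py. The signature matrix, the proportions, and the synthetic bulk sample are all made-up assumptions, and only NNLS is shown (the nusvr model mentioned above would need a separate regressor).

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)

# Hypothetical signature matrix: rows are the concatenated genes
# (autogenes selection plus manually added genes), columns are cell types.
n_genes, n_celltypes = 50, 4
signature = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_celltypes))

# Synthetic bulk sample built from known (assumed) proportions, so that
# the regression result can be checked against a ground truth.
true_props = np.array([0.4, 0.3, 0.2, 0.1])
bulk = signature @ true_props

# Non-negative least squares: minimise ||signature @ x - bulk|| with x >= 0.
coef, residual_norm = nnls(signature, bulk)

# Rescale the coefficients so they sum to 1 and read as proportions.
proportions = coef / coef.sum()

print("estimated:", np.round(proportions, 3))
print("true:     ", true_props)
```

Because the bulk sample here is a noise-free mixture of the signature columns, the estimate should match the known proportions closely. With real bulk data there is no such ground truth, which is why a synthetic mixture like this can serve as a sanity check.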

Hope this helps

kapoormuskan commented 8 months ago

Hi Hana,

Do you have documentation for AutoGeneS+?

I optimized my single-cell data of 13 cell types and found highly correlated cell types. I ran the optimization on those cell types, which added 10 more genes to the signature matrix. I am not sure how to deconvolve the bulk data now. I was trying something along these lines: [ag.AutoGeneS(data=signature_matrix_np), ag.deconvolve(numeric_bulk.T, model='nusvr')] but ag is picking up values from the new optimization, which is on 2 cell types only.