veghp / pyVDJ

V(D)J sequencing data analysis
GNU General Public License v3.0

Find VDJ usages and CDR3 for cells in specific leiden clusters #6

Closed taopeng1100 closed 3 years ago

taopeng1100 commented 3 years ago

Can you help me determine the V(D)J gene usage and CDR3 sequences for cells in specific Leiden clusters?

Thx!

Tao

veghp commented 3 years ago

The tutorial has a section about retrieving CDR3 sequences: Public, private and condition-specific clonotypes

meta = 'donor'  # here you specify the category for clustering
adata = pyvdj.stats(adata, meta)
cdr3 = adata.uns['pyvdj']['stats'][meta]['cdr3']

Details of the stats() function are in pyvdj/stats.py. Then you can query the clusters, for example with set(cdr3['c1']). Alternatively, you can take the original csv file with the VDJ data and filter it for the cells of interest; this does not require pyVDJ.
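The pyVDJ-free route (filtering the original csv for the cells of interest) can be sketched with plain pandas. This is a minimal illustration, assuming the Cell Ranger `filtered_contig_annotations.csv` layout (columns such as `barcode`, `chain`, `v_gene`, `j_gene`, `cdr3`) and that the barcodes in `adata.obs` match the csv's `barcode` column; the inline rows and cluster labels are made up.

```python
import io

import pandas as pd

# Made-up stand-in for Cell Ranger's filtered_contig_annotations.csv;
# with real data: contigs = pd.read_csv("filtered_contig_annotations.csv")
contigs_csv = io.StringIO(
    "barcode,chain,v_gene,j_gene,cdr3,productive\n"
    "AAAC-1,TRA,TRAV1-2,TRAJ33,CAVRDSNYQLIW,True\n"
    "AAAC-1,TRB,TRBV6-4,TRBJ2-1,CASSDSGYNEQFF,True\n"
    "CCCG-1,TRB,TRBV20-1,TRBJ1-1,CSARDRGNTEAFF,True\n"
    "GGGT-1,TRB,TRBV6-4,TRBJ2-1,CASSDSGYNEQFF,True\n"
)
contigs = pd.read_csv(contigs_csv)

# Made-up cluster labels; with real data this would be adata.obs["leiden"]
leiden = pd.Series(["4", "2", "4"], index=["AAAC-1", "CCCG-1", "GGGT-1"], name="leiden")

# Barcodes of the cells assigned to cluster 4
cluster4_barcodes = leiden[leiden == "4"].index

# Keep only the contigs that belong to those cells
cluster4_contigs = contigs[contigs["barcode"].isin(cluster4_barcodes)]
print(cluster4_contigs[["barcode", "chain", "v_gene", "j_gene", "cdr3"]])
```

If the AnnData barcodes carry a sample prefix or suffix that the csv lacks, they would need to be harmonized before the `isin()` filter.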

For gene usage, the scirpy package is probably a better choice. Another option is the R package Immunarch, for example its geneUsage() function.

taopeng1100 commented 3 years ago

meta = 'adata.obs[["leiden"]=="4"]'  # here you specify the category for clustering
adata = pyvdj.stats(adata, meta)
cdr3 = adata.uns['pyvdj']['stats'][meta]['cdr3']

I got this error:

KeyError                                  Traceback (most recent call last)
in
      1 meta = 'adata.obs[["leiden"]=="4"]'  # here you specify the category for clustering
----> 2 adata = pyvdj.stats(adata, meta)
      3 cdr3 = adata.uns['pyvdj']['stats'][meta]['cdr3']

~\Anaconda3\lib\site-packages\pyvdj\stats.py in stats(adata, meta)
     50     stats_dict['meta'] = meta
     51
---> 52     n_cells = adata.obs.groupby(meta).size()
     53     n_hasvdj = adata.obs.groupby(meta)['vdj_has_vdjdata'].apply(sum)
     54     stats_dict['cells'] = [n_cells, n_hasvdj]

~\Anaconda3\lib\site-packages\pandas\core\frame.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed)
   5799         axis = self._get_axis_number(axis)
   5800
-> 5801         return groupby_generic.DataFrameGroupBy(
   5802             obj=self,
   5803             keys=by,

~\Anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, observed, mutated)
    401         from pandas.core.groupby.grouper import get_grouper
    402
--> 403         grouper, exclusions, obj = get_grouper(
    404             obj,
    405             keys,

~\Anaconda3\lib\site-packages\pandas\core\groupby\grouper.py in get_grouper(obj, key, axis, level, sort, observed, mutated, validate)
    598                 in_axis, name, level, gpr = False, None, gpr, None
    599             else:
--> 600                 raise KeyError(gpr)
    601         elif isinstance(gpr, Grouper) and gpr.key is not None:
    602             # Add key to exclusions

KeyError: 'adata.obs[["leiden"]=="4"]'

Tao

From: Peter Vegh notifications@github.com Sent: Wednesday, October 14, 2020 12:19 PM To: veghp/pyVDJ pyVDJ@noreply.github.com Cc: Peng, Tao tpeng@fredhutch.org; Author author@noreply.github.com Subject: Re: [veghp/pyVDJ] Find VDJ usages and CDR3 for cells in specific leiden clusters (#6)


veghp commented 3 years ago

I believe it should just be meta = 'leiden': in my example, donor was the name of the category (column). Then you should be able to list the CDR3 sequences in cluster 4 with cdr3['4'].
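The KeyError comes from pandas: as line 52 of stats.py in the traceback shows, stats() passes meta straight to adata.obs.groupby(meta), and groupby() expects the name of an existing column, not a filtering expression. A small sketch with a made-up table standing in for adata.obs:

```python
import pandas as pd

# Made-up stand-in for adata.obs: one row per cell, one column per annotation
obs = pd.DataFrame(
    {"leiden": ["4", "2", "4"], "vdj_has_vdjdata": [True, False, True]},
    index=["AAAC-1", "CCCG-1", "GGGT-1"],
)

# Passing a column name works: one group per Leiden cluster
sizes = obs.groupby("leiden").size()
print(sizes)

# A string containing a Python expression is not a column name -> KeyError
try:
    obs.groupby('adata.obs[["leiden"]=="4"]')
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)
```

So with meta = 'leiden', every cluster becomes a group, and cluster 4 is then selected afterwards by its label.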

taopeng1100 commented 3 years ago

I think I did not explain it well. I have one sample. After UMAP clustering analysis, my single cells fall into 12 clusters (Leiden clusters). I would like to see which clonotypes are in cluster 4, along with their V(D)J gene usage and CDR3 sequences.

Tao

From: Peter Vegh notifications@github.com Sent: Wednesday, October 14, 2020 1:05 PM To: veghp/pyVDJ pyVDJ@noreply.github.com Cc: Peng, Tao tpeng@fredhutch.org; Author author@noreply.github.com Subject: Re: [veghp/pyVDJ] Find VDJ usages and CDR3 for cells in specific leiden clusters (#6)

veghp commented 3 years ago

I think you will need to get the list of cells in cluster 4, then filter the VDJ dataframe using these cells. Then you can have a look at the data/columns. I hope something like this helps (not tried):

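# barcodes of the cells in Leiden cluster 4
# (vdj_obs is the column name you generated when loading in the data)
cluster4_cells = adata.obs.loc[adata.obs["leiden"].isin(["4"]), 'vdj_obs'].tolist()
# keep only the VDJ rows belonging to those cells
vdjdf = adata.uns['pyvdj']['df']
vdj_cluster4 = vdjdf.loc[vdjdf['barcode_meta'].isin(cluster4_cells)]
# CDR3 sequences of the cluster 4 cells
vdj_cluster4['cdr3']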
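To go from the filtered rows to gene usage and clonotypes, the per-contig table can be summarized with plain pandas. A minimal sketch, assuming the pyVDJ dataframe keeps the Cell Ranger contig columns (`chain`, `v_gene`, `j_gene`, `cdr3`) alongside `barcode_meta`; the rows are made up, and labeling a clonotype by each cell's sorted chain:CDR3 pairs is a simplification chosen for illustration, not pyVDJ's own definition.

```python
import pandas as pd

# Made-up stand-in for vdj_cluster4 (cluster 4 rows of adata.uns['pyvdj']['df'])
vdj_cluster4 = pd.DataFrame(
    {
        "barcode_meta": ["AAAC-1", "AAAC-1", "GGGT-1", "GGGT-1", "TTTA-1"],
        "chain": ["TRA", "TRB", "TRA", "TRB", "TRB"],
        "v_gene": ["TRAV1-2", "TRBV6-4", "TRAV1-2", "TRBV6-4", "TRBV20-1"],
        "j_gene": ["TRAJ33", "TRBJ2-1", "TRAJ33", "TRBJ2-1", "TRBJ1-1"],
        "cdr3": ["CAVRDSNYQLIW", "CASSDSGYNEQFF", "CAVRDSNYQLIW",
                 "CASSDSGYNEQFF", "CSARDRGNTEAFF"],
    }
)

# V and J gene usage: number of contigs per gene, split by chain
v_usage = vdj_cluster4.groupby(["chain", "v_gene"]).size()
j_usage = vdj_cluster4.groupby(["chain", "j_gene"]).size()

# Simple clonotype label per cell: its chain:CDR3 pairs, sorted and joined
clonotypes = (
    vdj_cluster4.assign(pair=vdj_cluster4["chain"] + ":" + vdj_cluster4["cdr3"])
    .groupby("barcode_meta")["pair"]
    .apply(lambda pairs: ";".join(sorted(pairs)))
)

# How many cells in the cluster share each clonotype label
clonotype_counts = clonotypes.value_counts()

print(v_usage)
print(clonotype_counts)
```

Counting contigs rather than cells is the simplest choice here; for per-cell usage, the rows could first be deduplicated on `barcode_meta` and `chain`.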

However, it is probably better to use another up-to-date package. I'll archive this project because it is not maintained anymore.

taopeng1100 commented 3 years ago

Appreciate your help!

From: Peter Vegh notifications@github.com Sent: Thursday, October 15, 2020 7:28 AM To: veghp/pyVDJ pyVDJ@noreply.github.com Cc: Peng, Tao tpeng@fredhutch.org; Author author@noreply.github.com Subject: Re: [veghp/pyVDJ] Find VDJ usages and CDR3 for cells in specific leiden clusters (#6)