simonwm / tacco

TACCO: Transfer of Annotations to Cells and their COmbinations
BSD 3-Clause "New" or "Revised" License

max_annotation parameter not considered during observation splitting #9

Closed PietroAndrei closed 9 months ago

PietroAndrei commented 1 year ago

I am using TACCO to analyse some Visium samples, and after annotating the reference cell types I have tried to split the gene expression counts of Visium spots across the contributing cell types with the tc.tl.split_observations() function. Although I set the max_annotation parameter to 7 during the annotation step, the Visium counts are then split across all the possible cell types defined in the single cell reference (51). Is there a way to preserve the max_annotation information during the observation splitting step as well?

Thank you for your help!

JWatter commented 1 year ago

Thank you for the observation! I tried to reproduce your case and found that if multi_center is non-trivial, then the split does not honor the max_annotation setting. This will be fixed in the next release.

If you use a non-trivial multi_center setting, then this can be fixed by adding the following code between your tc.tl.annotate and tc.tl.split_observations calls. It basically propagates the max_annotation mapping to the multi_center levels.

# assuming you have your data in adata, the annotation in an .obsm key 'cell_type' and the reconstruction in an .obsm, .varm, and .uns key 'cell_type_mc10'
sub_anno = adata.obsm['cell_type_mc10']
sub_map = adata.uns['cell_type_mc10']
sub_anno *= (adata.obsm["cell_type"] != 0).reindex(columns=sub_anno.columns.map(sub_map)).to_numpy()
sub_anno /= sub_anno.sum(axis=1).to_numpy()[:,None]
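
To make the masking and renormalization step easier to follow, here is a minimal sketch on toy data using only pandas and NumPy. The cell type names, the sub-level names such as "B-0", and all numeric values are assumptions made up for illustration; they are not taken from TACCO and the real sub-level naming may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical coarse annotation for 2 spots, with max_annotation already
# applied (a zero means the cell type was dropped for that spot).
anno = pd.DataFrame(
    [[0.7, 0.3, 0.0],
     [0.0, 0.4, 0.6]],
    index=["spot1", "spot2"], columns=["B", "T", "NK"],
)

# Hypothetical sub-level annotation with two sub-centers per cell type.
sub_anno = pd.DataFrame(
    [[0.3, 0.3, 0.1, 0.1, 0.1, 0.1],
     [0.1, 0.1, 0.2, 0.2, 0.2, 0.2]],
    index=anno.index,
    columns=["B-0", "B-1", "T-0", "T-1", "NK-0", "NK-1"],
)

# Mapping from each sub-level column to its parent cell type.
sub_map = {name: name.split("-")[0] for name in sub_anno.columns}

# Expand the coarse "is non-zero" mask to the sub-level columns.
parents = sub_anno.columns.map(sub_map)
mask = (anno != 0).reindex(columns=parents).to_numpy()

# Zero out sub-levels whose parent type was dropped, then renormalize
# so that every spot sums to 1 again.
masked = sub_anno * mask
masked = masked.div(masked.sum(axis=1), axis=0)

print(masked)
```

After this step, the sub-levels of a dropped cell type carry zero weight for that spot, and the remaining weights are rescaled proportionally.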

I hope this helps!

PietroAndrei commented 1 year ago

Thanks a lot for the explanation and the code! I am actually not using the multi_center parameter, as the cell type annotation I am considering is already highly granular. I might have found the origin of my problem anyway (which was simpler than expected). By using the annotate function with the following settings:

anno_vis = tc.tl.annotate(visium, ref, 'cellType', result_key='TACCO', reconstruction_key='rec', max_annotation=int(max_anno))

the results with the max_annotation parameter applied are stored in the .obsm['TACCO'] slot, which is not the one considered during the observation splitting. I solved the issue by overwriting .obsm['rec'] with .obsm['TACCO'] before applying tc.tl.split_observations():

# Replace the default reconstruction in obsm['rec'] with obsm['TACCO'] (where max_annotation has been applied)
anno_vis.obsm['rec'] = anno_vis.obsm['TACCO']
split_vis = tc.tl.split_observations(visium, 'rec', map_obs_keys=True, result_key='cellType_split')
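
One way to check whether an annotation actually respects the cap before splitting is to count the non-zero cell types per observation. The sketch below is a hypothetical helper run on made-up toy values, not part of TACCO, and it assumes the annotation is available as a pandas DataFrame.

```python
import pandas as pd


def max_nonzero_per_row(annotation: pd.DataFrame) -> int:
    """Return the largest number of non-zero cell types in any single row."""
    return int((annotation != 0).sum(axis=1).max())


# Toy annotations (hypothetical values): 'rec' is unrestricted,
# 'tacco' was capped to at most 2 cell types per observation.
rec = pd.DataFrame(
    [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3]],
    columns=["A", "B", "C"],
)
tacco = pd.DataFrame(
    [[0.625, 0.375, 0.0],
     [0.0, 0.6667, 0.3333]],
    columns=["A", "B", "C"],
)

n_rec = max_nonzero_per_row(rec)
n_tacco = max_nonzero_per_row(tacco)
print(n_rec, n_tacco)
```

On real data, the same helper could be applied to anno_vis.obsm['rec'] and anno_vis.obsm['TACCO'] and the results compared against max_anno, assuming both slots hold DataFrames.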

JWatter commented 1 year ago

Thank you for the update and for posting your solution! That makes a lot of sense. I was testing with the default setting of reconstruction_key, which basically defaults to result_key, except for non-trivial multi_center. As a result, the .obsm entries for reconstruction_key and for result_key were already identical in my case and the issue did not show up.

I'm glad it is solved now. I'm keeping this issue open until the general fix is released.

JWatter commented 9 months ago

The new TACCO version (v0.3.0) fixes this issue. See release notes.