kaizhang / SnapATAC2

Single-cell epigenomics analysis tools
https://kzhang.org/SnapATAC2/

"obs_names" error in snapatac2.tl.add_cor_scores #240

Open MubasherMohammed opened 9 months ago

MubasherMohammed commented 9 months ago

Hi, and thanks for developing this very useful tool! When I run snapatac2.tl.add_cor_scores(network, peak_mat=data), I get the error "'NoneType' object has no attribute 'obs_names'". Here, data is an AnnData object of scATAC data processed with snapatac2. The previous step works fine: the network is created with nodes and edges. Any idea how to work around this? I installed snapatac2 with pip install snapatac2[all]. Many thanks in advance.

kaizhang commented 9 months ago

It would be helpful if you could provide me with a small example h5ad file and a few lines of code to reproduce this error. Thank you!

MubasherMohammed commented 9 months ago

Thanks for the reply! I have shared a subset of my h5ad object, processed with snapatac2: https://drive.google.com/drive/folders/1-Wv6njVe8hXney4QpPuBivi7SzMUcN8D?usp=sharing

My code:

%%time
snap.tl.macs3(data, groupby = 'Astro_GEX_SNctrl')

%%time
peaks = snap.tl.merge_peaks(data.uns['macs3'], snap.genome.hg19)
peaks.head()

%%time
peak_mat = snap.pp.make_peak_matrix(data, use_rep=peaks['Peaks'])
peak_mat

%%time
marker_peaks = snap.tl.marker_regions(peak_mat, groupby='Astro_GEX_SNPD', pvalue=0.01)

motifs = snap.tl.motif_enrichment(motifs = snap.datasets.cis_bp(unique=True), regions = marker_peaks, genome_fasta=snap.genome.hg19, )

all_regions = ['chr1:1618389-1618890', 'chr1:1851469-1851970', 'chr1:4139898-4140399', 'chr1:4821118-4821619', 'chr1:6732994-6733495', 'chr1:840499-841000']

net = snap.tl.init_network_from_annotation(all_regions, anno_file =snap.genome.hg19 , upstream = 250000, downstream= 250000, id_type = 'gene_name', coding_gene_only = True)
net = snap.tl.add_cor_scores(net,gene_mat=None, peak_mat=count, select=None, overwrite=False)

`---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[138], line 1
----> 1 net = snap.tl.add_cor_scores(net,gene_mat=None, peak_mat=count, select=None, overwrite=False)

File ~/miniconda3/envs/epi/lib/python3.9/site-packages/snapatac2/tools/_network.py:137, in add_cor_scores(network, gene_mat, peak_mat, select, overwrite)
    134 from tqdm import tqdm
    136 key = "cor_score"
--> 137 if list(peak_mat.obs_names) != list(gene_mat.obs_names):
    138     raise NameError("gene matrix and peak matrix should have the same obs_names")
    139 if select is not None:

AttributeError: 'DataFrame' object has no attribute 'obs_names'`

I can see that the network nodes and edges are created with genes, but I get the 'obs_names' error shown above. In addition, running net = snap.tl.add_tf_binding(net, genome_fasta = snap.genome.hg19, motifs = motifs) fails with:

`2024-02-15 20:13:44 - INFO - Fetching 48 sequences ...
2024-02-15 20:13:44 - INFO - Searching for the binding sites of 10 motifs ...
0%| | 0/10 [00:00<?, ?it/s]

AttributeError                            Traceback (most recent call last)
Cell In[106], line 1
----> 1 net = snap.tl.add_tf_binding(net, genome_fasta = snap.genome.hg19, motifs = motifs)

File ~/miniconda3/envs/epi/lib/python3.9/site-packages/snapatac2/tools/_network.py:261, in add_tf_binding(network, motifs, genome_fasta, pvalue)
    259 logging.info("Searching for the binding sites of {} motifs ...".format(len(motifs)))
    260 for motif in tqdm(motifs):
--> 261     bound = motif.with_nucl_prob().exists(sequences, pvalue=pvalue)
    262     if any(bound):
    263         name = motif.id if motif.name is None else motif.name

AttributeError: 'str' object has no attribute 'with_nucl_prob'`

Many thanks for your help!
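The second traceback fails inside `for motif in tqdm(motifs)`, with each `motif` being a `str`. One possible reading, which is an assumption and is not confirmed in this thread, is that the `motifs` variable holds the result of `snap.tl.motif_enrichment` and behaves like a dict keyed by group name, so iterating over it yields the keys. The sketch below reproduces only that Python behaviour; `FakeMotif` and `first_failure` are hypothetical stand-ins and are not part of SnapATAC2.

```python
# Hypothetical stand-in for a motif object; it mimics only the method
# named in the traceback and is NOT a SnapATAC2 class.
class FakeMotif:
    def __init__(self, motif_id):
        self.id = motif_id

    def with_nucl_prob(self):
        return self


def first_failure(items):
    """Return the AttributeError message from the first item that lacks
    with_nucl_prob(), or None if every item provides it."""
    for item in items:
        try:
            item.with_nucl_prob()
        except AttributeError as err:
            return str(err)
    return None


# Assumption: an enrichment result that behaves like a dict keyed by group.
enrichment_like = {"groupA": "table for A", "groupB": "table for B"}
# A plain list of motif-like objects.
motif_list = [FakeMotif("M1"), FakeMotif("M2")]

dict_error = first_failure(enrichment_like)  # iterating a dict yields str keys
list_error = first_failure(motif_list)       # every item has the method

print(dict_error)
print(list_error)
```

If this reading is right, passing the motif list itself (for example, the value returned by `snap.datasets.cis_bp(unique=True)` in the code above) rather than the enrichment result may avoid the error, but that has not been verified here.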

kaizhang commented 9 months ago

You need to specify gene_mat as well in snap.tl.add_cor_scores. And the obs_names between gene_mat and peak_mat must be the same. I'll write a tutorial for this later.
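A minimal pre-flight sketch of that requirement, mirroring the comparison shown in the traceback (`list(peak_mat.obs_names) != list(gene_mat.obs_names)`). The helper name `describe_obs_mismatch` is hypothetical, and `SimpleNamespace` is used here only as a stand-in for an AnnData-like object with an `obs_names` attribute.

```python
from types import SimpleNamespace


def describe_obs_mismatch(gene_mat, peak_mat):
    """Return None if both inputs would pass the obs_names comparison,
    otherwise a short description of what is wrong."""
    for label, mat in (("gene_mat", gene_mat), ("peak_mat", peak_mat)):
        # None, a DataFrame, etc. have no obs_names attribute.
        if not hasattr(mat, "obs_names"):
            return f"{label} has no obs_names (got {type(mat).__name__})"

    gene_names = list(gene_mat.obs_names)
    peak_names = list(peak_mat.obs_names)
    if gene_names == peak_names:
        return None
    if set(gene_names) == set(peak_names):
        return "same obs_names but in a different order"
    shared = set(gene_names) & set(peak_names)
    return (f"{len(shared)} shared obs_names "
            f"({len(gene_names)} in gene_mat, {len(peak_names)} in peak_mat)")


# Stand-ins for AnnData objects (assumption: only obs_names matters here).
atac = SimpleNamespace(obs_names=["c1", "c2", "c3"])
rna_same = SimpleNamespace(obs_names=["c1", "c2", "c3"])
rna_shuffled = SimpleNamespace(obs_names=["c3", "c1", "c2"])

print(describe_obs_mismatch(rna_same, atac))
print(describe_obs_mismatch(None, atac))
print(describe_obs_mismatch(rna_shuffled, atac))
```

Running a check like this before calling `snap.tl.add_cor_scores` would surface a missing `gene_mat` or a barcode mismatch with a clearer message than the `AttributeError` above.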

MubasherMohammed commented 9 months ago

Thanks for the swift reply! My issue is that my gene_mat does not have the same cell barcodes as peak_mat. Is there a way to add scores from a GEX AnnData whose obs_names differ, or can I use only peak_mat to add the scores? Thanks again.

kaizhang commented 9 months ago

The short answer is yes, if you have a sensible grouping that can be applied to both RNA and ATAC, e.g., grouping the cells by cell type. More details will be provided once I finish the tutorial.
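One way to read this suggestion, sketched below on toy data with the standard library only: average the cells within each group separately for RNA and ATAC, then keep the groups present in both, so the two aggregated matrices share the same row names. This is an illustration of the idea and not the maintainer's method; the function name `pseudobulk` and all values are made up, and converting the result into matrices accepted by `snap.tl.add_cor_scores` is not shown.

```python
from collections import defaultdict
from statistics import fmean


def pseudobulk(rows, labels):
    """Average all rows that share a label; returns {label: mean vector}."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: [fmean(column) for column in zip(*members)]
        for label, members in grouped.items()
    }


# Toy RNA data: 3 cells x 2 genes, with a cell-type label per cell.
rna_rows = [[1.0, 0.0], [3.0, 2.0], [0.0, 4.0]]
rna_labels = ["Astro", "Astro", "Neuron"]

# Toy ATAC data: 4 different cells x 3 peaks, labelled with the same cell types.
atac_rows = [[2.0, 2.0, 0.0], [0.0, 1.0, 5.0], [0.0, 3.0, 1.0], [4.0, 0.0, 0.0]]
atac_labels = ["Neuron", "Astro", "Astro", "Neuron"]

gene_bulk = pseudobulk(rna_rows, rna_labels)
peak_bulk = pseudobulk(atac_rows, atac_labels)

# Keep only groups found in both modalities, in one fixed order, so the
# aggregated matrices have identical row names.
shared_groups = sorted(set(gene_bulk) & set(peak_bulk))
gene_rows = [gene_bulk[group] for group in shared_groups]
peak_rows = [peak_bulk[group] for group in shared_groups]

print(shared_groups)
print(gene_rows)
print(peak_rows)
```

With only a handful of groups, any correlation computed across them rests on very few observations, so a finer grouping may be preferable where the data support it.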

MubasherMohammed commented 9 months ago

Okay, thanks for the advice. Looking forward to the tutorial.

Regards

TingTingShao commented 7 months ago

Will the tutorial be published soon? :D