stuart-lab / signac

R toolkit for the analysis of single-cell chromatin data
https://stuartlab.org/signac/

Integrating ATAC-seq data from multiple species with a consensus peak set #1755

Closed jenellewallace closed 3 months ago

jenellewallace commented 3 months ago

Hello, I am trying to integrate ATAC-seq data from human, chimp, and macaque. I have defined a consensus peak set across all three species using my own pipeline, so the consensus peaks are named Peak_1, Peak_2, etc. in every species, and each name corresponds to the location of that peak in the species' own coordinates. I would appreciate some help setting this up in Signac, as the integration vignettes only seem to cover integrating datasets from a single species. Is there a way to name the peaks with their consensus names? I assume this would be necessary in order to merge the objects and use the integration pipeline. When I tried this, I got an error (the Peak_1, etc. names are stored in all_peaks$name):

cells <- colnames(multi3)
macs2_counts <- FeatureMatrix(fragments = frags_list, features = all_peaks, cells = cells)
rownames(macs2_counts) <- all_peaks$name
multi3[["peaks_consensus"]] <- CreateChromatinAssay(counts = macs2_counts, fragments = frags_list[1], annotation = annotation, min.cells = -1, min.features = -1, cells = cells)
Error in .get_data_frame_col_as_numeric(df, granges_cols[["start"]]) : 
  some values in the "start" column cannot be turned into numeric values

I was able to create the ChromatinAssay when I left the rownames as they were (chromosome coordinates), but then the same peak has a different name in each species, and I'm not sure how to do the integration. Any advice would be greatly appreciated!
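
To make the naming problem concrete, here is a minimal base R sketch with toy data. It does not call Signac. The error above is consistent with the rownames being parsed as genomic coordinates, which names like Peak_1 cannot satisfy, so this sketch leaves the coordinate-named matrices alone and builds a separate consensus-named copy through a lookup table. Everything in it is an assumption for illustration: the data frames peaks_human and peaks_chimp, the helpers coord_string, make_counts, and rename_to_consensus, the coordinates, and the "chr-start-end" rowname format.

```r
# Toy stand-ins for the per-species peak tables (all values are made up).
# The same consensus name points at different coordinates in each species.
peaks_human <- data.frame(
  name  = c("Peak_1", "Peak_2", "Peak_3"),
  chr   = c("chr1", "chr1", "chr2"),
  start = c(100L, 500L, 200L),
  end   = c(300L, 900L, 600L)
)
peaks_chimp <- data.frame(
  name  = c("Peak_1", "Peak_2", "Peak_3"),
  chr   = c("chr1", "chr1", "chr2a"),
  start = c(150L, 620L, 240L),
  end   = c(350L, 1020L, 640L)
)

# Hypothetical helper: build "chr-start-end" strings (assumed rowname format).
coord_string <- function(peaks) {
  paste(peaks$chr, peaks$start, peaks$end, sep = "-")
}

# Hypothetical helper: a toy peaks x cells count matrix with coordinate rownames.
make_counts <- function(peaks, cells) {
  matrix(
    seq_len(nrow(peaks) * length(cells)),
    nrow = nrow(peaks),
    dimnames = list(coord_string(peaks), cells)
  )
}

# Hypothetical helper: return a copy of the matrix whose rows carry the
# consensus names, looked up from the species-specific coordinate strings.
rename_to_consensus <- function(counts, peaks) {
  lookup <- setNames(peaks$name, coord_string(peaks))
  new_names <- unname(lookup[rownames(counts)])
  stopifnot(!anyNA(new_names))  # every row must map to a consensus peak
  rownames(counts) <- new_names
  counts
}

human_counts <- rename_to_consensus(make_counts(peaks_human, c("h1", "h2")), peaks_human)
chimp_counts <- rename_to_consensus(make_counts(peaks_chimp, c("c1", "c2")), peaks_chimp)

# Keep only peaks present in both species, in a shared order, then combine cells.
shared <- intersect(rownames(human_counts), rownames(chimp_counts))
combined <- cbind(
  human_counts[shared, , drop = FALSE],
  chimp_counts[shared, , drop = FALSE]
)
```

The combined matrix has one row per consensus peak and one column per cell across species. Whether such a matrix is best stored as a standard (non-chromatin) assay for integration, with the coordinate-named ChromatinAssay kept alongside it in each species, is an open design choice and not something this sketch settles.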