r3fang / SnapATAC

Analysis Pipeline for Single Cell ATAC-seq
GNU General Public License v3.0
301 stars 125 forks source link

differential peaks by cluster #74

Closed ellenhong1 closed 5 years ago

ellenhong1 commented 5 years ago

Are there any snapATAC tools that I can use to pull out differential peaks that identify each cluster after running findDAR? I've successfully used runHomer and identified motifs that are enriched inside the DARs, but I would like to identify the peaks that are at the essence of each cluster and use runHOMER on those per cluster. Thank you!

r3fang commented 5 years ago

Hi yes,

x.sp@peak[idy,] in which idy is the differential peak index. Let me know if this does not work

-- Rongxin Fang, Ren Lab Ludwig Cancer Research Bioinformatics Ph.D. Student University of California, San Diego

On Aug 27, 2019, at 6:10 AM, ellenhong1 notifications@github.com wrote:

Are there any snapATAC tools that I can use to pull out differential peaks that identify each cluster after running findDAR? I've successfully used runHomer and identified motifs that are enriched inside the DARs, but I would like to identify the peaks that are at the essence of each cluster and use runHOMER on those per cluster. Thank you!

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/r3fang/SnapATAC/issues/74?email_source=notifications&email_token=ABT6GG5LDT6NR3ELPGVVF33QGURTLA5CNFSM4IQEJGWKYY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4HHUXP2Q, or mute the thread https://github.com/notifications/unsubscribe-auth/ABT6GG2JXASXAU3RUV64AY3QGURTLANCNFSM4IQEJGWA.

ellenhong1 commented 5 years ago

Thanks for the prompt response- when I type that in, I get this error:

"Error: subscript contains out-of-bounds indices"

Would you have any suggestions on why that might be and how to fix it?

r3fang commented 5 years ago

If you can run Homer one differential peaks, this should not be wrong. Are you sure idy is the index of differential peaks ?

-- Rongxin Fang, Ren Lab Ludwig Cancer Research Bioinformatics Ph.D. Student University of California, San Diego

On Aug 27, 2019, at 12:40 PM, ellenhong1 notifications@github.com wrote:

Thanks for the prompt response- when I type that in, I get this error:

"Error: subscript contains out-of-bounds indices"

Would you have any suggestions on why that might be and how to fix it?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/r3fang/SnapATAC/issues/74?email_source=notifications&email_token=ABT6GGYJNB3AMPARBX3TOBTQGV7LLA5CNFSM4IQEJGWKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5I4J5Q#issuecomment-525452534, or mute the thread https://github.com/notifications/unsubscribe-auth/ABT6GG4PJ45BJ5SIXTO5EJLQGV7LLANCNFSM4IQEJGWA.

ellenhong1 commented 5 years ago

Yes, I can run Homer on the differential peaks- I basically followed the tutorial online (https://github.com/r3fang/SnapATAC/blob/master/examples/10X_P50/README.md) to the letter. So I did just try "x.sp@peak[idy_C2,]" and that worked and gave me this output:

GRanges object with 9059 ranges and 0 metadata columns: seqnames ranges strand

[1] chr1 669123-669320 * [2] chr1 1068915-1069034 * [3] chr1 1365658-1366055 * [4] chr1 1660031-1660561 * [5] chr1 2005047-2005657 * ... ... ... ... [9055] s37d5 32871719-32871926 * [9056] s37d5 34736201-34736293 * [9057] GL000193.1 12043-12198 * [9058] GL000193.1 56965-57166 * [9059] GL000193.1 66222-66549 * ------- seqinfo: 57 sequences from an unspecified genome; no seqlengths" I also did a quick > str(x_sp_at_peak_idy_C2) Formal class 'GRanges' [package "GenomicRanges"] with 7 slots ..@ seqnames :Formal class 'Rle' [package "S4Vectors"] with 4 slots .. .. ..@ values : Factor w/ 57 levels "chr1","chr10",..: 1 2 3 4 5 6 7 8 9 10 ... .. .. ..@ lengths : int [1:33] 852 376 361 440 398 179 125 225 285 43 ... .. .. ..@ elementMetadata: NULL .. .. ..@ metadata : list() ..@ ranges :Formal class 'IRanges' [package "IRanges"] with 6 slots .. .. ..@ start : int [1:9059] 669123 1068915 1365658 1660031 2005047 2541771 3269137 5682558 5968250 5975907 ... .. .. ..@ width : int [1:9059] 198 120 398 531 611 256 192 241 808 469 ... .. .. ..@ NAMES : NULL .. .. ..@ elementType : chr "ANY" .. .. ..@ elementMetadata: NULL .. .. ..@ metadata : list() ..@ strand :Formal class 'Rle' [package "S4Vectors"] with 4 slots .. .. ..@ values : Factor w/ 3 levels "+","-","*": 3 .. .. ..@ lengths : int 9059 .. .. ..@ elementMetadata: NULL .. .. ..@ metadata : list() ..@ seqinfo :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots .. .. ..@ seqnames : chr [1:57] "chr1" "chr10" "chr11" "chr12" ... .. .. ..@ seqlengths : int [1:57] NA NA NA NA NA NA NA NA NA NA ... .. .. ..@ is_circular: logi [1:57] NA NA NA NA NA NA ... .. .. ..@ genome : chr [1:57] NA NA NA NA ... ..@ elementMetadata:Formal class 'DataFrame' [package "S4Vectors"] with 6 slots .. .. ..@ rownames : NULL .. .. ..@ nrows : int 9059 .. .. ..@ listData : Named list() .. .. ..@ elementType : chr "ANY" .. .. ..@ elementMetadata: NULL .. .. ..@ metadata : list() ..@ elementType : chr "ANY" ..@ metadata : list() Is this what you were referring to? And if so, how would I go about pulling out the differential peaks by cluster identified previously in the pipline? Thank you!
r3fang commented 5 years ago

Hi,

if you can extract the peaks, can you convert it to a data.frame and write it down to the disk? as.data.frame(x.sp@peak[idy_C2,]), i don’t fully understand your question, if you can be more specific, I can provide more advice

-- Rongxin Fang Ph.D. Student, Ren Lab Ludwig Institute for Cancer Research University of California, San Diego

On Aug 28, 2019, at 6:39 AM, ellenhong1 notifications@github.com wrote:

Yes, I can run Homer on the differential peaks- I basically followed the tutorial online (https://github.com/r3fang/SnapATAC/blob/master/examples/10X_P50/README.md https://github.com/r3fang/SnapATAC/blob/master/examples/10X_P50/README.md) to the letter. So I did just try "x.sp@peak[idy_C2,]" and that worked and gave me this output:

GRanges object with 9059 ranges and 0 metadata columns: seqnames ranges strand

[1] chr1 669123-669320 [2] chr1 1068915-1069034 [3] chr1 1365658-1366055 [4] chr1 1660031-1660561 [5] chr1 2005047-2005657 ... ... ... ... [9055] s37d5 32871719-32871926 [9056] s37d5 34736201-34736293 [9057] GL000193.1 12043-12198 [9058] GL000193.1 56965-57166 [9059] GL000193.1 66222-66549

seqinfo: 57 sequences from an unspecified genome; no seqlengths"

I also did a quick

str(x_sp_at_peak_idy_C2) Formal class 'GRanges' [package "GenomicRanges"] with 7 slots ..@ seqnames :Formal class 'Rle' [package "S4Vectors"] with 4 slots .. .. ..@ values : Factor w/ 57 levels "chr1","chr10",..: 1 2 3 4 5 6 7 8 9 10 ... .. .. ..@ lengths : int [1:33] 852 376 361 440 398 179 125 225 285 43 ... .. .. ..@ elementMetadata: NULL .. .. ..@ metadata : list() ..@ ranges :Formal class 'IRanges' [package "IRanges"] with 6 slots .. .. ..@ start : int [1:9059] 669123 1068915 1365658 1660031 2005047 2541771 3269137 5682558 5968250 5975907 ... .. .. ..@ width : int [1:9059] 198 120 398 531 611 256 192 241 808 469 ... .. .. ..@ NAMES : NULL .. .. ..@ elementType : chr "ANY" .. .. ..@ elementMetadata: NULL .. .. ..@ metadata : list() ..@ strand :Formal class 'Rle' [package "S4Vectors"] with 4 slots .. .. ..@ values : Factor w/ 3 levels "+","-","*": 3 .. .. ..@ lengths : int 9059 .. .. ..@ elementMetadata: NULL .. .. ..@ metadata : list() ..@ seqinfo :Formal class 'Seqinfo' [package "GenomeInfoDb"] with 4 slots .. .. ..@ seqnames : chr [1:57] "chr1" "chr10" "chr11" "chr12" ... .. .. ..@ seqlengths : int [1:57] NA NA NA NA NA NA NA NA NA NA ... .. .. ..@ is_circular: logi [1:57] NA NA NA NA NA NA ... .. .. ..@ genome : chr [1:57] NA NA NA NA ... ..@ elementMetadata:Formal class 'DataFrame' [package "S4Vectors"] with 6 slots .. .. ..@ rownames : NULL .. .. ..@ nrows : int 9059 .. .. ..@ listData : Named list() .. .. ..@ elementType : chr "ANY" .. .. ..@ elementMetadata: NULL .. .. ..@ metadata : list() ..@ elementType : chr "ANY" ..@ metadata : list()

Is this what you were referring to? And if so, how would I go about pulling out the differential peaks by cluster identified previously in the pipline?

Thank you!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/r3fang/SnapATAC/issues/74?email_source=notifications&email_token=ABT6GG2IVQMCBWHCPQFT6K3QGZ5ZLA5CNFSM4IQEJGWKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5LE3FA#issuecomment-525749652, or mute the thread https://github.com/notifications/unsubscribe-auth/ABT6GG7MUINPOGG62VCFYQ3QGZ5ZLANCNFSM4IQEJGWA.

ellenhong1 commented 5 years ago

Hi,

Yes- I just realized that anything in x.sp@peak[idy_C2,] is a differential peak that defines that cluster and I have successfully converted it to a data.frame. Thank you so much for your help!

r3fang commented 5 years ago

no problem!

-- Rongxin Fang Ph.D. Student, Ren Lab Ludwig Institute for Cancer Research University of California, San Diego

On Aug 28, 2019, at 3:05 PM, ellenhong1 notifications@github.com wrote:

Hi,

Yes- I just realized that anything in x.sp@peak[idy_C2,] is a differential peak that defines that cluster and I have successfully converted it to a data.frame. Thank you so much for your help!

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/r3fang/SnapATAC/issues/74?email_source=notifications&email_token=ABT6GG3KPXHEQEENAPU3QZ3QG3ZCRA5CNFSM4IQEJGWKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD5MTRXY#issuecomment-525940959, or mute the thread https://github.com/notifications/unsubscribe-auth/ABT6GG6MBMUTGGSL74A3NS3QG3ZCRANCNFSM4IQEJGWA.