This is the list of peaks that were excluded:
```r
> excluded_peaks = setdiff(all_peaks_cleaned$range_id, rownames(macs2_counts))
> excluded_peaks
 [1] "chr1_2244X2270_random-16876-17376" "chr1_1791X1813_random-31640-32140" "chr1_2295X2322_random-31028-31528" "chr1_2244X2270_random-2265-2765"
 [5] "chr1_2244X2270_random-7894-8394" "chr1_2244X2270_random-13297-13797" "chr1_2244X2270_random-20176-20676" "chr1_1791X1813_random-24407-24907"
 [9] "chr1_1791X1813_random-20165-20665" "chr1_2244X2270_random-4330-4830" "chr1_2244X2270_random-8561-9061" "chr1_2244X2270_random-19298-19798"
[13] "chr5_2859X2896_random-139511-140011" "chr5_2859X2896_random-133900-134400" "chr5_2859X2896_random-88847-89347" "chr5_2859X2896_random-72947-73447"
[17] "chr5_2859X2896_random-64104-64604" "chr5_2859X2896_random-47989-48489" "chr5_2859X2896_random-47161-47661" "chr5_2859X2896_random-12460-12960"
[21] "chr5_1808X1830_random-2653-3153" "chr5_1808X1830_random-2064-2564" "chr5_2859X2896_random-113066-113566" "chr8_2057X2081_random-13865-14365"
[25] "chr8_2057X2081_random-10998-11498" "chr8_2057X2081_random-7945-8445" "chr8_2057X2081_random-11764-12264" "chr8_2057X2081_random-12431-12931"
[29] "chr7_1604X1624_random-8651-9151" "chr7_1604X1624_random-12606-13106" "chr7_1604X1624_random-13988-14488" "chr7_1604X1624_random-11363-11863"
[33] "chr7_1604X1624_random-25349-25849" "chr7_1604X1624_random-13176-13676" "chr7_1604X1624_random-11939-12439" "chr7_1604X1624_random-16427-16927"
[37] "chr7_1604X1624_random-7978-8478" "chr7_1604X1624_random-6789-7289" "chr7_1604X1624_random-10118-10618" "chr7_1604X1624_random-17397-17897"
[41] "chr7_1604X1624_random-28097-28597" "chr7_1604X1624_random-7361-7861" "chr1_2244X2270_random-3711-4211" "chr1_2244X2270_random-7590-8090"
[45] "chr1_2244X2270_random-14545-15045" "chr1_2244X2270_random-17086-17586" "chr1_2244X2270_random-22125-22625" "chr1_2244X2270_random-24775-25275"
[49] "chr1_1791X1813_random-24507-25007" "chr1_2244X2270_random-19250-19750" "chr1_2244X2270_random-8614-9114" "chr1_2295X2322_random-30844-31344"
[53] "chr1_1791X1813_random-20241-20741" "chr1_2244X2270_random-13173-13673" "chr1_2244X2270_random-1978-2478" "chr1_1791X1813_random-33068-33568"
[57] "chr1_2244X2270_random-20182-20682" "chr1_2244X2270_random-4440-4940" "chr1_2244X2270_random-12644-13144" "chr1_2244X2270_random-2532-3032"
[61] "chr5_2859X2896_random-131444-131944" "chr5_2859X2896_random-83068-83568" "chr5_2859X2896_random-29745-30245" "chr5_2859X2896_random-27762-28262"
[65] "chr5_2859X2896_random-113086-113586" "chr4_1536X1556_random-10954-11454" "chr4_1536X1556_random-9316-9816" "chr4_1536X1556_random-13230-13730"
[69] "chr8_2057X2081_random-10705-11205" "chr8_2057X2081_random-8037-8537" "chr8_2057X2081_random-11464-11964" "chr8_2057X2081_random-12372-12872"
[73] "chr17_2196X2221_random-27780-28280" "chr7_1604X1624_random-15644-16144" "chr7_1604X1624_random-25508-26008" "chr7_1604X1624_random-17347-17847"
[77] "chr7_1604X1624_random-16369-16869" "chr7_1604X1624_random-8002-8502" "chr7_1604X1624_random-14194-14694" "chr7_1604X1624_random-6933-7433"
[81] "chr7_1604X1624_random-10913-11413" "chr7_1604X1624_random-9045-9545" "chr7_1604X1624_random-13230-13730" "chr7_1604X1624_random-11744-12244"
[85] "chr7_1604X1624_random-10038-10538" "chr7_1604X1624_random-12506-13006"
```
So it appears that peaks whose chromosome names contain an underscore could not be parsed correctly and were excluded. If I want to keep these peaks (in this example, they are on standard chromosomes in other species in my dataset), how should I adjust the chromosome naming?
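One quick way to test the underscore hypothesis is to parse the excluded IDs yourself and tabulate which contigs they come from. The sketch below assumes every ID has the `chrom-start-end` form shown in the list above, with `-` as the only separator; the helper `parse_range_ids` is hypothetical and the input is a short excerpt of the list, not the full vector.

```r
# Hypothetical helper: parse "chrom-start-end" range IDs.
# The chromosome name is everything before the last two hyphens, so
# underscores inside the name do not interfere with the split.
parse_range_ids <- function(ids) {
  pattern <- "^(.+)-([0-9]+)-([0-9]+)$"
  stopifnot(all(grepl(pattern, ids)))
  data.frame(
    seqname = sub(pattern, "\\1", ids),
    start   = as.integer(sub(pattern, "\\2", ids)),
    end     = as.integer(sub(pattern, "\\3", ids)),
    stringsAsFactors = FALSE
  )
}

# A few of the excluded IDs from the list above (excerpt only).
excluded_peaks <- c(
  "chr1_2244X2270_random-16876-17376",
  "chr1_1791X1813_random-31640-32140",
  "chr5_2859X2896_random-139511-140011",
  "chr7_1604X1624_random-8651-9151",
  "chr7_1604X1624_random-12606-13106"
)

parsed <- parse_range_ids(excluded_peaks)
# Count excluded peaks per contig.
print(table(parsed$seqname))
```

If the IDs split cleanly like this, the underscore is probably not the issue. In the full list above, every excluded ID sits on a `_random` contig, which suggests looking at which chromosomes are present in the data rather than at how the names are parsed.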
Any peaks on chromosomes that are not present in the fragment file will not be included in the resulting matrix. We could include these in the matrix and fill them with zeros, but I am hesitant to do that in case it leads to misleading results when, for some technical reason, a chromosome is not present in a fragment file. In my opinion the current behaviour is better, as it reflects reality: these peaks are simply not present in the data, so we don't provide any quantification for them. I agree that we can be clearer about this in the documentation for FeatureMatrix, and I will update it accordingly.
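The rule described above can be mirrored with a simple set-membership check, which lets you predict which peaks will be missing from the matrix before building it. This is a sketch that does not call Signac. It assumes peak IDs in the `chrom-start-end` form, and that `fragment_chroms` (a hypothetical input) holds the chromosome names seen in the fragment file, for example collected from the file's first column.

```r
# Hypothetical helper: split peak IDs by whether their chromosome appears
# in the fragment file. Peaks in `dropped` would have no quantification.
split_by_fragment_chroms <- function(peak_ids, fragment_chroms) {
  seqnames <- sub("^(.+)-[0-9]+-[0-9]+$", "\\1", peak_ids)
  present <- seqnames %in% fragment_chroms
  list(kept = peak_ids[present], dropped = peak_ids[!present])
}

# Hypothetical inputs for illustration.
peak_ids <- c("chr1-1000-1500", "chr1_2244X2270_random-16876-17376",
              "chr5-2000-2500", "chr7_1604X1624_random-8651-9151")
fragment_chroms <- c("chr1", "chr5", "chr7")

res <- split_by_fragment_chroms(peak_ids, fragment_chroms)
print(res$dropped)
```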
Makes sense, thank you!!
For my dataset, I've found that the dimensions of my feature matrix do not match the length of the GRanges object I used to make it. I cannot figure out why from the documentation: is any filtering done at this step behind the scenes? I noticed that some ranges in my GRanges object were exact duplicates (I created the peak set with custom code because I am interested in consensus peaks across multiple species, and didn't realize I had that issue). I thought this might cause the problem, but after removing the duplicates the numbers still don't quite match. Thanks for any insight!
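To reconcile the numbers, it can help to count how many peaks each known filter removes. The sketch below covers only the two causes discussed in this thread (exact duplicate ranges, and ranges on chromosomes absent from the fragment file). It uses a plain data.frame instead of a GRanges object so it runs with base R; `reconcile_peak_counts`, `peaks`, and `fragment_chroms` are hypothetical.

```r
# Hypothetical helper: break down how many peaks survive each filter.
reconcile_peak_counts <- function(peaks, fragment_chroms) {
  n_total <- nrow(peaks)
  # Exact duplicates share seqname, start, and end.
  uniq <- peaks[!duplicated(peaks[, c("seqname", "start", "end")]), ]
  # Peaks on chromosomes absent from the fragment file are not quantified.
  on_present <- uniq[uniq$seqname %in% fragment_chroms, ]
  c(total = n_total,
    duplicates = n_total - nrow(uniq),
    absent_chromosome = nrow(uniq) - nrow(on_present),
    expected_rows = nrow(on_present))
}

# Hypothetical peak set: one exact duplicate and one peak on a missing contig.
peaks <- data.frame(
  seqname = c("chr1", "chr1", "chr2", "chr1_2244X2270_random"),
  start   = c(100L, 100L, 500L, 16876L),
  end     = c(600L, 600L, 1000L, 17376L),
  stringsAsFactors = FALSE
)
counts <- reconcile_peak_counts(peaks, fragment_chroms = c("chr1", "chr2"))
print(counts)
```

If `expected_rows` still differs from the number of rows in the matrix, some other step is removing peaks, which this sketch does not model.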