adamnovak closed this issue 2 months ago.
What I want to do here is:

- Get `vg filter` to let me intersect a BED with the refpos annotations on CHM13-simulated reads.

Finding HG002 centromere reads might be harder, because I'd want to look at reads simulated from the HG002 centromeres, not reads with a CHM13 refpos in a centromere. So I'd need a BED of HG002 centromeres, and reads that carry refpos annotations on HG002 as well as on CHM13.
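The intersection itself is simple interval lookup. Below is a minimal Python sketch of the logic, not how `vg filter` implements it. It assumes each read's refpos annotations have already been exported as `(contig, offset)` pairs with 0-based offsets (matching BED's 0-based, half-open coordinates); the function names and the toy contig names are hypothetical. A read counts as "in the BED" if any of its refpos entries lands in an interval, so a read annotated on both HG002 and CHM13 contigs can be matched against a BED for either assembly.

```python
from bisect import bisect_right
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {contig: (starts, ends)} with sorted, merged intervals."""
    raw = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and header lines
        fields = line.rstrip("\n").split("\t")
        raw[fields[0]].append((int(fields[1]), int(fields[2])))
    index = {}
    for contig, intervals in raw.items():
        intervals.sort()
        starts, ends = [], []
        for start, end in intervals:
            if starts and start <= ends[-1]:
                ends[-1] = max(ends[-1], end)  # overlapping/touching: merge
            else:
                starts.append(start)
                ends.append(end)
        index[contig] = (starts, ends)
    return index


def in_bed(index, contig, offset):
    """True if a 0-based offset on contig falls in some half-open [start, end)."""
    if contig not in index:
        return False
    starts, ends = index[contig]
    i = bisect_right(starts, offset) - 1  # last interval starting at or before offset
    return i >= 0 and offset < ends[i]


def reads_in_bed(reads, index):
    """Yield names of reads with at least one refpos inside the BED.

    reads: iterable of (name, [(contig, offset), ...]) -- an assumed export format.
    """
    for name, refpos in reads:
        if any(in_bed(index, contig, offset) for contig, offset in refpos):
            yield name


if __name__ == "__main__":
    # Toy data only; contig names are placeholders.
    bed = load_bed(["chr1\t100\t200\tcen\n", "chr1\t150\t300\n", "chr2\t0\t50\n"])
    reads = [("r1", [("chr1", 120)]), ("r2", [("chr1", 300)]),
             ("r3", [("chr3", 10), ("chr2", 49)])]
    print(list(reads_in_bed(reads, bed)))
```

Merging overlapping intervals up front keeps each lookup to a single binary search per refpos entry, which matters when scanning millions of simulated reads.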
The HG002 v1.0.1 assemblies in the hub at https://genome.ucsc.edu/cgi-bin/hgGateway?hgHub_do_redirect=on&hgHubConnect.remakeTrackHub=on&hgHub_do_firstDb=1&hubUrl=https://research.nhgri.nih.gov/CustomTracks/T2T_hubs/HG002_Q100/hub.txt also have cenSat tracks backed by bigBed files, such as https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/HG002/assemblies/annotation/centromere/hg002v1.0.1_v2.0/hg002v1.0.1.cenSatv2.0.noheader.bb
I think that's the assembly we actually simulated from?
I did a first pass using 10k HiFi reads (see https://ucsc-gi.slack.com/archives/CJ2EHEH1A/p1719953970881309?thread_ts=1719905671.084099&cid=CJ2EHEH1A) and concluded that, while something is going on with chrY in CHM13, there isn't an obvious pile of wrongly mapped or unmapped centromeric reads.
But we should actually pull out the reads simulated from CHM13 centromeres, for both R10 and HiFi, map them, and see if centromere reads are worse than other reads, and how good they are overall.
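Once reads are split into centromeric and other, the comparison is a per-group tally. Here is a minimal Python sketch, assuming each mapped read has already been labeled correct or wrong against its simulated truth position by some separate comparison step (the threshold for "correct" is left open). `ReadResult` and `summarize` are hypothetical names, not part of vg.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadResult:
    name: str
    in_centromere: bool  # e.g. from intersecting the read's refpos with a centromere BED
    mapped: bool
    correct: bool        # assumed truth label; only meaningful when mapped is True


def summarize(results):
    """Return unmapped/wrong/correct rates for centromeric vs. other reads."""
    counts = {True: Counter(), False: Counter()}
    for r in results:
        c = counts[r.in_centromere]
        c["total"] += 1
        if not r.mapped:
            c["unmapped"] += 1
        elif r.correct:
            c["correct"] += 1
        else:
            c["wrong"] += 1

    summary = {}
    for in_cen, c in counts.items():
        total = c["total"]
        label = "centromere" if in_cen else "other"
        summary[label] = {
            "total": total,
            "unmapped_rate": c["unmapped"] / total if total else 0.0,
            "wrong_rate": c["wrong"] / total if total else 0.0,
            "correct_rate": c["correct"] / total if total else 0.0,
        }
    return summary
```

Running this separately for each read technology (R10, HiFi) and each source assembly (CHM13, HG002) gives the tables needed both for "are centromere reads worse than other reads" and for comparing CHM13 centromere reads against HG002 centromere reads.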
We should also see if CHM13 centromere reads are notably better than HG002 centromere reads, since that would suggest that adding more centromeres to the graph is actually going to help us.