jeromekelleher / sc2ts

Infer a succinct tree sequence from SARS-COV-2 variation data
MIT License

Add internal sample stats to TreeInfo: #151

Open hyanwong opened 1 year ago

hyanwong commented 1 year ago
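import numpy as np
# ts is a tskit.TreeSequence; flag each sample that is the parent of at least one edge
s = np.isin(ts.samples(), ts.edges_parent)
print(f"{len(s)} samples, of which {s.sum()} ({s.sum() / len(s) * 100:.1f}%) are parents")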

Long ARG: 657239 samples, of which 86577 (13.2%) are parents
Wide ARG: 1265685 samples, of which 140336 (11.1%) are parents

Or as a fraction of the total number of internal nodes:

# Flag each distinct parent (internal) node that is also a sample
s = np.isin(np.unique(ts.edges_parent), ts.samples())
print(f"{len(s)} parent nodes, of which {s.sum()} ({s.sum() / len(s) * 100:.1f}%) are samples")

Long ARG: 212569 parent nodes, of which 86577 (40.7%) are samples
Wide ARG: 327998 parent nodes, of which 140336 (42.8%) are samples
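
For reference, the two ratios above could be computed together in one helper. This is only a rough sketch: the function name `internal_sample_stats` and the returned keys are made up for illustration and are not part of sc2ts or its TreeInfo class. The toy arrays stand in for `ts.samples()` and `ts.edges_parent`.

```python
import numpy as np


def internal_sample_stats(samples, edges_parent):
    """Count samples that are the parent of at least one edge.

    Hypothetical helper: `samples` stands in for ts.samples() and
    `edges_parent` for ts.edges_parent.
    """
    samples = np.asarray(samples)
    parents = np.unique(edges_parent)  # distinct nodes with at least one child
    n_internal = int(np.isin(samples, parents).sum())
    return {
        "num_samples": len(samples),
        "num_parents": len(parents),
        "num_internal_samples": n_internal,
        "frac_samples_internal": n_internal / len(samples),
        "frac_parents_sample": n_internal / len(parents),
    }


# Toy data: samples 0..3; nodes 4 and 5 are non-sample parents, sample 0 is also a parent
stats = internal_sample_stats(
    samples=np.array([0, 1, 2, 3]),
    edges_parent=np.array([4, 4, 0, 0, 5, 5]),
)
print(stats)
```

Both fractions share the same numerator (samples that are parents), which is why 86577 and 140336 appear in both sets of output above.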

Note my Slack comment:

Got to be a bit careful with phrasing because a sample could be internal in one tree and a tip in another (probably rare though)
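
One possible way to check how rare that case is: find samples that are a parent over only part of the genome. The sketch below works on plain edge arrays standing in for `ts.edges_parent`, `ts.edges_left` and `ts.edges_right`. The function name is hypothetical. It also assumes every sample is present in every tree, so a sample whose child edges do not cover the whole sequence must be a tip somewhere. The per-sample loop is fine for a toy example but would need vectorising for the full ARGs.

```python
import numpy as np


def partially_internal_samples(samples, edges_parent, edges_left, edges_right, sequence_length):
    """Return samples that are a parent over some, but not all, of the sequence.

    Hypothetical helper. Assumes each sample is present in every tree, so any
    region with no edge below it is a region where the sample is a tip.
    """
    mixed = []
    for u in samples:
        mask = edges_parent == u
        if not mask.any():
            continue  # never a parent: a tip in every tree
        # Merge the intervals of the edges below u and sum the covered length
        order = np.argsort(edges_left[mask])
        lefts = edges_left[mask][order]
        rights = edges_right[mask][order]
        covered = 0.0
        end = 0.0
        for left, right in zip(lefts, rights):
            if right > end:
                covered += right - max(left, end)
                end = right
        if covered < sequence_length:
            mixed.append(int(u))
    return mixed


# Toy data on a sequence of length 10; nodes 0, 1, 2 are samples, node 3 is not
edges_parent = np.array([3, 0, 1, 0])
edges_left = np.array([0.0, 0.0, 0.0, 5.0])
edges_right = np.array([10.0, 10.0, 5.0, 10.0])

mixed = partially_internal_samples(
    np.array([0, 1, 2]), edges_parent, edges_left, edges_right, sequence_length=10.0
)
print(mixed)
```

In the toy data, sample 0 has children across the whole sequence, sample 2 is never a parent, and sample 1 is a parent only on [0, 5), so it is the one sample that is internal in some trees and a tip in others.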