matsengrp / cft

Clonal family tree

minadcl tree problems #191

Closed lauradoepker closed 7 years ago

lauradoepker commented 7 years ago

@metasoarous and I have observed problems with the minadcl trees. Example: http://stoat:5052/cluster/cabd619a-5baa-3984-a533-4c7eb1b0505d

1) There are leaves that do not have names (just branches) and these seem to be leaves that are named things like "dg-280462-" with nothing after the last dash.
2) These weird leaves are likely messing up the width scale (0.44 is huge!)
3) They are also messing up the seed lineage alignment (biasing it to very weird intermediates)

These leaves might be culled with better 'health' filters. Could we investigate the current health filters, please?
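As a rough illustration of the kind of check meant here, the sketch below flags leaf names that end in a bare dash, like "dg-280462-". It is not the existing cft health filter: the function name `split_truncated_leaves` and the sample names other than "dg-280462-" are hypothetical, and it only assumes leaf names are available as plain strings.

```python
def split_truncated_leaves(leaf_names):
    """Separate leaf names that end in a bare dash from all other names.

    Hypothetical helper for illustration only; not part of cft.
    """
    kept, truncated = [], []
    for name in leaf_names:
        # rpartition splits on the LAST dash: (head, separator, tail).
        # A dash was found but nothing follows it -> treat as truncated.
        _, sep, tail = name.rpartition("-")
        if sep and not tail.strip():
            truncated.append(name)
        else:
            kept.append(name)
    return kept, truncated


# Sample names: only "dg-280462-" comes from this issue; the rest are made up.
names = ["dg-280462-", "dg-280462-17", "seed"]
kept, truncated = split_truncated_leaves(names)
```

A check like this would only catch the naming symptom; whether the underlying sequences are actually unhealthy still has to be decided by the health filters themselves.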

This is important because we will be using the minadcl trees in the future to choose antibody chains that are highly SHM'd for our high-risk, high-reward projects (unlike our lineage projects). This includes non-seed-partitioned trees (yay!) and seed-partitioned trees (like the QB850 project).

metasoarous commented 7 years ago

Note that 1) is really a separate issue, and is now reflected in #192.

As @lauranoges suggests, these issues seem to be a result of bad sequences making it through the health filters. I'll take a look to see what's going on there.

metasoarous commented 7 years ago

@lauranoges has suggested that the health metric issues appear to be more or less resolved now.

metasoarous commented 7 years ago

See a5d11d01 and issue #193.