matsengrp / cft

Clonal family tree
5 stars 3 forks source link

Stricter requirements for timepoint data #268

Closed eharkins closed 5 years ago

eharkins commented 5 years ago

Just creating this to have a record of my discussion with @metasoarous and @psathyrella about how we want to be sure that all sequences have timepoint information, unless they are explicitly not expected to have it, because they meet one of these cases:

If a sequence doesn't have timepoint info for one of these reasons, we should raise an error because to quote @psathyrella :

while in general lots of samples won't have timepoint info, if there's a sample that does have timepoint info but it's somehow missing for some sequences, that's extremely bad

This is especially bad if there is no error / warning and we lose this information somewhere in processing without knowing it.

eharkins commented 5 years ago

https://github.com/matsengrp/cft/pull/269

metasoarous commented 5 years ago

If I had gone through my GH notifications from bottom to top I would have seen this before commenting on #269.

To summarize, I don't want to assume that there might not be other reasons for there to be missing timepoint info for just a set of sequences (such as inclusion of synthetic/fake sequences). And it should certainly be possible to put in non-simulated data which does not have timepoint information.

eharkins commented 5 years ago

If I had gone through my GH notifications from bottom to top I would have seen this before commenting on #269.

To be fair I created this issue just to track the PR, so they were essentially simultaneous

This makes sense, I've fixed all the things you pointed out in the PR and will push after testing

eharkins commented 5 years ago

See #269