Right now, as far as I can tell, the only thing to do is to tag them with NCBI's "unidentified" taxid, number 32644, which of course does not sit inside "cellular organisms".
This completely hoses the clade-level taxonomic identifiers for every clade that contains an unidentified sequence, because the MRCA of any such sequence and any labeled sequence is the root.
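Here is a minimal sketch of the problem in Python. It assumes a toy taxonomy stored as a child-to-parent map. The node names and the helpers `lineage`, `mrca`, and `clade_mrca` are hypothetical and not taken from any existing tool. The one NCBI detail it relies on is that 32644 sits directly under the root, outside "cellular organisms". Because of that placement, any clade that includes such a sequence collapses to the root:

```python
from functools import reduce

# Toy taxonomy (child -> parent). Illustrative only, not an NCBI excerpt,
# except that "unidentified" (32644) hangs directly off the root.
PARENT = {
    "cellular organisms": "root",
    "Bacteria": "cellular organisms",
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
    "32644": "root",  # NCBI "unidentified"
}


def lineage(taxon):
    """Return the path from the root down to `taxon`."""
    path = [taxon]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path[::-1]


def mrca(a, b):
    """Deepest taxon shared by the lineages of `a` and `b`."""
    shared = None
    for x, y in zip(lineage(a), lineage(b)):
        if x != y:
            break
        shared = x
    return shared


def clade_mrca(taxa):
    """MRCA of every taxon labeling the leaves of a clade."""
    return reduce(mrca, taxa)


print(clade_mrca(["Proteobacteria", "Firmicutes"]))           # Bacteria
print(clade_mrca(["Proteobacteria", "Firmicutes", "32644"]))  # root
```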
We need to do better.
The best option would be to let us give some sequences an "unidentified" taxid, so that they get the NoTax label instead. The MRCA of this label and any x would then be x. I think we might need a special lowest rank for it, or better, no rank at all.
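One way to get this behavior is to make NoTax the identity element of a fold over MRCA. The sketch below assumes a toy child-to-parent taxonomy. `NO_TAX`, `label`, `mrca`, and `clade_mrca` are hypothetical names used only for illustration, and the relabeling of 32644 stands in for the proposed "unidentified" handling:

```python
from functools import reduce

UNIDENTIFIED = "32644"  # NCBI "unidentified" taxid
NO_TAX = "NoTax"        # hypothetical sentinel for the proposed NoTax label

# Toy taxonomy (child -> parent); illustrative only, not an NCBI excerpt.
PARENT = {
    "cellular organisms": "root",
    "Bacteria": "cellular organisms",
    "Proteobacteria": "Bacteria",
    "Firmicutes": "Bacteria",
}


def label(taxid):
    """Map the "unidentified" taxid to NoTax; leave other taxids alone."""
    return NO_TAX if taxid == UNIDENTIFIED else taxid


def lineage(taxon):
    """Return the path from the root down to `taxon`."""
    path = [taxon]
    while path[-1] != "root":
        path.append(PARENT[path[-1]])
    return path[::-1]


def mrca(a, b):
    # NoTax is an identity element: MRCA(NoTax, x) == x for every x,
    # including MRCA(NoTax, NoTax) == NoTax.
    if a == NO_TAX:
        return b
    if b == NO_TAX:
        return a
    shared = None
    for x, y in zip(lineage(a), lineage(b)):
        if x != y:
            break
        shared = x
    return shared


def clade_mrca(taxids):
    # Seeding the fold with NO_TAX means a clade containing only
    # unidentified sequences stays NoTax instead of becoming the root.
    return reduce(mrca, map(label, taxids), NO_TAX)


print(clade_mrca(["Proteobacteria", "Firmicutes", "32644"]))  # Bacteria
print(clade_mrca(["32644", "32644"]))                          # NoTax
```

In this sketch `NO_TAX` is intercepted before any lineage is computed, so it never needs a place in the tree. That fits the "no rank at all" option. A special lowest rank would instead require attaching it somewhere in the taxonomy.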
The first step is to work out what each of these choices would mean for MRCA labeling and classification.