Open fedarko opened 3 years ago
Another example from the sheep gut dataset where this problem becomes annoying: a component is almost entirely a chain of bubbles, but there's one weird bubble in the middle that gets missed. (At the bottom of this comment: zoomed-out screenshot above, zoomed-in screenshot below.) I'm pretty sure the Onodera algorithm should be detecting this (though I'm not 100% confident), so I suspect the problem is this issue.
I guess the main trouble here is how exactly to "define" this behavior. Like, if a chain is the boundary node of a bubble on the "left" side, and the chain's rightmost element is a regular node, then we can say "remove the final node in this chain and make it the new source node of the bubble, then extend the chain to include this new bubble". And if the rightmost element is another bubble, then we can just hit the boundary node duplication yoinky sploinky. (No one is reading these, right?)
There are almost certainly gonna be weird corner cases (e.g. what if the chain's leftmost node is another chain? --> that shouldn't ever happen, right?; what if the chain only contains two nodes? --> then it's still a chain of just 1 node and a bubble, that's fine, right?) but we can tackle those piecemeal.
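A rough sketch of the rule described above, just to pin the cases down. Everything here is hypothetical: the `Bubble` class, the `absorb_bubble` name, and the idea of storing a chain as a left-to-right list of node IDs and bubbles are assumptions for illustration, not the actual code. "Boundary node duplication" is modeled simply as two adjacent bubbles both referring to the same node ID.

```python
from dataclasses import dataclass, field


@dataclass
class Bubble:
    # Hypothetical stand-in for a detected bubble pattern.
    source: str
    sink: str
    interior: list = field(default_factory=list)


def absorb_bubble(chain, bubble):
    """Return a new chain that ends with `bubble`.

    `chain` is a left-to-right list whose elements are node IDs (str)
    or Bubble objects. `bubble.source` must be the chain's right boundary.
    """
    if not chain:
        raise ValueError("cannot extend an empty chain")
    last = chain[-1]
    if isinstance(last, Bubble):
        # Rightmost element is a bubble: the shared boundary node is
        # "duplicated" (it is the old bubble's sink and the new one's source).
        if last.sink != bubble.source:
            raise ValueError("bubble does not start where the chain ends")
        return chain + [bubble]
    # Rightmost element is a regular node: remove it from the chain, since
    # it now lives inside the bubble as its source, then append the bubble.
    if last != bubble.source:
        raise ValueError("bubble does not start where the chain ends")
    return chain[:-1] + [bubble]
```

With this representation, a two-node chain `["a", "b"]` absorbing a bubble that starts at `"b"` becomes `["a", Bubble("b", ...)]`, which is the "1 node and a bubble" case mentioned above.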
The current code is too restrictive, leading us to miss out on some bubbles due to the end and/or start nodes already being tagged in a chain.
Use case: in component 75 of the first biofilm graph, contig_0000016197 should form a bubble to contig_0000011283, but since both are already in chains this doesn't work.
The way to handle this is to detect whether the bubble's start or end node is already in a chain, and if so, attempt to "split up" that chain so that the chain's start or end node becomes the end or start node of the current bubble. This might be annoying when the chain contains non-basic-node stuff. Also, we might end up in a situation where splitting the chain completely removes it from the graph, for example if it's just a chain of two nodes...? Hm.
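A minimal sketch of the splitting step, assuming chains are plain lists of basic node IDs (so it ignores the non-basic-node case entirely). The function name `trim_chains_for_bubble` and the `min_len` parameter are made up for illustration. The choice to dissolve a chain that drops below two nodes is also an assumption, since what should happen there is exactly the open question above.

```python
def trim_chains_for_bubble(chains, start, end, min_len=2):
    """Free up a bubble's boundary nodes from any chains that hold them.

    `chains` is a list of chains, each a left-to-right list of node IDs.
    `start` and `end` are the boundary nodes of the candidate bubble.
    Returns (kept_chains, loose_nodes), where loose_nodes are the nodes
    left over from chains that became too short and were dissolved.
    """
    kept, loose = [], []
    for chain in chains:
        trimmed = list(chain)
        # The bubble's start node may only be the *last* node of a chain.
        if trimmed and trimmed[-1] == start:
            trimmed.pop()
        # The bubble's end node may only be the *first* node of a chain.
        if trimmed and trimmed[0] == end:
            trimmed.pop(0)
        if start in trimmed or end in trimmed:
            # A boundary node elsewhere in the chain is not handled here.
            raise ValueError("boundary node is not at a usable end of a chain")
        if len(trimmed) >= min_len:
            kept.append(trimmed)
        else:
            # Assumption: a chain shorter than min_len stops being a chain.
            loose.extend(trimmed)
    return kept, loose
```

For example, with chains `["a", "b", "s"]` and `["e", "x"]` and a bubble from `"s"` to `"e"`, the first chain shrinks to `["a", "b"]` and the second is dissolved, leaving `"x"` as a loose node.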