Closed by one-matrix 1 year ago
@one-matrix No, this will not affect training accuracy.
After the sequence is predicted, we strip all gaps from both the prediction and the label before passing it to an alignment-based loss function to calculate loss.
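To illustrate the idea, here is a minimal sketch of stripping gaps from both sequences before the loss is computed (hypothetical helper name `strip_gaps`; the real pipeline operates on token tensors rather than strings):

```python
# Hypothetical sketch: remove gap characters from prediction and label
# before they are passed to an alignment-based loss.
GAP = ' '

def strip_gaps(seq: str) -> str:
    """Remove all gap characters from a predicted or label sequence."""
    return seq.replace(GAP, '')

pred = ' TGACA'   # prediction with a leading gap
label = 'TGACA'

# After gap stripping, the two sequences compared by the loss are identical.
assert strip_gaps(pred) == strip_gaps(label)
```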
@danielecook Hi danielecook. The strand feature is different from pw and base: 0 means forward and 1 means reverse, but the ModifiedOnDeviceEmbedding class masks id 0 just like any other padding value. That might not be such a good idea.
```python
if params.use_strand:
    strand_vocab_size = params.STRAND_MAX + 1
    self.strand_embedding_layer = ModifiedOnDeviceEmbedding(
        vocab_size=strand_vocab_size,
        embedding_width=params['strand_hidden_size'],
        name='strand_embedding',
    )


class ModifiedOnDeviceEmbedding(layers.OnDeviceEmbedding):
  """Subclass of OnDeviceEmbedding, init similar to EmbeddingSharedWeights."""

  def __init__(self, vocab_size, embedding_width, **kwargs):
    # tensorflow_models/official/legacy/transformer/embedding_layer.py
    super().__init__(
        vocab_size,
        embedding_width,
        initializer=tf.random_normal_initializer(
            mean=0.0, stddev=embedding_width**-0.5),
        scale_factor=embedding_width**0.5,
        **kwargs,
    )

  def call(self, inputs):
    embeddings = super().call(inputs)
    # Zero out the embeddings wherever the input id is 0.
    mask = tf.cast(tf.not_equal(inputs, 0), embeddings.dtype)
    embeddings *= tf.expand_dims(mask, -1)
    return embeddings
```
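A quick way to see what the `call` override does, emulated here with NumPy rather than TensorFlow for brevity (the table and shapes are illustrative, not the model's):

```python
import numpy as np

vocab_size, width = 3, 4
rng = np.random.default_rng(0)
table = rng.standard_normal((vocab_size, width))  # stand-in for the embedding table

inputs = np.array([0, 1, 2])                  # id 0 is the one being masked
embeddings = table[inputs]                    # lookup, like super().call(inputs)
mask = (inputs != 0).astype(embeddings.dtype)
embeddings *= mask[:, None]                   # zero the embedding for id 0

assert not embeddings[0].any()                # id 0 is fully zeroed
assert embeddings[1].any() and embeddings[2].any()
```

So any feature that uses 0 as a meaningful value, rather than as padding/unknown, would have that value's embedding silently zeroed.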
I'm not sure I fully understand what the issue is. We encode strand as 0=unknown, 1=forward, 2=reverse:

```python
class Strand(int, enum.Enum):
  UNKNOWN = 0
  FORWARD = 1  # read.is_reverse == False
  REVERSE = 2  # read.is_reverse == True
```

These values are then embedded.
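To show why reserving 0 for UNKNOWN is compatible with the masking in `ModifiedOnDeviceEmbedding`, here is a small self-contained sketch (the `encode_strand` helper is hypothetical, not part of the codebase):

```python
import enum

class Strand(int, enum.Enum):
    UNKNOWN = 0
    FORWARD = 1
    REVERSE = 2

def encode_strand(is_reverse):
    """Map a read's orientation to a strand id; None means unknown/padding."""
    if is_reverse is None:
        return Strand.UNKNOWN
    return Strand.REVERSE if is_reverse else Strand.FORWARD

# Only id 0 is zeroed by the embedding mask, so forward and reverse
# reads both keep nonzero embeddings.
ids = [encode_strand(x) for x in (None, False, True)]
assert ids == [Strand.UNKNOWN, Strand.FORWARD, Strand.REVERSE]
```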
@danielecook Thanks, danielecook. `UNKNOWN = 0` resolves my doubts. The figure is a little misleading, though.
I see - thanks for pointing this out. I'll see if we can get the figure updated.
I have a question about the example. When the subread `" TGACA"` and the label `"TGACA"` are aligned, the first character of the subread is a gap, resulting in no alignment at that position. Will this affect the training accuracy?