keras-team / keras-nlp

Modular Natural Language Processing workflows with Keras
Apache License 2.0

Implement compute_output_spec() for tokenizers with vocabulary. #1523

Closed: briango28 closed this 5 months ago

briango28 commented 5 months ago

Small fix for issue #1522.

Implements the same compute_output_spec() method for BytePairTokenizer, WordPieceTokenizer, and SentencePieceTokenizer.

briango28 commented 5 months ago

The previous version used keras.KerasTensor, which apparently does not exist in Keras 2. Updated to use keras.Input instead.
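
For illustration only, here is a rough sketch of how code could pick between the two APIs at runtime. The helper name make_symbolic_output and the idea of passing the keras module in as an argument are assumptions made for this example, not what the PR does; the only library calls assumed are keras.KerasTensor(shape, dtype=...) and keras.Input(shape=..., batch_size=..., dtype=...).

```python
def make_symbolic_output(keras_module, shape, dtype):
    """Return a symbolic tensor of `shape` using whichever Keras API exists.

    `keras_module` is the imported `keras` package. Passing it in explicitly
    is an illustrative choice (hypothetical), so the branch logic is easy to
    exercise without depending on a specific Keras version.
    """
    kerastensor_cls = getattr(keras_module, "KerasTensor", None)
    if kerastensor_cls is not None:
        # Top-level KerasTensor is available: the shape includes the batch axis.
        return kerastensor_cls(shape, dtype=dtype)
    # Fallback: keras.Input takes the per-example shape and the batch size
    # as separate arguments, so split the leading axis off.
    return keras_module.Input(shape=shape[1:], batch_size=shape[0], dtype=dtype)
```

With a real installation this would be called as make_symbolic_output(keras, (None, 128), "int32"); the result is a symbolic placeholder either way, though the concrete object type differs between the two branches.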

briango28 commented 5 months ago

Ran format.sh. I was working behind a MITM proxy without a proper Linux environment and had to resort to manual copying, which turned out to be rather unwieldy. Hopefully it will pass the tests now.

briango28 commented 5 months ago

Applied above discussions. The function now looks like this:

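import keras  # keras.KerasTensor is exposed at the top level in Keras 3


class TokenizerWithVocabulary:
    def compute_output_spec(self, input_spec) -> keras.KerasTensor:
        # Output shape is the input shape plus one axis for the token sequence.
        return keras.KerasTensor(
            input_spec.shape + (self.sequence_length,), dtype=self.compute_dtype
        )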
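
The shape rule itself can be checked without Keras installed. The sketch below is a stand-in, not the library code: Spec and ToyTokenizer are hypothetical names that mimic only the shape and dtype attributes the method relies on, and the "int32" default dtype is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Spec:
    """Stand-in for keras.KerasTensor: just a shape and a dtype."""

    shape: Tuple[Optional[int], ...]
    dtype: str


class ToyTokenizer:
    """Hypothetical tokenizer exposing only what the shape rule needs."""

    def __init__(self, sequence_length, dtype="int32"):
        self.sequence_length = sequence_length
        self.compute_dtype = dtype

    def compute_output_spec(self, input_spec):
        # Append one token axis to whatever shape the string input has.
        return Spec(
            tuple(input_spec.shape) + (self.sequence_length,),
            dtype=self.compute_dtype,
        )


# A batch of strings with unknown batch size maps to (batch, sequence_length).
batched = ToyTokenizer(sequence_length=128).compute_output_spec(
    Spec(shape=(None,), dtype="string")
)
print(batched.shape)  # (None, 128)
```

If sequence_length is None, the appended axis is None as well, which corresponds to a token axis of unknown length.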

mattdangerw commented 5 months ago

Thank you!