Any string that isn't a multiple of 4 causes an assert failure at line 548 in models.py
"assert char_encoding.shape[1] % self.conv.stride[0] == 0"
stride is intialised to config.downsampling_rate (4) in modeling_canine.py in transformers lib.
Sample code causing assert failure (length of input string is 35):
from wtpsplit import WtP
wtp = WtP("wtp-canine-s-12l")
wtp.split("This is a test This is another test", lang_code="en")
Sample code that works (with added full-stop that makes the length of input string to become 36):
from wtpsplit import WtP
wtp = WtP("wtp-canine-s-12l")
wtp.split("This is a test This is another test.", lang_code="en")
Hi,
Any string that isn't a multiple of 4 causes an assert failure at line 548 in models.py "assert char_encoding.shape[1] % self.conv.stride[0] == 0"
stride is intialised to config.downsampling_rate (4) in modeling_canine.py in transformers lib.
Sample code causing assert failure (length of input string is 35): from wtpsplit import WtP wtp = WtP("wtp-canine-s-12l") wtp.split("This is a test This is another test", lang_code="en")
Sample code that works (with added full-stop that makes the length of input string to become 36): from wtpsplit import WtP wtp = WtP("wtp-canine-s-12l") wtp.split("This is a test This is another test.", lang_code="en")