Closed attardi closed 4 years ago
Hi, thanks for reporting this issue.
This happens because the sentence lengths in this treebank take only 25 distinct values, so the sentences can be assigned to at most 25 buckets (as shown in the error log).
You can fix this error by setting a smaller number of buckets, e.g., --buckets=16.
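If you would rather not tune the flag for each treebank, one option is to cap the requested number of buckets at the number of distinct sentence lengths before clustering. Here is a minimal sketch in plain Python. Note that `cap_buckets` is a hypothetical helper written for illustration, not something the parser provides:

```python
def cap_buckets(lengths, requested):
    """Return a bucket count that never exceeds the number of distinct lengths.

    Hypothetical helper for illustration; not part of biaffine-parser.
    """
    distinct = len(set(lengths))
    if distinct == 0:
        raise ValueError("no sentence lengths given")
    return min(requested, distinct)


# Toy data with only 3 distinct sentence lengths.
lengths = [6, 6, 7, 10, 10, 10]
print(cap_buckets(lengths, 32))  # 3: capped at the number of distinct lengths
print(cap_buckets(lengths, 2))   # 2: the request already fits
```

With a cap like this, a small treebank simply gets fewer buckets instead of tripping the assertion.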
I get this error when training on the Tamil treebank:
    File "/project/piqasso/tools/biaffine-parser/parser/utils/alg.py", line 18, in kmeans
      assert len(d) >= k, f"unable to assign {len(d)} datapoints to {k} clusters"
    AssertionError: unable to assign 25 datapoints to 32 clusters
With the debugger I found that, in the invocation of kmeans(x, k) with len(x) = 80 and k = 32, the call at line 10

    d, indices, f = x.unique(return_inverse=True, return_counts=True)

returns:

    d = tensor([ 6., 7., 8., 10., 11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24., 26., 27., 32., 33., 35., 45., 51.])
    len(d) = 25
    f = tensor([4, 1, 1, 5, 1, 7, 6, 4, 8, 8, 2, 2, 1, 8, 5, 2, 1, 2, 3, 3, 1, 2, 1, 1, 1])
    len(f) = 25
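For anyone who wants to check these numbers without the parser, the sketch below rebuilds the 80 lengths from d and f and repeats the comparison made by the assertion. It uses plain Python lists rather than tensors (an assumption made to avoid dependencies), and `can_cluster` is a hypothetical name that only mirrors the `len(d) >= k` condition:

```python
# Distinct sentence lengths and their counts, copied from the debugger output.
d = [6, 7, 8, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
     26, 27, 32, 33, 35, 45, 51]
f = [4, 1, 1, 5, 1, 7, 6, 4, 8, 8, 2, 2, 1, 8, 5, 2, 1, 2, 3, 3, 1, 2, 1, 1, 1]

# Rebuild the full list of sentence lengths by repeating each length by its count.
x = [length for length, count in zip(d, f) for _ in range(count)]


def can_cluster(values, k):
    """Mirror the assertion's condition: at least k distinct values are needed."""
    return len(set(values)) >= k


print(len(x), len(set(x)))  # 80 25
print(can_cluster(x, 32))   # False: 25 distinct lengths cannot fill 32 clusters
print(can_cluster(x, 16))   # True
```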
With other treebanks it works fine.
Thank you for the nice and useful project.