Open xu-song opened 8 hours ago
[!IMPORTANT] Please review the checklist below before submitting your pull request.
Summary
- `maxLengthTip`: `Maximum chunk length` is ambiguous. It could be confused with `string length`.
- `max-chunk-size`: With the default gpt2 tokenizer, 1000 tokens is roughly equivalent to 400 CJK characters, which is not enough in most cases.

Screenshots
Checklist
- `dev/reformat` (backend) and `cd web && npx lint-staged` (frontend) to appease the lint gods
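
To make the token-versus-character estimate from the Summary concrete, here is a rough sketch of the arithmetic. It does not call the real tokenizer. The 2.5 tokens-per-character ratio is simply the figure quoted in this PR (1000 tokens for about 400 CJK characters), and both function names are hypothetical helpers for illustration only. The upper bound relies on the gpt2 tokenizer being byte-level BPE, where a token always covers at least one UTF-8 byte.

```python
# Assumed average, derived from this PR's estimate: 1000 tokens ~ 400 CJK characters.
TOKENS_PER_CJK_CHAR = 1000 / 400  # 2.5 tokens per character


def estimated_cjk_capacity(max_tokens: int) -> int:
    """Approximate number of CJK characters that fit in a chunk of max_tokens."""
    return int(max_tokens / TOKENS_PER_CJK_CHAR)


def utf8_token_upper_bound(text: str) -> int:
    """Worst case for a byte-level BPE tokenizer: one token per UTF-8 byte."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    print(estimated_cjk_capacity(1000))
    print(utf8_token_upper_bound("分段最大长度"))
```

Under these assumptions, a limit expressed in tokens allows far fewer CJK characters than the same number read as a string length, which is the confusion the `maxLengthTip` wording change is meant to avoid.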