Open jjm82 opened 12 months ago
In general, the tokenizer seems to be doing well. The following discussion should be useful, and its author seems like a good one to follow: https://www.kaggle.com/competitions/llm-detect-ai-generated-text/discussion/458522
One important point the author makes: "In my experiments, I've found that a significant portion of score improvement comes from tweaking the vectorization part."
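
To make "tweaking the vectorization part" concrete, here is a minimal pure-Python TF-IDF sketch with three knobs to experiment with. The function name `tfidf_vectorize`, the whitespace tokenization, and the parameter choices are assumptions for illustration, not the settings used in the linked discussion. The parameter names mirror `ngram_range`, `sublinear_tf`, and `min_df` on scikit-learn's `TfidfVectorizer`, which is probably what we would use in practice.

```python
import math
from collections import Counter


def ngrams(tokens, ngram_range):
    """Yield all word n-grams with n in [lo, hi], joined by spaces."""
    lo, hi = ngram_range
    for n in range(lo, hi + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def tfidf_vectorize(docs, ngram_range=(1, 1), sublinear_tf=False, min_df=1):
    """Hypothetical helper: return (vocabulary, rows), each row a term -> weight dict."""
    # Whitespace splitting stands in for our real tokenizer (assumption).
    counts = [Counter(ngrams(doc.lower().split(), ngram_range)) for doc in docs]
    # Document frequency: how many documents contain each term.
    df = Counter(term for c in counts for term in c)
    # min_df drops terms that appear in too few documents.
    vocab = sorted(t for t, d in df.items() if d >= min_df)
    n_docs = len(docs)
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for c in counts:
        row = {}
        for t in vocab:
            if t in c:
                # sublinear_tf dampens the effect of repeated terms.
                tf = 1 + math.log(c[t]) if sublinear_tf else c[t]
                row[t] = tf * idf[t]
        rows.append(row)
    return vocab, rows


docs = ["the cat sat on the mat", "the dog sat on the log"]
for ngram_range in [(1, 1), (1, 2)]:
    vocab, _ = tfidf_vectorize(docs, ngram_range=ngram_range)
    print(ngram_range, "vocabulary size:", len(vocab))
```

Changing `ngram_range`, `sublinear_tf`, or `min_df` changes both the feature space and the weights the classifier sees, so each is worth sweeping against the validation score before we touch the model itself.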