In the formula for calculating Yule's K, N (ntokens) is subtracted from the sum in the numerator. See:
Baayen (2001), Word Frequency Distributions, p. 25.
Jarvis (2002), Short texts, best-fitting curves and new measures of lexical diversity, p. 59 (second formula).
Tweedie & Baayen (1998), How Variable May a Constant be? Measures of Lexical Richness in Perspective, p. 330, although there the formula is written in a rather unusual way, using i/N.
koRpus does this but, according to my calculations, quanteda does not: its K equals the sum divided by N squared, without subtracting N first.
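For reference, the formula as described above can be written out as follows. This is a sketch reconstructed from the description and the arithmetic in this post, not quoted from the cited sources. Here m is a frequency class, V_m is the number of types occurring exactly m times (the frequency spectrum, not the Vm column of the quanteda output, which is a separate measure), and N is the number of tokens. The label K_nosub is only my notation for the variant without the subtraction.

```latex
% Yule's K with N subtracted in the numerator
K = 10^{4} \cdot \frac{\sum_{m} m^{2} V_{m} - N}{N^{2}}

% Variant without the subtraction (hypothetical label K_nosub)
K_{\mathrm{nosub}} = 10^{4} \cdot \frac{\sum_{m} m^{2} V_{m}}{N^{2}}
```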
Text k1
a b c d d e e f f f
m  Vm
1  3  :  3 * 1 * 1 +
2  2  :  2 * 2 * 2 +
3  1  :  1 * 3 * 3 = 20
(20 - 10) / 10 / 10 * 10000 = 1000
(20) / 10 / 10 * 10000 = 2000
Quanteda
     doc TTR         C        R     CTTR        U    S    K        I
1 k1.txt 0.6 0.7781513 1.897367 1.341641 4.507576 -Inf 2000 2.571429
          D        Vm      Maas     lgV0    lgeV0
1 0.1111111 0.1825742 0.4710082 1.238943 2.852771
koRpus
==================
Text k2
a b c d d e e f f f g g g g
m  Vm
1  3  :  3 * 1 * 1 +
2  2  :  2 * 2 * 2 +
3  1  :  1 * 3 * 3 +
4  1  :  1 * 4 * 4 = 36
(36 - 14) / 14 / 14 * 10000 = 1122.448979591836735
(36) / 14 / 14 * 10000 = 1836.73469387755102
Quanteda
     doc TTR         C        R     CTTR        U         S        K        I
2 k2.txt 0.5 0.7373505 1.870829 1.322876 4.363716 -1.233987 1836.735 1.689655
          D        Vm      Maas     lgV0    lgeV0
2 0.1208791 0.2020305 0.4787092 1.251051 2.880652
koRpus
K@K.ld
[1] 1122.449
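As a cross-check that depends on neither package, the hand calculation can be reproduced with a minimal Python sketch. The function name `yules_k` and the `subtract_n` switch are hypothetical, introduced only for this illustration; the sketch assumes whitespace tokenisation of the two toy texts and says nothing about how quanteda or koRpus are implemented internally.

```python
from collections import Counter


def yules_k(tokens, subtract_n=True):
    """Yule's K computed from a list of tokens.

    subtract_n=True  -> 10^4 * (sum(m^2 * V_m) - N) / N^2
    subtract_n=False -> 10^4 *  sum(m^2 * V_m)      / N^2
    """
    n = len(tokens)                          # N, the number of tokens
    type_freqs = Counter(tokens)             # type -> frequency
    spectrum = Counter(type_freqs.values())  # m -> V_m (types occurring m times)
    s2 = sum(m * m * vm for m, vm in spectrum.items())
    numerator = s2 - n if subtract_n else s2
    return 10_000 * numerator / (n * n)


# The two toy texts from this post, split on whitespace.
k1 = "a b c d d e e f f f".split()
k2 = "a b c d d e e f f f g g g g".split()

for name, toks in (("k1", k1), ("k2", k2)):
    with_sub = round(yules_k(toks), 3)
    without_sub = round(yules_k(toks, subtract_n=False), 3)
    print(name, with_sub, without_sub)
# prints:
# k1 1000.0 2000.0
# k2 1122.449 1836.735
```

For both texts the value without the subtraction agrees with the K column of the quanteda output (2000 and 1836.735), and for k2 the value with the subtraction agrees with the koRpus result (1122.449).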
All the best,
Yves