rahulsmehta opened this issue 6 years ago
Hi,
After checking the GloVe home page (https://nlp.stanford.edu/projects/glove/), and specifically the corpus used for the 6B version (Wikipedia 2014 + Gigaword 5), I was wondering whether there is a table or summary of the number of times each token occurs in the corpus.
Thanks!
Hi! Just popping in to say that, although I don't know the answer to your question, if no such summary exists yet it would be a wonderful contribution to the research community. It could probably be built with NLTK or a similar toolkit, and run in the cloud to handle the size of the dataset. I'd be happy to volunteer some time on it as well, if it hasn't been done already.
I'm also looking for the token count information for GloVe.
Does anyone know the exact source of the pre-processed text (especially which version of Common Crawl was used)? Only by applying the same pre-processing to the same text can we reproduce the same tokens and their corpus counts.
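If no official table turns up, an approximate frequency table can be built by streaming over the text and counting tokens. Counts from separate corpus shards can then be merged, which is what makes the parallel/cloud route mentioned in the thread workable. Below is a minimal sketch using only the Python standard library (`collections.Counter`) instead of NLTK. The tokenizer is an assumption (lowercase, split into word and punctuation tokens) and is not GloVe's own pre-processing, so the resulting counts will not match the official 6B vocabulary. The function names are hypothetical, chosen for illustration. (The GloVe source code also includes a `vocab_count` program for this step if you are working from an already tokenized corpus.)

```python
import re
from collections import Counter
from typing import Iterable, List

# ASSUMPTION: a simple lowercase word/punctuation tokenizer. This does NOT
# reproduce GloVe's actual pre-processing, so counts will differ from the
# official vocabulary.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(line: str) -> List[str]:
    """Lowercase a line and split it into word and punctuation tokens."""
    return TOKEN_RE.findall(line.lower())


def count_tokens(lines: Iterable[str]) -> Counter:
    """Count tokens line by line, so the corpus never has to fit in memory."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts


def merge_counts(shard_counts: Iterable[Counter]) -> Counter:
    """Combine per-shard counts, e.g. produced by parallel workers."""
    total: Counter = Counter()
    for counts in shard_counts:
        total.update(counts)
    return total


def write_vocab(counts: Counter, path: str, min_count: int = 1) -> None:
    """Write one 'token count' pair per line, most frequent first."""
    with open(path, "w", encoding="utf-8") as f:
        for token, n in counts.most_common():
            if n < min_count:
                break  # most_common() is sorted descending, so we can stop here
            f.write(f"{token} {n}\n")
```

For a file on disk, `count_tokens(open(path, encoding="utf-8"))` streams it line by line; for a sharded corpus, `merge_counts(count_tokens(shard) for shard in shards)` combines the per-shard results before `write_vocab` dumps the table.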