The Google News dataset is approximately 3.5 GB, but we only need vectors for the words we actually use. We have 614 green apples with 3 synonyms each (1,842 synonyms), which gives 2,456 green apple words in total. We have 1,825 red apples. Some of them are multi-word, which adds roughly another 1,825 words on average, and each has a description of about 15 words on average (27,375 words), for a total of 31,025 red apple words. So instead of loading all 3 million words, we could load a subset of only about 33,481 words, and fewer once duplicates are removed.
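Here is a minimal Python sketch of both halves of that reasoning. It assumes the vectors are stored in the standard word2vec binary format. `estimate_word_count` re-derives the 33,481 figure from the counts above, and `load_word2vec_subset` streams through the file and keeps only the vectors we ask for. Both function names are hypothetical. The little-endian float32 layout is an assumption about the file, not something stated here.

```python
import struct
from typing import BinaryIO, Dict, Iterable, List


def estimate_word_count(green: int = 614, synonyms_per_green: int = 3,
                        red: int = 1825, extra_name_words: int = 1825,
                        words_per_description: int = 15) -> int:
    """Reproduce the back-of-the-envelope word count from the text."""
    green_words = green + green * synonyms_per_green  # 614 + 1,842 = 2,456
    # 1,825 names + 1,825 extra name words + 1,825 * 15 description words = 31,025
    red_words = red + extra_name_words + red * words_per_description
    return green_words + red_words  # 33,481


def load_word2vec_subset(stream: BinaryIO, wanted: Iterable[str]) -> Dict[str, List[float]]:
    """Stream a word2vec binary file, keeping only vectors for words in `wanted`.

    Assumes (hypothetically) the standard word2vec binary layout: an ASCII
    header line "<vocab_size> <dim>", then for each entry the word, a space,
    and `dim` little-endian float32 values, optionally followed by a newline.
    """
    wanted = set(wanted)  # a set also collapses duplicate words
    vocab_size, dim = map(int, stream.readline().split())
    vector = struct.Struct(f"<{dim}f")  # one vector = dim float32 values
    subset: Dict[str, List[float]] = {}
    for _ in range(vocab_size):
        # Read the word byte by byte up to the separating space,
        # skipping the newline that may follow the previous vector.
        word_bytes = bytearray()
        while True:
            ch = stream.read(1)
            if ch in (b" ", b""):
                break
            if ch != b"\n":
                word_bytes += ch
        raw = stream.read(vector.size)
        word = word_bytes.decode("utf-8", errors="replace")
        if word in wanted:
            subset[word] = list(vector.unpack(raw))
            if len(subset) == len(wanted):
                break  # every wanted word found; stop reading early
    return subset
```

This single pass may still read most of the file from disk, but only the matching vectors are held in memory. Because `wanted` is turned into a set, repeated words are stored once, so the real subset will be no larger than the 33,481 estimate.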