How does your second question relate to the first? They seem to be two separate topics. I would suggest focusing on one of them.
Good use of natural language tools on a cool Google BigQuery database.
Is the random forest significantly better than the linear regression?