I have added a branch called "corpus_crawl" that crawls the entire Twitter corpus in a single run of the script.
It uses the "-data" parameter as the root directory of the corpus and the "-output" parameter as the output root folder.
In the output folder, the script replicates the directory structure of the original corpus.
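
To illustrate the directory mirroring described above, here is a minimal sketch. It is not the code from the branch: the function name `mirror_corpus` and the `process_file` callback are hypothetical, and the per-file crawl step is replaced by a plain file copy so the example runs on its own. It also assumes the output root is not located inside the data root.

```python
import os
import shutil


def mirror_corpus(data_root, output_root, process_file=shutil.copyfile):
    """Walk data_root and recreate its directory layout under output_root.

    process_file(src, dst) is a hypothetical stand-in for the per-file crawl
    step; it defaults to a plain copy so this sketch is runnable as-is.
    Returns the list of output file paths that were written.
    """
    written = []
    for dirpath, _dirnames, filenames in os.walk(data_root):
        # Path of the current directory relative to the corpus root.
        rel_dir = os.path.relpath(dirpath, data_root)
        # Same relative path, but rooted at the output folder.
        target_dir = os.path.normpath(os.path.join(output_root, rel_dir))
        os.makedirs(target_dir, exist_ok=True)
        for name in sorted(filenames):
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            process_file(src, dst)
            written.append(dst)
    return written
```

Here `data_root` and `output_root` play the roles of the values passed via "-data" and "-output". Empty subdirectories of the corpus are recreated as well, since each directory is created before its files are handled.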