After a brief look at the files, it seems that each file contains only about 2500 unique comments, each repeated 50 times. I don't see this mentioned anywhere, and it's a huge dealbreaker if you try to use this dump for anything semi-serious.
NoRepeats.zip
I wrote a Bash script to remove all the repeated comments. It is in the zip file, along with another script that removes all blank comments.
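
For anyone who would rather not unpack the zip, here is a rough Python sketch of the same idea. It is not the Bash script from the zip. It assumes each comment sits on its own line in a UTF-8 text file, which may not match the real dump format, and the names `unique_nonblank` and `dedupe_file` are made up for illustration.

```python
from typing import Iterable, Iterator


def unique_nonblank(comments: Iterable[str]) -> Iterator[str]:
    """Yield each distinct, non-blank comment once, in first-seen order."""
    seen = set()
    for comment in comments:
        text = comment.rstrip("\n")
        if not text.strip():
            continue  # skip blank (empty or whitespace-only) comments
        if text in seen:
            continue  # skip exact repeats of a comment already emitted
        seen.add(text)
        yield text


def dedupe_file(src: str, dst: str) -> int:
    """Copy the unique, non-blank lines of src to dst; return how many were kept.

    Assumption: one comment per line. Adjust the parsing if the dump
    uses another layout (JSON, CSV, multi-line comments, ...).
    """
    kept = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for text in unique_nonblank(fin):
            fout.write(text + "\n")
            kept += 1
    return kept
```

Calling `dedupe_file("comments.txt", "comments_norepeats.txt")` (both file names are placeholders) writes the cleaned copy and returns the number of comments kept. Only exact duplicates are dropped; comments that differ by even one character are treated as distinct.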