rhiever / reddit-analysis

A Python script that parses post titles, self-texts, and comments on reddit and makes word clouds out of the word frequencies.

Data dumps checked into version control. #35

Closed by cmcdowell 11 years ago

cmcdowell commented 11 years ago

I'm wondering whether checking the contents of the data_dumps directory into version control is sensible. It's not really source code, so maybe it would be better to host the files somewhere like pastebin.

bboe commented 11 years ago

For the most part I agree, though they can be used as testing corpora.

cmcdowell commented 11 years ago

Maybe it would be best to leave one or two in as examples and for testing purposes then?

rhiever commented 11 years ago

Well, we're not checking in any more data dumps AFAIK (*.csv is in .gitignore now). Agreed that at least a couple should be kept in as test cases, or we'll have to make our own test case.

Theoretically, GitHub is a good place to host small data file dumps like this, though maybe in a repo of their own.
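
For illustration, the *.csv ignore rule mentioned above could be paired with an exception for the couple of dumps kept as test cases. This is only a sketch: the data_dumps directory name comes from this thread, but the file name below is hypothetical and is not taken from the repository.

```gitignore
# Ignore generated data dumps everywhere in the repo.
*.csv

# Re-include one dump as a test corpus.
# (hypothetical file name, shown only as an example)
!data_dumps/example_subreddit.csv
```

Note that .gitignore only affects untracked files, so dumps that were already committed stay tracked until they are removed from the index, for example with `git rm --cached`.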

rhiever commented 11 years ago

All data dumps removed.