hrx / slack-private-archiver

Store and search Slack messages in private data store
MIT License

File reader script #6

Closed — aebrow4 closed this 7 years ago

aebrow4 commented 8 years ago

Intended for generating CSVs that can be easily read into Postgres. After looking at the Slack export, there appear to be three logical pieces of data that lend themselves to SQL tables, so the script outputs one CSV file per table.

TODO:
1) Remove duplicates from the users before writing to file
2) Escape commas in messages and attachments, since they will break the comma delimiting of the CSV. Or delimit with tabs or something.
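For the escaping TODO, a minimal sketch of RFC 4180-style quoting that keeps commas as the delimiter — the helper names here are hypothetical, not part of the script:

```javascript
// A field containing a comma, double quote, or newline is wrapped in
// double quotes, and embedded double quotes are doubled. Everything
// else passes through unchanged.
function escapeCsvField (field) {
  const text = String(field)
  if (/[",\n\r]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"'
  }
  return text
}

function toCsvRow (fields) {
  return fields.map(escapeCsvField).join(',')
}
```

For example, `toCsvRow(['u123', 'hello, world'])` yields `u123,"hello, world"`, which Postgres `COPY ... CSV` can ingest without a custom delimiter.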

mLuby commented 8 years ago

re removing dupes—can be done in bash pretty quickly.

sort users-file.sql | uniq

A little trickier in JS.

dedup(usersCSV.split('\n'))

function dedup (listOfStrings) {
  // Uses each string as an object key, so duplicates collapse to one entry.
  return Object.keys(listOfStrings.reduce((uniques, string) =>
    Object.assign(uniques, {[string]: true}), {}))
}

re delimiting, also escape newlines?

dting commented 8 years ago

Hmm. I'm kind of wondering why we'd even go to the trouble of writing this out to CSV when we can just batch insert into pg. Are we going to use these files somewhere else? If not, inserting them directly into pg seems to make more sense for retaining their relationships.
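A sketch of what the batch-insert path could look like, assuming the node-postgres (`pg`) client — `buildBatchInsert` is a hypothetical helper that turns an array of row arrays into one parameterized multi-row INSERT, whose `text` and `values` could then be passed to `client.query(text, values)`:

```javascript
// Build a single parameterized INSERT for many rows, e.g.
// INSERT INTO users (id, name) VALUES ($1, $2), ($3, $4)
// Parameterizing (rather than interpolating) avoids SQL injection
// and sidesteps the CSV escaping problem entirely.
function buildBatchInsert (table, columns, rows) {
  let i = 0
  const placeholders = rows
    .map(row => '(' + row.map(() => '$' + ++i).join(', ') + ')')
    .join(', ')
  const text = 'INSERT INTO ' + table +
    ' (' + columns.join(', ') + ') VALUES ' + placeholders
  return {text, values: [].concat(...rows)}
}
```

For example, `buildBatchInsert('users', ['id', 'name'], [['u1', 'alice'], ['u2', 'bob']])` produces the statement above with `values` flattened to `['u1', 'alice', 'u2', 'bob']`.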

aebrow4 commented 8 years ago

@dting can you explain more about whether batch insert would work with JSON? And into a document store or relational? (I'm operating on the assumption that we're going relational, based on a couple of articles.) Googled around a bit but didn't find anything that helpful re: batch insert.
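One way batch insert can work with the JSON export in a relational setup: flatten each exported object into a row array in JS first, then hand the rows to whatever does the inserting. The field names below (`user`, `ts`, `text`) match message records in a Slack export, but treat the exact shape as an assumption; Postgres also has `json_populate_recordset` if we'd rather push the raw JSON to the database and unpack it there.

```javascript
// Hypothetical sketch: flatten Slack-export message objects into row
// arrays that a relational batch insert could consume.
function messagesToRows (messages) {
  return messages.map(m => [m.user, m.ts, m.text])
}
```

Usage, with made-up data in the export's shape:

```javascript
messagesToRows([
  {user: 'U1', ts: '1451606400.000002', text: 'hi'},
  {user: 'U2', ts: '1451606401.000003', text: 'hello'}
])
// → [['U1', '1451606400.000002', 'hi'], ['U2', '1451606401.000003', 'hello']]
```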

aebrow4 commented 8 years ago

Holding off on merging this while we figure out whether another solution would handle the importing more easily.

mLuby commented 8 years ago

Any reason not to merge this and, if that other solution pans out, submit a new PR replacing this with that? Iterative! Agile!