Closed: aebrow4 closed this 7 years ago
re removing dupes—can be done in bash pretty quickly.
sort users-file.sql | uniq
A little trickier in JS.
dedup(usersCSV.split('\n'))
function dedup (listOfStrings) {
  // Use each string as an object key to collapse duplicates, then read the keys back out.
  return Object.keys(
    listOfStrings.reduce((uniques, string) => Object.assign(uniques, { [string]: true }), {})
  )
}
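For what it's worth, a Set would probably get the same result with less ceremony (just a sketch, assuming the same `usersCSV` string as above):

```js
// Sketch: a Set drops duplicates and preserves insertion order.
function dedup (listOfStrings) {
  return [...new Set(listOfStrings)]
}

dedup(usersCSV.split('\n'))
```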
Re delimiting: should we also escape newlines?
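One way to cover both commas and newlines without switching delimiters is standard csv quoting: wrap any field that contains a comma, quote, or newline in double quotes and double up embedded quotes. Rough sketch only; `escapeField` is a made-up helper name, not something in this PR:

```js
// Rough sketch of RFC 4180-style quoting.
function escapeField (value) {
  const str = String(value)
  // Quote the field if it contains a comma, quote, or newline; double any embedded quotes.
  if (/[",\n\r]/.test(str)) {
    return '"' + str.replace(/"/g, '""') + '"'
  }
  return str
}

// Example row for the messages csv
const row = ['U023BECGF', 'general', 'hello, "world"\nsecond line'].map(escapeField).join(',')
// -> U023BECGF,general,"hello, ""world""
//    second line"
```

Postgres's COPY in csv mode handles this quoting, so the files should still load cleanly.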
Hmm. I'm kind of wondering why we'd even go to the trouble of writing this out to csv when we can just batch insert into pg. Are we going to use these files somewhere else? If not, inserting them directly into pg seems to make more sense for retaining their relationships.
@dting can you explain more about whether batch insert would work with JSON? And would it go into a document store or a relational one? (I'm operating on the assumption that we're going relational, based on a couple of articles.) I googled around a bit but didn't find anything that helpful re: batch insert.
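For reference, a batch insert with node-postgres could look something like the sketch below. The table, column names, and message fields here are guesses for illustration, not anything that exists in this repo; the raw JSON from the export could be parked in a jsonb column as-is.

```js
// Sketch only: assumes a table like `messages (user_id text, channel text, raw jsonb)`
// and connection settings in the usual PG* environment variables.
const { Client } = require('pg')

async function batchInsertMessages (messages, channel) {
  const client = new Client()
  await client.connect()

  // Build one multi-row INSERT: VALUES ($1, $2, $3), ($4, $5, $6), ...
  const values = []
  const placeholders = messages.map((message, i) => {
    values.push(message.user, channel, JSON.stringify(message))
    const offset = i * 3
    return `($${offset + 1}, $${offset + 2}, $${offset + 3})`
  })

  await client.query(
    `INSERT INTO messages (user_id, channel, raw) VALUES ${placeholders.join(', ')}`,
    values
  )
  await client.end()
}
```

That said, for a big export, COPY-ing the csvs this PR already generates would probably be faster than batched INSERTs, so the two approaches aren't mutually exclusive.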
Holding off on merging this while we figure out whether we can use another solution to handle importing more easily.
Any reason not to merge this and, if that other solution pans out, submit a new PR replacing this with that? Iterative! Agile!
Intended for generating csvs that can be easily read into postgres. After looking at the Slack export, it looked like there are three logical pieces of data that lend themselves to SQL tables. Thus the script outputs one csv file per table.
TODO:
1) Remove duplicates from the users list before writing to file.
2) Escape commas in messages and attachments, since they will screw up the comma delimiting of the csv. Or delimit with tabs or something.