Rio517 closed this 8 years ago
Hey @Rio517, I remember you. I'm not that old! Great to see you getting mileage out of the SSS. We'll be pushing updates, and we're always happy to review PRs!
As preparatory work for this, I updated couchimport to version 0.3.0. This adds a new function, previewURL, which returns a preview of a file that you have the URL of but haven't uploaded to SSS. It loads the first 10k of the file and then kills the connection (because the file could be HUGE).
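var couchimport = require('couchimport');
// Preview a remote CSV file; only the start of the file is fetched
couchimport.previewURL('https://s3-eu-west-1.amazonaws.com/glynnbirddotcom/hp.csv', { COUCH_DELIMITER: "," }, function(err, data) {
  console.log(err, data);
});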
null [ { id: '{0FC6F1BF-79C4-401E-9910-0000F5CC2B4A}',
price: '195000',
date: '2015-04-16 00:00',
postcode: 'EN8 7EG',
a: 'F',
b: 'N',
c: 'L',
building: 'BUTLERS COURT',
house_number: '5',
road: 'TRINITY LANE',
address1: '',
address2: 'WALTHAM CROSS',
town: 'BROXBOURNE',
county: 'HERTFORDSHIRE',
property_type: 'A' },
{ id: '{CB44E6D8-CD59-4CDD-AD79-0000F773874C}',
price: '60000',
date: '2015-04-09 00:00',
postcode: 'S2 5FW',
a: 'S',
b: 'N',
c: 'F',
building: '1',
house_number: '',
road: 'HASLEHURST ROAD',
address1: '',
address2: 'SHEFFIELD',
town: 'SHEFFIELD',
county: 'SOUTH YORKSHIRE',
property_type: 'A' },
{ id: '{B548CACA-5D17-4B6A-ADF4-0002188D07F0}',
price: '248000',
date: '2015-04-24 00:00',
postcode: 'BR5 3BQ',
a: 'S',
b: 'N',
c: 'F',
building: '2',
house_number: '',
road: 'HORSELL ROAD',
address1: '',
address2: 'ORPINGTON',
town: 'BROMLEY',
county: 'GREATER LONDON',
property_type: 'A' } ]
This is the same preview technology that the existing SSS uses, although there it only worked for uploaded files.
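To illustrate the "load the first 10k, then kill the connection" idea, here is a minimal sketch using only Node's built-in https module. It is not couchimport's actual implementation; the helper names fetchPrefix and completeLines are hypothetical, and it assumes a Node version where request.destroy() is available.

```javascript
var https = require('https');

// Hypothetical helper: keep only the complete lines of a truncated chunk of
// text, so a half-downloaded final row is never passed to the CSV parser.
function completeLines(text) {
  var lastNewline = text.lastIndexOf('\n');
  return lastNewline === -1 ? '' : text.slice(0, lastNewline);
}

// Hypothetical helper: fetch at most `limit` bytes of `url`, then drop the
// connection so the rest of a (possibly huge) file is never downloaded.
function fetchPrefix(url, limit, callback) {
  var chunks = [];
  var received = 0;
  var finished = false;

  // Make sure the callback fires exactly once, whichever event comes first.
  function finish(err) {
    if (finished) { return; }
    finished = true;
    if (err) { return callback(err); }
    var text = Buffer.concat(chunks).slice(0, limit).toString('utf8');
    callback(null, completeLines(text));
  }

  var req = https.get(url, function (res) {
    res.on('data', function (chunk) {
      chunks.push(chunk);
      received += chunk.length;
      if (received >= limit) {
        finish(null);
        req.destroy(); // kill the connection once we have enough data
      }
    });
    res.on('end', function () { finish(null); }); // file was smaller than the limit
    res.on('error', finish);
  });
  req.on('error', finish);
}
```

Calling fetchPrefix(url, 10000, callback) would hand the callback the first few complete lines of the file, which could then be passed to whatever CSV parser produces the preview rows.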
Then we can use the pre-existing couchimport.importStream to do the actual import without first downloading the whole file, e.g.:
var request = require('request');
// Stream the remote file straight into CouchDB
var opts = {COUCH_URL: 'http://localhost:5984', COUCH_DATABASE: 'mydb', COUCH_DELIMITER: ','};
couchimport.importStream(request.get('http://s3-eu-west-1.amazonaws.com/glynnbirddotcom/hp.csv'), opts, function(err, data) {
  console.log(err, data);
});
This is good to go from our POV. @Rio517 please reopen if you find issues.
For larger datasets, uploading can be a pain. Uploading an 800MB file took me 2-3 hours over a poor connection outside of Berlin. I realize my use case is narrow, but I think these proposed enhancements could help others:
In my case, I wound up setting up a remote Ubuntu desktop box, uploading a gzipped file (91MB) to it, and then uncompressing it there to upload through the SSS GUI.
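As a rough sketch of how the manual uncompress step might be avoided, a gzipped stream can be decompressed on the fly with Node's built-in zlib module before it reaches an importer. The helper name gunzipped is hypothetical, and whether couchimport.importStream accepts the resulting stream is an assumption that has not been tested here.

```javascript
var zlib = require('zlib');
var stream = require('stream');

// Hypothetical helper: wrap a gzipped readable stream so that consumers
// see the uncompressed bytes as they arrive.
function gunzipped(source) {
  var gunzip = zlib.createGunzip();
  // Forward source errors, since pipe() does not propagate them.
  source.on('error', function (err) { gunzip.destroy(err); });
  return source.pipe(gunzip);
}

// Local demonstration with an in-memory gzipped CSV instead of a real download.
var compressed = zlib.gzipSync('id,price\n1,195000\n');
var source = new stream.PassThrough();
var output = '';
gunzipped(source)
  .on('data', function (chunk) { output += chunk.toString('utf8'); })
  .on('end', function () { console.log(output); });
source.end(compressed);

// Untested usage with the calls from this thread (assumes importStream
// accepts any readable stream):
// couchimport.importStream(gunzipped(request.get(url)), opts, callback);
```

With something like this, only the 91MB gzipped file would need to cross the slow connection, and the decompression would happen wherever the import runs.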
Good luck and thanks for everyone's hard work so far!
P.S. I realized that @bradnoble and I used to work together back at Mullen in 2004 or 2005. I was a 23yo account guy back then and would be surprised if you remembered me. Glad to see you're doing well. :)