I would suggest a simple command line tool that users can download to bulk-load large files. The tool would essentially divide and conquer a large local CSV file by posting chunks of it to a Rebioma web service. The tool would require Rebioma user credentials, a chunk size, and a URL endpoint. This method is more scalable, since downloading large files from the cloud will be even more restrictive.
Original comment by eightyst...@gmail.com on 6 Apr 2009 at 4:53
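A minimal sketch of what such a chunked-upload tool might look like, in Java. The endpoint URL, the use of HTTP Basic authentication, and the convention of repeating the CSV header on every chunk are all assumptions for illustration, not Rebioma's actual API:

```java
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BulkLoader {
    // Usage: java BulkLoader <csvFile> <endpointUrl> <user> <password> <rowsPerChunk>
    public static void main(String[] args) throws IOException {
        String endpoint = args[1], user = args[2], pass = args[3];
        int chunkSize = Integer.parseInt(args[4]);
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            String header = in.readLine();   // repeated on every chunk (assumption)
            StringBuilder chunk = new StringBuilder();
            int rows = 0, part = 0;
            String line;
            while ((line = in.readLine()) != null) {
                chunk.append(line).append('\n');
                if (++rows == chunkSize) {
                    post(endpoint, user, pass, header + "\n" + chunk, ++part);
                    chunk.setLength(0);
                    rows = 0;
                }
            }
            if (rows > 0) post(endpoint, user, pass, header + "\n" + chunk, ++part);
        }
    }

    // POST one CSV chunk with Basic credentials; hypothetical server contract.
    static void post(String endpoint, String user, String pass, String body, int part)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpoint).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        String auth = Base64.getEncoder()
                .encodeToString((user + ":" + pass).getBytes(StandardCharsets.UTF_8));
        conn.setRequestProperty("Authorization", "Basic " + auth);
        conn.setRequestProperty("Content-Type", "text/csv; charset=utf-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        if (conn.getResponseCode() >= 300) {
            throw new IOException("Chunk " + part + " failed: HTTP " + conn.getResponseCode());
        }
        System.out.println("Uploaded chunk " + part);
    }
}
```

A failed chunk simply aborts the run here; a real tool would want retries and a way to resume from the last successful chunk.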
With the new Darwin Core DatasetID term we could resolve any relationships between files and maintain their coherence once loaded, should the user choose to do so. Your recommendation, however, skirts the issue of provider registration and again places the burden on the provider to do something special to participate. THAT is not scalable. Providers need to be able to say "Here I am, come and get my data" without having a special hoop to jump through for every initiative (portal) in which they want to participate. This has to be a consideration.
Original comment by gtuco.bt...@gmail.com on 6 Apr 2009 at 5:17
Good point. So providers use the CsvProvider software to generate CSV dumps, which can then be retrieved using a web service, right? Suppose CsvProvider could be modified to optionally provide CSV download via pagination.
Original comment by eightyst...@gmail.com on 6 Apr 2009 at 5:25
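From the harvesting side, consuming such a paginated service could be as simple as the loop below. The offset/limit query parameters and the empty-page end-of-data convention are assumptions about how CsvProvider might be modified, not its current behavior:

```java
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

public class CsvHarvester {
    public static void main(String[] args) throws IOException {
        String base = args[0];   // e.g. http://provider.example.org/csv (hypothetical)
        int limit = 1000;        // rows per page
        try (PrintWriter out = new PrintWriter(new FileWriter("dump.csv"))) {
            for (int offset = 0; ; offset += limit) {
                // Assumes each page returns data rows only (no repeated header).
                String page = fetch(base + "?offset=" + offset + "&limit=" + limit);
                if (page.isEmpty()) break;   // empty page signals the end of the dump
                out.print(page);
            }
        }
    }

    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in =
                new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            for (String line; (line = in.readLine()) != null; ) sb.append(line).append('\n');
        }
        return sb.toString();
    }
}
```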
Yes. And an additional service could also be built as a middle-man for providers who don't have a software installation, but instead just have a file accessible via a URL. This middle-man software could grab the whole file and perform the chunked upload.
Original comment by gtuco.bt...@gmail.com on 6 Apr 2009 at 5:34
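Such a middle-man would mostly be a matter of reading the remote file and reusing the chunked uploader. A sketch, reusing the hypothetical post(...) helper from the BulkLoader sketch above (same package assumed):

```java
import java.io.*;
import java.net.URL;

public class MiddleMan {
    // Usage: java MiddleMan <fileUrl> <endpointUrl> <user> <password>
    public static void main(String[] args) throws IOException {
        String fileUrl = args[0], endpoint = args[1], user = args[2], pass = args[3];
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(fileUrl).openStream()))) {
            String header = in.readLine();
            StringBuilder chunk = new StringBuilder();
            int rows = 0, part = 0;
            for (String line; (line = in.readLine()) != null; ) {
                chunk.append(line).append('\n');
                if (++rows == 1000) {   // arbitrary 1000-row chunks
                    BulkLoader.post(endpoint, user, pass, header + "\n" + chunk, ++part);
                    chunk.setLength(0);
                    rows = 0;
                }
            }
            if (rows > 0) BulkLoader.post(endpoint, user, pass, header + "\n" + chunk, ++part);
        }
    }
}
```

The file is streamed rather than buffered whole, so the middle-man never needs to hold a large file in memory.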
The CsvProvider modification helps providers but doesn't help single users, and the middle-man service requires maintaining a server somewhere outside of the cloud. Backing up a bit, I still think the best solution is a command line bulk-loading tool: it helps providers, who can just automate it with a cron job; it helps users, who can run it from their own machines; and it avoids maintaining servers outside the cloud.
Original comment by eightyst...@gmail.com on 6 Apr 2009 at 5:42
This needs additional input from Aaron and John.
Original comment by tom.alln...@gmail.com on 17 Feb 2011 at 1:24
Some recent feedback from Aaron: uploading large files to the Rebioma server (say, larger than 50 MB) over HTTP is going to be problematic. I still think the right solution is a command line bulk loading tool. Another approach is to support FTP uploads to the server, with a task queue that can process files and email people when they're done.
An interim solution (pending time and funding to develop a more complete fix) is to add some text to the upload process warning users not to upload files larger than 50 MB, or to break these up to avoid issues. Users could also be notified that they may work directly with us (contact: rebiomawebportal [at] gmail [dot] com) to get large files onto the system.
Original comment by tom.alln...@gmail.com on 8 Mar 2011 at 8:14
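A sketch of the FTP-plus-task-queue approach Aaron describes: a worker polls the directory the FTP server writes into, ingests each completed file, and notifies the uploader. The paths, the ingest step, and the notification hook are all placeholders:

```java
import java.io.IOException;
import java.nio.file.*;

public class UploadWorker {
    public static void main(String[] args) throws IOException, InterruptedException {
        Path inbox = Paths.get("/var/rebioma/ftp-inbox");   // hypothetical FTP drop dir
        Path done  = Paths.get("/var/rebioma/processed");
        Files.createDirectories(done);
        while (true) {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(inbox, "*.csv")) {
                // Assumes the FTP server uploads to a temp name and renames to *.csv
                // only once the transfer is complete, so partial files are never seen.
                for (Path csv : files) {
                    ingest(csv);
                    Files.move(csv, done.resolve(csv.getFileName()),
                               StandardCopyOption.REPLACE_EXISTING);
                    notifyUploader(csv);
                }
            }
            Thread.sleep(60_000);   // poll once a minute
        }
    }

    static void ingest(Path csv) { /* hypothetical: validate rows and load them */ }

    static void notifyUploader(Path csv) {
        // Stand-in for the "email people when it's done" step.
        System.out.println("Processed " + csv.getFileName());
    }
}
```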
A recent observation: we are able to upload about 20k records at a time, with about 10 fields each. A command line or FTP tool would still be a useful add-on.
Original comment by tom.alln...@gmail.com on 26 Jul 2011 at 4:48
An even simpler solution would be a command line tool that runs on the server. Project administrators could upload files via FTP, then run the command line tool to ingest the data. Paired with issue #406, ownership of these records could be changed, or even assigned to any user, with this same tool.
Original comment by tom.alln...@gmail.com on 8 Aug 2011 at 11:36
Wilfried is working on this; it is almost implemented for files under 25 MB. For larger files there is a JVM issue, and users will have to break up files larger than 25 MB.
Original comment by tom.alln...@gmail.com on 29 Oct 2012 at 9:54
Wilfried fixed this issue!
Original comment by nirina.t...@gmail.com on 23 Jan 2013 at 12:35
Original issue reported on code.google.com by gtuco.bt...@gmail.com on 6 Apr 2009 at 4:06