Open clarkfitzg opened 7 years ago
Both suggestions sound reasonable.
Norm
On Wed, Apr 19, 2017 at 09:46:08AM -0700, Clark Fitzgerald wrote:
Working on updating the documentation and I noticed that we could probably use `fname` as the default for `newbasename` here:
`filesplitrand(cls, fname, newbasename, ndigs, header=FALSE, sep)`
To avoid changing the order of arguments we'll also need a default for `ndigs`, perhaps 2?
Opening this now so I will remember to come back to it.
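A minimal sketch of what the proposed defaults could look like. `filesplitrand_sketch` is a hypothetical stand-in, not the partools function: it only computes the chunk names that would be written, and it assumes one chunk per cluster node and a `basename.NN` naming pattern.

```r
# Hypothetical stand-in for filesplitrand(); it only builds the chunk
# names, so it runs without a real cluster. `header` and `sep` are kept
# for signature compatibility and are never evaluated here.
filesplitrand_sketch <- function(cls, fname, newbasename = fname,
                                 ndigs = 2, header = FALSE, sep) {
  nch <- length(cls)  # assumption: one chunk per cluster node
  suffix <- formatC(seq_len(nch), width = ndigs, flag = "0")
  paste(newbasename, suffix, sep = ".")
}

cls <- vector("list", 3)  # placeholder for a 3-node cluster object

# With the defaults, the chunks reuse the input file name as the prefix.
chunks_default <- filesplitrand_sketch(cls, "xyz")

# Existing positional calls keep working because the argument order
# is unchanged.
chunks_named <- filesplitrand_sketch(cls, "xyz", "abc", 3)
```

Because R evaluates default arguments lazily, `newbasename = fname` picks up whatever the caller passed as `fname`, so no extra `missing()` check is needed.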
-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/matloff/partools/issues/12
Back at it now. Looking at the relevant arguments in "snowdoop" utilities I see:
nch: Number of chunks for the file split.
basenm: A chunked file name, minus suffix.
infile: Name of a nonchunked file.
ndigs: Number of digits in the chunked file name suffix.
infilenm: Name of input file (without suffix, if distributed).
outdfnm: Name of output file (without suffix).
infiledst: If TRUE, infilenm is distributed.
usefread: If TRUE, use \code{fread} instead of \code{read.table}; generally much faster; requires the \code{data.table} package.
header: TRUE if the file chunks have headers.
seqnums: TRUE if the file chunks will have sequence numbers.
sep: Field delimiter used in \code{read.table}.
chunksize: Number of lines to read at a time, for efficient I/O.
dname: Quoted name of a distributed data frame or matrix. For \code{filesave}, the object must have column names.
fname: Quoted name of a distributed file.
fnames: Character vector of file names.
newbasename: Quoted name of the prefix of a distributed file, e.g. \code{xyz} for a distributed file \code{xyz.01}, \code{xyz.02}, etc.
inbasename: Basename of the input files, e.g. x for x.1, x.2, ...
outbasename: Basename of the output files.
nout: Number of output files.
...: Additional arguments to \code{read.table}, \code{write.table}.
How about condensing `infile`, `infilenm`, `fname`, `fnames`, `newbasename`, `inbasename` into just `fname`? The `infiledst` argument along with `ndigs` can be used to handle the appended numbers. If `length(fname) > 1` this can act like `fnames`.
Also, `outbasename` and `outdfnm` could become `outfname` for consistency with `fname`.
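As a rough sketch of how a single `fname` argument might cover those cases: `resolve_fnames` below is a hypothetical helper, not part of partools, and the `basename.NN` suffix pattern is an assumption based on the `xyz.01`, `xyz.02` example in the argument list.

```r
# Hypothetical helper: expand one `fname` argument into the actual file
# names, covering the roles of infile, fnames, and the basename arguments.
resolve_fnames <- function(fname, infiledst = FALSE, nch = 1, ndigs = 2) {
  if (length(fname) > 1 || !infiledst) {
    return(fname)  # explicit file name(s): use them as given
  }
  # Distributed file: append zero-padded chunk numbers to the basename.
  suffix <- formatC(seq_len(nch), width = ndigs, flag = "0")
  paste(fname, suffix, sep = ".")
}

single <- resolve_fnames("x.csv")                                # like infile
distributed <- resolve_fnames("xyz", infiledst = TRUE, nch = 2)  # like basenm
explicit <- resolve_fnames(c("a.txt", "b.txt"))                  # like fnames
```

One trade-off of this design is that a length-one `fname` is ambiguous on its own, so the caller still has to set `infiledst` to say whether it names a whole file or a chunk prefix.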
Clark, the cost/benefit ratio seems high here. Cost here means your time. This is exactly the kind of thing you should be avoiding, in my opinion.
It certainly is true that the argument names are rather jumbled, a natural consequence of adding more and more things over time. But anytime something is changed, we have to worry about "ecological" effects, even with Travis.
To me, this is a "back burner" thing.
Norm
Fair enough. I'm going to focus on the file sorting then.