radamsa's --subset and -m <n>mb options could be extended, and possibly merged,
to do some useful things automatically, with something like

$ radamsa -m 10mb --truncate subset|truncate|auto

where subset picks 1-n files until the memory limit is hit, truncate takes all
files but uses weighted tail truncation (the current default), and auto combines
both. A nice extra would be a 'heuristic' mode, which uses some technique to
pick samples that are likely to be of distinct kinds, for example by using
shared substrings in the first 4 KB of each file to partition the data into
fewer than nfiles/X buckets, and then picking n files from the buckets at random.
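
A rough sketch of what the 'heuristic' mode might look like, in Python rather than radamsa's own code. Everything here is an assumption made for illustration: the function names, the use of 4-byte substrings, the Jaccard similarity threshold of 0.3, and the greedy bucketing are not part of radamsa, and the nfiles/X bucket cap is not modelled.

```python
import random

PREFIX_LEN = 4096  # "first 4Kb of data" from the proposal
NGRAM = 4          # assumed substring length, purely illustrative


def prefix_ngrams(data: bytes, n: int = NGRAM) -> set:
    """Return the set of n-byte substrings found in the first PREFIX_LEN bytes."""
    head = data[:PREFIX_LEN]
    return {head[i:i + n] for i in range(len(head) - n + 1)}


def bucket_samples(samples: dict, threshold: float = 0.3) -> list:
    """Greedily group sample names by shared prefix substrings.

    A sample joins the first bucket whose representative shares enough
    substrings with it (Jaccard similarity >= threshold); otherwise it
    starts a new bucket and becomes that bucket's representative.
    """
    buckets = []  # list of (representative n-gram set, [sample names])
    for name in sorted(samples):
        grams = prefix_ngrams(samples[name])
        for rep, members in buckets:
            union = len(rep | grams)
            if union and len(rep & grams) / union >= threshold:
                members.append(name)
                break
        else:
            buckets.append((grams, [name]))
    return [members for _, members in buckets]


def pick_distinct(samples: dict, n: int, rng=None) -> list:
    """Pick up to n sample names, cycling over buckets so kinds are spread out."""
    rng = rng or random.Random()
    buckets = [list(b) for b in bucket_samples(samples)]
    for b in buckets:
        rng.shuffle(b)
    rng.shuffle(buckets)
    picked = []
    while len(picked) < n and any(buckets):
        for b in buckets:
            if b and len(picked) < n:
                picked.append(b.pop())
    return picked
```

Here samples is a hypothetical mapping from file name to file contents, so pick_distinct(samples, 10) would return at most ten names, taking one from each bucket in turn before revisiting any bucket. A real implementation would probably also need to respect the -m memory limit while picking.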
Original issue reported on code.google.com by aohelin on 20 Apr 2011 at 1:08