Open GoogleCodeExporter opened 8 years ago
Original comment by yurkin
on 23 Nov 2009 at 2:53
Original comment by yurkin
on 23 Aug 2010 at 4:57
Original comment by yurkin
on 22 Apr 2011 at 2:40
Original comment by yurkin
on 4 Feb 2013 at 5:36
Issue 183 is a generalization of this issue.
Original comment by yurkin
on 23 Nov 2013 at 8:21
I did a little test, and it seems that switching from the text format to a binary
one doesn't actually reduce the size of the input files that much once compression
is taken into account. I experimented with a dipole file of about 408000 dipoles,
with positions ranging from -131 to 158. The file sizes were as follows:
Original file: 4.2 MB
Stored in binary as 16-bit ints: 2.4 MB
Original file gzipped: 0.92 MB
Binary file gzipped: 0.95 MB
So although the text file is about twice the size of the binary one, it compresses
much better. Therefore, one option for reducing file sizes would be to implement
reading of the current file formats in gzipped form. An even simpler alternative
would be to implement reading the dipole file from standard input, so that one
could pipe dipole data to ADDA through gunzip.
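For illustration, reading the current text format transparently in gzipped form would not need much code: zlib's gz* functions can stand in for the plain stdio calls. Below is a minimal sketch (not ADDA code), assuming the simple "x y z" one-triple-per-line text format with '#' comment lines; compile with -lz.

/* Sketch only -- not ADDA code. Reads a gzipped (or plain) dipole file
   line by line using zlib. */
#include <stdio.h>
#include <zlib.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s shape.geom[.gz]\n", argv[0]);
        return 1;
    }
    gzFile f = gzopen(argv[1], "rb");  /* transparently handles uncompressed files too */
    if (f == NULL) {
        fprintf(stderr, "cannot open '%s'\n", argv[1]);
        return 1;
    }
    char line[256];
    long ndip = 0;
    int x, y, z;
    while (gzgets(f, line, (int)sizeof line) != NULL) {
        if (line[0] == '#') continue;                      /* skip comment lines */
        if (sscanf(line, "%d %d %d", &x, &y, &z) == 3) ndip++;
    }
    gzclose(f);
    printf("read %ld dipole positions\n", ndip);
    return 0;
}

Since gzopen falls back to reading plain files unchanged, the same code path could serve both compressed and uncompressed input.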
Original comment by jsleino...@gmail.com
on 4 Dec 2013 at 2:21
Masking the stdout of gunzip as a shape (pseudo)file can already be done on Unix, e.g.
./adda ... -shape read <( gunzip ... )
However, this fails because ADDA scans shape files twice, and on the second scan the
piped pseudofile happens to be empty. Such a two-pass reading procedure is done for
robustness and, in some sense, performance. There is also some random access to
automatically determine the format of the file. So the only way I see to read stdin
(or a piped stream) is to buffer the stream inside ADDA. But that seems to remove
most of the benefits.
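For reference, such buffering would amount to spooling the stream into a seekable temporary file before the usual two-pass reading starts. The helper below is a hypothetical sketch, not an existing ADDA function:

/* Sketch only. Copies a non-seekable stream (e.g. stdin fed from gunzip)
   into an anonymous temporary file, which can then be rewound, scanned
   twice, and accessed randomly for format autodetection. */
#include <stdio.h>

static FILE *spool_to_tmpfile(FILE *in)
{
    FILE *tmp = tmpfile();        /* removed automatically on fclose()/exit */
    if (tmp == NULL) return NULL;
    char buf[1 << 16];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, tmp) != n) {
            fclose(tmp);
            return NULL;
        }
    }
    rewind(tmp);                  /* hand back a seekable replacement for 'in' */
    return tmp;
}

As noted above, this gives up most of what piping was supposed to save: the whole uncompressed geometry still ends up in a file on disk, which is exactly the concern for the large MPI case described next.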
The second issue is that the problem is mostly relevant for large MPI runs, where
shape files can be tens of GB but each process needs only a small part of the file
(still, it can't buffer the whole file). For smaller shape files, using a temporary
file (instead of a FIFO) seems a fine solution.
Finally, there is another idea, described in issue 31. It is probably not that
efficient for very sparse particles (still worth trying), but it can lead to several
orders of magnitude of compression for large, homogeneous, and relatively compact
particles (which are the largest ones among computationally feasible runs).
Original comment by yurkin
on 4 Dec 2013 at 3:42
Original issue reported on code.google.com by
fabio.de...@gmail.com
on 23 Nov 2009 at 2:01