Prior to this pull request, SANA had only rudimentary support for reading compressed files: the sole case was loading Similarity Matrix files. To make existing SANA code compatible with compression, I have created two helper functions in utils: FILE* readFileAsFilePointer(const string& fileName, bool& piped) and stdiobuf readFileAsStreamBuffer(const string& fileName).
The first returns a C-style FILE* for the given file name. If the file is uncompressed, it returns a FILE* generated by fopen. If the file is compressed, it returns a FILE* generated by popen. It also sets the passed-in boolean reference to true if the returned FILE* is piped (that is, generated by popen). Whether a FILE* came from popen or fopen matters because it determines whether we must close it with pclose or fclose later. I've created a simple helper function, void closeFile(FILE* fp, const bool& isPiped), that closes the FILE* with the correct function given the file pointer and the boolean isPiped.
Example usage:
bool isPiped = false; // set to true by readFileAsFilePointer when the file is read through a pipe
FILE* infile = readFileAsFilePointer("myfile.gz", isPiped); // pass in compressed or uncompressed file
fscanf(infile, ...);
closeFile(infile, isPiped);
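To illustrate the fopen/popen split described above, here is a minimal sketch, not the code in utils. The names sketchDecompressorFor, sketchOpen and sketchClose are hypothetical, and the suffix-to-command table (".gz" to "gzip -dc", ".bz2" to "bzip2 -dc", ".xz" to "xzcat") is an assumption about how the decompression programs might be invoked. It relies on POSIX popen/pclose.

```cpp
#include <stdio.h>   // fopen, fclose, and POSIX popen/pclose
#include <string>

// Hypothetical mapping from file suffix to a command that decompresses to stdout.
// Returns an empty string for files that look uncompressed.
std::string sketchDecompressorFor(const std::string& fileName) {
    static const char* const table[][2] = {
        {".gz", "gzip -dc"}, {".bz2", "bzip2 -dc"}, {".xz", "xzcat"}};
    for (const auto& row : table) {
        const std::string suffix = row[0];
        if (fileName.size() > suffix.size() &&
            fileName.compare(fileName.size() - suffix.size(), suffix.size(), suffix) == 0)
            return row[1];
    }
    return "";
}

// Opens fileName for reading. Sets piped to true when the FILE* comes from popen,
// and to false when it comes from fopen.
FILE* sketchOpen(const std::string& fileName, bool& piped) {
    const std::string prog = sketchDecompressorFor(fileName);
    piped = !prog.empty();
    if (!piped) return fopen(fileName.c_str(), "r");
    // Simplistic quoting: breaks if the file name itself contains a single quote.
    const std::string cmd = prog + " '" + fileName + "'";
    return popen(cmd.c_str(), "r");
}

// Closes fp with the function that matches how it was opened.
int sketchClose(FILE* fp, bool piped) {
    return piped ? pclose(fp) : fclose(fp);
}
```

In this sketch the boolean is always assigned, so the caller does not depend on its initial value. Calling fclose on a FILE* from popen (or pclose on one from fopen) is undefined behavior, which is why the flag has to travel with the pointer.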
The approach described above works well in places like ExternalSimMatrix, where C-style file I/O is used, but most of SANA uses C++-style streams. To make compression work with streams, I had to convert the FILE* generated by popen or fopen into a stream. This required a class called stdiobuf, which creates a stream buffer from a FILE*; the buffer is then passed to the constructor of an istream. The standard library lets you construct an istream from an arbitrary stream buffer, but not an ifstream. (istream is the base class of ifstream, and the two have almost identical functionality.) So from now on, instead of doing ifstream infile("myfile.el"), we do
stdiobuf sbuf = readFileAsStreamBuffer(fileName); // pass in compressed or uncompressed file
istream infile(&sbuf);
string line;
getline(infile, line);
...
Note that close() isn't called on the stream; closing is handled in the destructor of stdiobuf, so the file is closed when the buffer goes out of scope or delete is called on it.
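For readers unfamiliar with custom stream buffers, here is a minimal sketch of how a read-only buffer over a FILE* can work and why no explicit close() is needed. FileStreamBuf is a hypothetical stand-in, not SANA's stdiobuf: it assumes the buffer owns the FILE*, remembers whether it was piped, and is constructed directly rather than returned by value.

```cpp
#include <stdio.h>    // fread, fclose, and POSIX pclose
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical read-only stream buffer that owns a FILE* and closes it on destruction.
class FileStreamBuf : public std::streambuf {
public:
    FileStreamBuf(FILE* fp, bool piped) : fp_(fp), piped_(piped) {}
    FileStreamBuf(const FileStreamBuf&) = delete;             // one owner per FILE*
    FileStreamBuf& operator=(const FileStreamBuf&) = delete;
    ~FileStreamBuf() override {
        // Close with the function that matches how the FILE* was opened.
        if (fp_) { piped_ ? pclose(fp_) : fclose(fp_); }
    }

protected:
    // Called by the istream when the get area is exhausted: refill it from the FILE*.
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        if (!fp_) return traits_type::eof();
        const size_t n = fread(buf_, 1, sizeof buf_, fp_);
        if (n == 0) return traits_type::eof();
        setg(buf_, buf_, buf_ + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    FILE* fp_;
    bool piped_;
    char buf_[4096];
};
```

With such a class, FileStreamBuf sbuf(fp, isPiped); istream infile(&sbuf); gives a stream that supports getline and operator>>, but not ifstream-only members such as open(), is_open() or close(). The buffer must outlive the istream that reads from it.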
Some other utilities I've made are string getDecompressionProgram(const string& fileName) that will return the decompression program for a given file, and string getUncompressedFileExtension(const string& fileName) which will return el from something like AThaliana.el.gz.
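As an illustration of the extension lookup, here is a minimal sketch that strips one compression suffix and returns what follows the last remaining dot. The name sketchUncompressedExtension is hypothetical, and the recognized suffixes (.gz, .bz2, .xz) are an assumption based on the supported programs; the real helper may behave differently on edge cases.

```cpp
#include <string>

// Hypothetical version of the extension helper: "AThaliana.el.gz" -> "el".
// Returns an empty string when the name has no extension.
std::string sketchUncompressedExtension(const std::string& fileName) {
    static const char* const kSuffixes[] = {".gz", ".bz2", ".xz"};
    std::string name = fileName;
    // Drop at most one trailing compression suffix.
    for (const char* raw : kSuffixes) {
        const std::string suffix = raw;
        if (name.size() > suffix.size() &&
            name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
            name.erase(name.size() - suffix.size());
            break;
        }
    }
    const std::string::size_type dot = name.find_last_of('.');
    const std::string::size_type slash = name.find_last_of('/');
    // A dot that belongs to a directory component does not count as an extension.
    if (dot == std::string::npos || (slash != std::string::npos && dot < slash))
        return "";
    return name.substr(dot + 1);
}
```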
The decompression programs SANA is compatible with are gzip, xzcat, and bzip2. (I added bzip2.)
In this pull request, I've only switched graph loading and external sim matrices to the new utilities, since it takes a while to test each call site after the swap (ifstream and istream aren't 100% compatible).