Checkpoint/Restart framework

smcantab commented 10 years ago

I gave this some more thoughts and it will be a pretty serious task, with some issues. Let's start with the one big issue I know of:

there is a bug in the stl rng for gcc4.6 (fixed in gcc4.7) : the state of the rng is not saved or loaded correctly (from stream) so that you can't just continue a calculation on the same random number sequence (hence any test, even if we reload the rng will yield different results). This issue is true at least if the state is read from a stream (I am not sure about other serialization methods, see boost serialize for example). Currently all our computers use gcc4.6.

About the implementation:

each module will have to have a save and load function and MC will have to have a save and load function in turn that saves its own state and calls the respective functions on the modules. I can't see a way around this because some of the modules have their own rng and a set of members that must be reinstated. Then it actually remains the problem of I/O for all this information, it's not obvious to me how to serialize it out of c++, considering that there is also the python layer that needs to be taken care of. Online I found a lot of references to boost serialize to do this sort of things.

Boost serialize would introduce an additional dependency (it could be made optional because check-pointing is a non essential feature). This still requires quite a bit of writing at the c++ level and remains the question of how to make this compatible with the python layer, although it doesn't seem impossible to do (one would have to rebuild the object from python and then call the MC::load function that will make sure everything goes back to the old state).

If we can do this, then re-instating the python layer state is simple because it's just a matter of reloading self.dict before calling mc.load. Unless there will be unforeseen complications.

Oh let's not forget that the potential and optimizer class should be serialized too for proper checkpointing, so the Pele potentials and minimizers would have to undergo this revision too, at least in principle.

This is the only way I can see how to implement a proper check-pointing system without something like BLCR system.

Since this sounds like a ton of work to me I would like to get as many suggestions as possible and get everyone to agree on the best course of action.

js850 commented 10 years ago

I really think you're asking for trouble trying to do it that way. If you pickle a class. Then change the class, then try to reload it you're gonna have a bad time. I would explicitly save all the data that needs to be saved.

checkpoint = dict()
checkpoint["niter"] = niter
checkpoint["ncalls"] = ncalls
checkpoint["histogram"] = <numpy array holding the histogram>

etc then pickle the dictionary. Don't save anything more complicated than a numpy array.

smcantab commented 10 years ago

that's how I thought of it in the first place but then I got stuck on the modules, as each module can have any amount of arbitrary information which you don't want to manage from the MC class.

To do it from python (and manually), the only way around that I see is this: each module pickles itself in a separate pickle file, then when MC.load() is called this tries to reset the states of the modules from their respective pickle files. For example there will be mc.cp.pickle, takestep.cp.pickle and so on to load from.

This is the only way I see to maintain the modularity, since MC does not necessarily know everything the modules know.

js850 commented 10 years ago

Yes, that could work. Maybe something like

def save_state(self):
    """return a picklable object"""

@static_method
def restore_state(state):
    """return a new class constructed from state"""

smcantab commented 10 years ago

I started writing my own basic sqlite3 wrapper for c++ but then realised that there is a large number of c++ wrappers out there, see here. Do you think it might be worth continuing writing our own basic version of an sqlite wrapper or should we just adopt an existing and more exhaustive wrapper?

js850 commented 10 years ago

It would be better to find an existing package. Try to find one that is

1) compatible licence (BSD is better than GPL) 2) small and simple.

smcantab commented 10 years ago

this question, however, suggests that there is really no need to write an exhaustive wrapper as one can use sqlite directly from c++. So maybe I should continue writing the the most simple functions myself and then use sqlite directly from a SqliteDB.execute( command ) function (where SqliteDB is my own class)?

js850 commented 10 years ago

OK, give it a try and see how far you get. You may run into trouble when you try to put a std::vector into the database.

kjs73 commented 10 years ago

I would agree that if in doubt and if we need a wrapper we should not write it ourselves.

smcantab commented 10 years ago

So I have been working on writing an insert vector function and since sql does not understand array one has to turn the vector into a stream and then back. Picking up things around the web I came up with this serialization and deserialization of vectors, please let me know whether you think it can be improved (I am not an expert with stream and string manipulation in c++):

template<typename T>
std::string serialize_vector(std::vector<T> vec)
{
    std::ostringstream oss;
    //store the size of the vector first
    oss << vec.size();
    if (!vec.empty()){
        std::copy(vec.begin(), vec.end()-1, std::ostream_iterator<T>(oss, "\n"));
    }
    return oss.str();
}

template<typename T>
std::vector<T> deserialize_vector(std::string ser_vec)
{
    std::stringstream ss;
    size_t size;
    std::vector<T> vec;

    //turn string into stream
    ss.str(ser_vec);
    //read vector size
    ss>>size;
    ser_vec.resize(size);

    for (size_t i=0; i<size; i++){
        ss >> vec[i];
    }

    return vec;
}

kjs73 commented 10 years ago

Probably in the first function the vector could be passed by (const) reference.

smcantab commented 10 years ago

sorry I had to re-edit a couple of times

js850 commented 10 years ago

If I read correctly you are turning the vector into a string of numbers, e.g. "1.245 1.434 5.3949".

You will run into problems doing it this way. First, be careful about precision, make sure you are printing at extremely high precision.

It would be better to serialize it in binary. I don't know how to do this, but presumably you can copy the whole block of memory vector.data() and cast it as a char array. you'll have to store the length somhow (in a header?).

smcantab commented 10 years ago

ok thanks, I'm going to have to read more on searialization then, for example look at how boost does it or how this guy does it

js850 commented 10 years ago

I imagine you could do something like this

serialize a vector

size_t intsize = sizeof(size_t)
datasize = sizeof(double) * v.size()
size_t block_size = datasize + intsize
void * block = malloc(block_size)
# put the size of the matrix in the front
memcpy(block, &v.size(), intsize)
# you can maybe do the above as 
# *(size_t *) block = v.size()
# put the vector data after the size
memcpy(block + intsize, v.data(), block_size)

You'll have to be careful because sizeof(size_t) is not constant across platforms. So you should probably store it as something that has constant size (e.g. int32_t) see http://en.cppreference.com/w/cpp/types/integer

deserialize a vector

size_t vector_size = *(int_type *) block;
double * vdata = block + sizeof(int_type)
# construct vector with input iterators
vector<double> v(vdata, vdata + sizeof(double) * vector_size)

js850 commented 10 years ago

Actually, if sqlite tells you the size of the char * array (or void * array) you can use that to determine the size of the vector

vector_size = block_size / sizeof(double)

or maybe

vector_size = block_size * sizeof(char) / sizeof(double)

Then you don't need to bother with storing the size at all. That will simplify things

js850 commented 10 years ago

this answer has a cleaner way of doing it using streams and reinterpret_cast http://stackoverflow.com/questions/14089266/how-to-correctly-write-vector-to-binary-file-in-c

kjs73 commented 10 years ago

Once I used

std::ofstream stm( file_name.c_str(), std::ios::binary );

for (typename std::vector::const_iterator i = vec.begin(); i != vec.end(); ++i) {

stm.write((char_)(&_i), sizeof(F));

}

Probably you are looking for more advanced stuff.

On 9 September 2014 19:48, Jacob Stevenson notifications@github.com wrote:

this answer has a cleaner way of doing it using streams and reinterpret_cast

http://stackoverflow.com/questions/14089266/how-to-correctly-write-vector-to-binary-file-in-c

— Reply to this email directly or view it on GitHub https://github.com/pele-python/mcpele/issues/32#issuecomment-55015178.

kjs73 commented 10 years ago

Ah, sorry for the repetition. Yes, I also think that the StackOverflow answer looks good.

smcantab commented 10 years ago

ok guys great! I'll give it a go. Let's bear in mind that these vectors must be readable from python too, so this means that the serialize and deserialize functions will have to be wrapped in cython

kjs73 commented 10 years ago

Maybe this would be a chance to try out boost.python?

On 10 September 2014 11:02, Stefano Martiniani notifications@github.com wrote:

ok guys great! I'll give it a go. Let's bear in mind that these vectors must be readable from python too, so this means that the serialize and deserialize functions will have to be wrapped in cython

— Reply to this email directly or view it on GitHub https://github.com/pele-python/mcpele/issues/32#issuecomment-55094921.

smcantab commented 10 years ago

that's what I thought too

smcantab commented 10 years ago

so this is what I came up with, based on the stack overflow suggestion:

/* Serialize and deserialize functions. These only work for vectors of Plain Old Data structures.
 * Furthermore these only work if the vectors contain only a single type and no pointers or
 * references.
 */
template<typename T>
std::string serialize_vector(const std::vector<T>& vec)
{
    static_assert(!std::is_pointer<T>::value || !std::is_reference<T>::value,"type is pointer or reference");
    std::ostringstream strm;
    strm.write(reinterpret_cast<const char*>(&vec[0]), vec.size()*sizeof(T));
    return strm.str();
}

template<typename T>
std::vector<T> deserialize_vector(const std::string& ser_vec)
{
    static_assert(!std::is_pointer<T>::value || !std::is_reference<T>::value,"type is pointer or reference");
    std::stringstream strm(ser_vec);
    const size_t length = ser_vec.size() / sizeof(T);
    std::vector<T> vec(length);
    strm.read(reinterpret_cast<char*>(&vec[0]), length*sizeof(T));
    return vec;
}

js850 commented 10 years ago

I think that will work just fine. I assume you are working with strings because you will save it in the SQL database as a string type? It would be cleaner to store it as a BLOB type (or something similar that is just a chunk of memory).

smcantab commented 10 years ago

so you think I should return and take a pointer to void in serialize and deserialize respectively? Then static_cast the pointer to a string and pass it to a stream? For instance for deserialize:

template<typename T>
std::vector<T> deserialize_vector(const void* mem_block)
{
    static_assert(!std::is_pointer<T>::value || !std::is_reference<T>::value,"type is pointer or reference");
    const size_t size = *(int_type *) mem_block;
    void * vec_block = mem_block + sizeof(int_type)
    const size_t length =  size / sizeof(T);
    std::vector<T> vec(length);
    std::string casted_memory(static_cast<char*>(vec_block), size);
    std::istringstream strm(casted_memory);
    strm.read(reinterpret_cast<char*>(&vec[0]), length*sizeof(T));
    return vec;
}

where I have used your earlier suggestion to figure out the size of a vector

js850 commented 10 years ago

The documentation will tell you how SQL accepts and returns blob data types. But it almost certainly will return the size of the blob. So it should be simpler

smcantab commented 10 years ago

ok, it's quite a bit of code but at least it's already written: Sqlite blob example

smcantab commented 10 years ago

How about we use this strategy instead? It's a combination of boost::serialize and blob storing in the sqlite database. Overall I think is cleaner than our current approach, it's more concise and probably the best long term strategy.

js850 commented 10 years ago

I just don't think you need boost::serialize. Vectors are already serialized, how would boost::serialize help?

smcantab commented 10 years ago

It turns the class into one big blob and it deserialises it. Right now we need to use different template functions for each type of datum and we need multiple sqlite directives for each of these types. With one big blob we need to know how to do only one thing.

Said that, this might not be so useful when it comes to reading from the database using python, because then we can't access directly the data. So from that perspective it complicates things. On 12 Sep 2014 09:57, "Jacob Stevenson" notifications@github.com wrote:

I just don't think you need boost::serialize. Vectors are already serialized, how would boost::serialize help?

— Reply to this email directly or view it on GitHub https://github.com/pele-python/mcpele/issues/32#issuecomment-55377443.

js850 commented 10 years ago

because then we can't access directly the data

exatly. And we don't want to make it easy to serialize any class because that's very dangerous, especially for a rapidly changing probject. If you change the definition of a class you then won't be able to extract from the database.

I can't see any need to store anything other than

basic types (integers, doubles)
strings
memory contiguous arrays of basic types. (vectors)

All of these are trivially serialized. Let's not provide functionality that we don't want people to use.

pele-python / mcpele

Checkpoint/Restart framework #32