steveKapturowski / tensorflow-rl

Implementations of deep RL papers and random experimentation
Apache License 2.0

segmentation fault with fast_cts.pyx #16

Open dhfromkorea opened 7 years ago

dhfromkorea commented 7 years ago

Hi Steve, I feel like asking you another newbie question.

I am having a hard time running the fast_cts module. I think I compiled the Cython module successfully, but when using it (fast_cts) I get a segmentation fault. I tracked down which line causes it in fast_cts.pyx:

def update(self, obs):
    # the line below
    obs = resize(obs, (self.height, self.width), preserve_range=True)
    obs = np.floor((obs*self.num_bins)).astype(np.int32)

    log_prob, log_recoding_prob = self._update(obs)
    return self.exploration_bonus(log_prob, log_recoding_prob)

It's the line marked with "# the line below": the error occurs precisely when the resized output gets assigned to the obs variable. This is my first time dealing with Cython, so I am not sure whether the error has to do with:

a) the Cython compilation (Ubuntu 16.04 LTS), or b) the Cython code itself (since it's a segmentation fault, it must have to do with memory)

Let me know if you need more specific details to share any advice.

Plus, could I ask what speedup you achieved with fast_cts compared to the original cts? With the original cts I get 64 timesteps/second...

Thanks.

steveKapturowski commented 7 years ago

@dhfromkorea hmm, that's interesting. I've run this code on mac and ubuntu without any issue. The fast_cts code is significantly faster; I'll have to double-check the exact speedup, but I would recommend against trying to use the pure python version. 'resize' is coming from the scikit-image library though; I wonder if there could be a problem with that installation?

dhfromkorea commented 7 years ago

Hm... okay, so it's probably about compilation of the extension...

"'resize' is coming from the scikit-image library though" -> it worked fine when using the original cts, so I doubt that's the cause.

"I would recommend against trying to use the pure python version." -> hm... how do I know if that's the case? I thought I was importing the compiled version of the extension...

steveKapturowski commented 7 years ago

Can you let me know the python and library versions that you're using? Also, when you tried the original cts model, can you describe precisely which changes you made to the code?

dhfromkorea commented 7 years ago

Hi, I am using python 3.5 and cython 0.25.2. Here you can check the code: https://github.com/RL-ninja/beating-montezuma/blob/master/algorithms/paac_cts.py

FYI, I have not changed the Cython code at all. The error happens when I use the code as-is.

A correction: the line causing the error was not resize(). It appears to be in cts_update:

cdef double cts_update(CTSStruct* cts, int[:] context, int symbol):    
    cts[0]._time += 1.0
    cts[0].log_alpha = log(1.0 / (cts[0]._time + 1.0))
    cts[0].log_1_minus_alpha = log(cts[0]._time / (cts[0]._time + 1.0))
    cdef double log_prob = node_update(cts[0]._root, context, symbol)
    return log_prob

Along the call trace, the seg fault happens either in node_update or further down.

cdef double node_update(CTSNodeStruct* node, int[:] context, int symbol):
    lp_estimator = estimator_update(node[0].estimator, symbol)

    # If not a leaf node, recurse, creating nodes as needed.
    cdef CTSNodeStruct* child
    cdef double lp_child
    cdef double lp_node

    if context.shape[0] > 0:
        child = node_get_child(node, context[context.shape[0]-1])
        lp_child = node_update(child, context[:context.shape[0]-1], symbol)
        lp_node = node_mix_prediction(node, lp_estimator, lp_child)

        node_update_switching_weights(node, lp_estimator, lp_child)

        return lp_node
    else:
        node[0]._log_stay_prob = 0.0
        return lp_estimator

I am in the process of setting up gdb for cython so I can trace down the error.
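
In case it's useful to anyone else, this is roughly the build script I'm planning to use to get the debug info that cygdb needs (a minimal sketch, assuming the extension is built straight from fast_cts.pyx; the repo's actual build script may differ):

# setup_debug.py -- minimal sketch for a debug build of the extension
from setuptools import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize(
        'fast_cts.pyx',
        gdb_debug=True,  # emit the Cython-level debug info that cygdb uses
        compiler_directives={'language_level': 3},
    ),
)

Then build with python setup_debug.py build_ext --inplace and run the training script under cygdb to get a Cython-aware backtrace.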

For some reason, the seg fault consistently occurs at a specific "factor", [i=1, j=26], where height and width are 42.

I hope to figure out the bug (most likely my bad.. :-) asap...

dhfromkorea commented 7 years ago

to clarify what I mean by [i=1, j=26], I do:

    cpdef (double, double) _update(self, int[:, :] obs):
        ...
        for i in range(self.height):
            for j in range(self.width):
                ...
                print(i, j)
                log_prob += cts_update(&self.cts_factors[i][j], context, obs[i, j])
                log_recoding_prob += cts_log_prob(&self.cts_factors[i][j], context, obs[i, j])
        return log_prob, log_recoding_prob

And I consistently get:

(1, 24) (1, 25) (1, 26)
[1] 3918 segmentation fault  python train.py -g montezuma_revenge -df logs/montezuma/ -algo paac_cts

Funnily, if I change the game to Breakout, I again consistently get:

(1, 8) (1, 9) (1, 10)
[1] 3623 segmentation fault  python train.py -g breakout -df logs/breakout -algo paac_cts

dhfromkorea commented 7 years ago

Hi, I got it fixed. The problem was that the state (game screen image) came in as uint8, and I just had to convert it to float64.
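
For anyone who hits the same thing, the change on my side was essentially a dtype conversion at the point where the frame is handed to the density model. A minimal sketch of the idea (density_model here is just whatever instance of the fast_cts model you're holding, and the /255.0 rescale is my assumption based on update() multiplying obs by num_bins, so adjust it to your own preprocessing):

import numpy as np

# frame: raw game screen, arrives as uint8 in [0, 255]
frame = frame.astype(np.float64)   # the dtype conversion that stopped the segfault
frame /= 255.0                     # assumed: keep values in [0, 1] so obs * num_bins stays below num_bins

bonus = density_model.update(frame)  # exploration bonus from the fast_cts model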

Thanks for bearing with the stupid question, and thanks again for your advice and insight!

UPDATE: with fast_cts, I am getting 272.2787954246189 steps/s avg. (with 4-frame skipping)

This still feels very slow, compared to what I normally get without cts: 2000 steps/s.

Do you have any thoughts on how we could make this even faster (e.g. running it on a GPU)?

steveKapturowski commented 7 years ago

@dhfromkorea I could imagine parallelizing the per-pixel CTS model updates, but as there are really no matrix operations going on, I'm not sure that running it on a GPU would be a huge help. On my system I found that the Double DQN updates were actually a more significant bottleneck than the density model updates. If you comment out the density model update in my implementation, what sort of performance do you get?
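
A quick micro-benchmark of the density model in isolation would also tell you how much headroom there is; something along these lines (a rough sketch: the class name and constructor defaults are assumptions, adjust them to whatever fast_cts actually exposes in your build):

import time
import numpy as np
from fast_cts import CTSDensityModel  # class name assumed

# Time density-model updates in isolation to see how much of the step budget they eat.
model = CTSDensityModel()              # constructor defaults assumed (42x42 frames)
frames = np.random.rand(500, 42, 42)   # fake float frames in [0, 1]

start = time.perf_counter()
for frame in frames:
    model.update(frame)
elapsed = time.perf_counter() - start
print('density model updates/s: {:.1f}'.format(len(frames) / elapsed))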

Also, sorry for the delay -- I've been travelling and haven't had much time on my computer before ICML.