ufechner7 / KiteModels.jl

Kite and tether models for the simulation of kite power systems
https://ufechner7.github.io/KiteModels.jl/
MIT License

[IDAS ERROR] IDACalcIC Newton/Linesearch algorithm failed to converge. #44

Closed 1-Bart-1 closed 2 months ago

1-Bart-1 commented 5 months ago

When running the reset() and get_next_step() functions I very rarely get the error: [IDAS ERROR] IDACalcIC Newton/Linesearch algorithm failed to converge.

This happens during reinforcement learning training. I run reset() once, then call get_next_step() until the kite crashes (around 285 calls on average). That is one episode, and it is repeated many times. The error first appears after around 76 episodes.

One episode pseudocode:

reset()
while not crashed:
    get_next_step(action)
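
The pseudocode above can be made concrete with a small guard around each step. This is only a sketch under stated assumptions: StubEnv, stub_reset!, stub_step! and crashed are hypothetical stand-ins for the real environment, and it is an assumption that a solver failure would surface as a thrown exception or as a non-finite observation (the thread only shows the error being printed).

```julia
# Hypothetical stub standing in for the real environment (not KiteModels API).
mutable struct StubEnv
    step::Int
    fail_at::Int    # step at which the stub throws (0 = never)
    crash_at::Int   # step at which the stub reports a crash
end

stub_reset!(env::StubEnv) = (env.step = 0; 0.0)
crashed(env::StubEnv) = env.step >= env.crash_at

function stub_step!(env::StubEnv, depower, steering)
    env.step += 1
    env.step == env.fail_at && error("simulated solver failure")
    return depower + steering
end

# One episode: reset once, then step until the kite crashes.
# A thrown error or a non-finite observation ends the episode early.
function run_episode!(env, policy)
    obs = stub_reset!(env)
    while !crashed(env)
        try
            obs = stub_step!(env, policy(obs)...)
        catch
            return (steps=env.step, status=:solver_failed)
        end
        isfinite(obs) || return (steps=env.step, status=:non_finite)
    end
    return (steps=env.step, status=:crashed)
end

policy(obs) = (0.25, 0.0)   # constant depower and steering, for illustration
failed = run_episode!(StubEnv(0, 5, 285), policy)
normal = run_episode!(StubEnv(0, 0, 285), policy)
```

Returning a status instead of letting the error propagate lets the training loop discard the failed episode and call reset again.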

Output.txt, containing the rollout statistics of one episode and the first occurrence of the error:

---------------------------------
| rollout/           |          |
|    ep_len_mean     | 285      |
|    ep_rew_mean     | -200     |
| time/              |          |
|    episodes        | 76       |
|    fps             | 456      |
|    time_elapsed    | 84       |
|    total_timesteps | 38448    |
| train/             |          |
|    actor_loss      | 4.3e+11  |
|    critic_loss     | 3.32e+23 |
|    ent_coef        | 2.06e+10 |
|    ent_coef_loss   | -437     |
|    learning_rate   | 0.0456   |
|    n_updates       | 768      |
---------------------------------

[IDAS ERROR]  IDACalcIC
  Newton/Linesearch algorithm failed to converge.

Environment.jl:

module Environment

using Timers; tic()
using KiteModels
using KiteUtils
# using PyCall #removed pycall!!
# using Plots

const Model = KPS4

set_data_path(joinpath(@__DIR__, "../../Simulator/data"))
kcu = KCU(se());
kps4 = Model(kcu);
dt = 1/se().sample_freq
steps = 1000
step = 0
logger = Logger(se().segments + 5, steps) 

GC.gc();
toc();

integrator = KiteModels.init_sim!(kps4, stiffness_factor=0.04);

function get_next_step(depower, steering)
    global step
    depower = Float32(depower)
    steering = Float32(steering)

    v_ro = 0.0

    if depower < 0.22; depower = 0.22; end
    set_depower_steering(kps4.kcu, depower, steering)

    t_sim = 0.0
    open("next_step_io.txt", "w") do io
        redirect_stdout(io) do
            t_sim = @elapsed KiteModels.next_step!(kps4, integrator, v_ro=v_ro, dt=dt)
        end
    end

    GC.gc(false)

    sys_state = SysState(kps4)
    step += 1

    return sys_state.orient[1], sys_state.orient[2], sys_state.orient[3], sys_state.orient[4], sys_state.force
end

function reset()
    global kcu
    global kps4
    global integrator
    global step
    global sys_state
    update_settings()
    save_log(logger)
    kcu = KCU(se());
    kps4 = Model(kcu);
    integrator = KiteModels.init_sim!(kps4, stiffness_factor=0.04)
    step = 0
    sys_state = SysState(kps4)
    GC.gc();
    return sys_state.orient[1], sys_state.orient[2], sys_state.orient[3], sys_state.orient[4], sys_state.force
end

function render()
    global sys_state, logger, step, steps
    if(step < steps)
        log!(logger, SysState(kps4))
    end
end

end

System: I am running the code on the IDUN High Performance Computing cluster (https://www.hpc.ntnu.no/idun/) inside an Apptainer Ubuntu container. I built a system image with Environment as a precompiled package.

ufechner7 commented 3 months ago

There are many possible reasons why the solver can fail.

The first thing I would try is to change the solver settings:

solver:
    abs_tol: 0.0006        # absolute tolerance of the DAE solver [m, m/s]
    rel_tol: 0.001         # relative tolerance of the DAE solver [-]
    linear_solver: "GMRES" # can be GMRES or Dense
    max_order: 4           # maximal order, usually between 3 and 5
    max_iter:  200         # max number of iterations of the steady-state-solver

These settings can be changed globally in settings.yaml, or case by case, e.g.:

se().abs_tol=0.000006
se().rel_tol=0.0000001
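
A case-by-case override can be wrapped so that the previous tolerances are restored afterwards. The sketch below is self-contained and uses a hypothetical SolverSettings struct in place of the object returned by se(); with_tolerances is also a made-up helper name. At which point the solver actually reads the tolerances is not stated here, so the override may need to be in place before the model is initialized.

```julia
# Hypothetical stand-in for the settings object; only the two tolerance
# fields mentioned above are modelled.
mutable struct SolverSettings
    abs_tol::Float64
    rel_tol::Float64
end

# Run f() with temporarily overridden tolerances and restore the old
# values afterwards, even if f() throws.
function with_tolerances(f, s::SolverSettings; abs_tol=s.abs_tol, rel_tol=s.rel_tol)
    old_abs, old_rel = s.abs_tol, s.rel_tol
    s.abs_tol, s.rel_tol = abs_tol, rel_tol
    try
        return f()
    finally
        s.abs_tol, s.rel_tol = old_abs, old_rel
    end
end

settings = SolverSettings(0.0006, 0.001)   # values from the solver section above
seen = with_tolerances(settings; abs_tol=0.000006, rel_tol=0.0000001) do
    # The model would be initialized and stepped here.
    (settings.abs_tol, settings.rel_tol)
end
```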

The second thing to try is to reduce the stiffness of the tether:

tether:
    c_spring

At the beginning of a simulation I always use a low stiffness and increase it to the nominal value when an equilibrium is reached.
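
One way to picture the "start soft, then stiffen" idea is a ramp of the stiffness factor. The sketch below is a simplification under stated assumptions: it ramps linearly by step count rather than detecting equilibrium, the start value 0.04 is taken from the init_sim! call in the report, and ramp_steps, stiffness_factor and effective_stiffness are hypothetical names. How the factor is fed back into the model is left out.

```julia
# Linear ramp of a stiffness factor from `start` up to 1.0 (nominal)
# over `ramp_steps` simulation steps.
function stiffness_factor(step::Integer; start=0.04, ramp_steps=100)
    step <= 0 && return start
    step >= ramp_steps && return 1.0
    return start + (1.0 - start) * step / ramp_steps
end

# Effective spring constant for a nominal value c_nominal (placeholder).
effective_stiffness(c_nominal, step; kwargs...) =
    c_nominal * stiffness_factor(step; kwargs...)

factors = [stiffness_factor(k) for k in (0, 50, 100, 200)]
```

A step-count ramp is easy to reproduce across episodes; switching on a measured equilibrium criterion would follow the description above more closely.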

Does this answer your question?

1-Bart-1 commented 3 months ago

Yes, thank you!

ufechner7 commented 2 months ago

I just added the option to use the DFBDF solver, which in general works much better: it is much more stable, on average 4 times faster, and uses half the memory. Please try it out and tell me if this fixes your problem.

1-Bart-1 commented 2 months ago

Thanks, it fixed the problem!
