Saving weights.npz error!

sadiknina commented 6 years ago

This error is shown when I try to save the trained model by pressing Ctrl+C:

    NA68ECS 0.0 <-> SS80RJD 1.0
    VH96FLC 1.0 <-> SS80RJD 1.0
    ND78RRR 1.0 <-> SS80RJD 1.0
    JO64KPG 0.0 <-> SS80RJD 1.0
    IP11HRV 0.0 <-> SS80RJD 1.0
    GS66WQA 0.0 <-> SS80RJD 1.0
    WW22MFV 0.0 <-> SS80RJD 1.0
    AJ69HHA 1.0 <-> SS80RJD 1.0
    LO05ACN 0.0 <-> SS80RJD 1.0
    RN46IKZ 1.0 <-> SS80RJD 1.0
    PD67JOS 1.0 <-> SS80RJD 1.0
    CC50LIG 1.0 <-> SS80RJD 1.0
    SH22NYA 1.0 <-> SS80RJD 1.0
    EJ63ANL 1.0 <-> SS80RJD 1.0
    CQ23FZB 0.0 <-> SS80RJD 1.0
    VB46CTT 0.0 <-> SS80RJD 1.0
    UW07QYK 1.0 <-> SS80RJD 1.0
    UY55WEE 1.0 <-> SS80RJD 1.0
    YY76RDB 0.0 <-> SS80RJD 1.0
    HE67HQI 0.0 <-> SS80RJD 1.0
    ZM60MGH 0.0 <-> SS80RJD 1.0
    TI57KNR 0.0 <-> SS80RJD 1.0
    JM74WYE 0.0 <-> SS80RJD 1.0
    XQ06DTI 1.0 <-> SS80RJD 1.0
    HZ92TYI 1.0 <-> SS80RJD 1.0
    PQ07UOA 1.0 <-> SS80RJD 1.0
    YF28WPW 0.0 <-> SS80RJD 1.0
    YG80GAG 0.0 <-> SS80RJD 1.0
    LB83MTT 0.0 <-> SS80RJD 1.0
    SA00DVB 1.0 <-> SS80RJD 1.0
    WB27CRE 1.0 <-> SS80RJD 1.0
    SP44VKA 0.0 <-> SS80RJD 1.0
    NP76UVV 0.0 <-> SS80RJD 1.0
    DY99FUI 1.0 <-> SS80RJD 1.0
    TM74IZO 1.0 <-> SS80RJD 1.0
    FH87OOE 0.0 <-> SS80RJD 1.0
    HY33LBK 0.0 <-> SS80RJD 1.0
    QB77WNK 1.0 <-> SS80RJD 1.0
    IV34HCY 0.0 <-> SS80RJD 1.0
    ZK77JAL 0.0 <-> SS80RJD 1.0
    QQ57ATV 0.0 <-> SS80RJD 1.0
    TB67JJQ 0.0 <-> SS80RJD 1.0
    PI62NYX 0.0 <-> SS80RJD 1.0
    SI33DHX 1.0 <-> SS80RJD 1.0
    AQ50HNO 1.0 <-> SS80RJD 1.0
    NQ57FJT 1.0 <-> SS80RJD 1.0
    NH13WAC 0.0 <-> SS80RJD 1.0
    UA09ZPI 0.0 <-> SS80RJD 1.0
    TP98FJQ 0.0 <-> SS80RJD 1.0
    IO89BAM 1.0 <-> SS80RJD 1.0
    B 40 0.00% 44.00% loss: -23851382784.0 (digits: 1750573696.0, presence: -25601955840.0) |XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX|
    time for 60 batches 141.08570551872253
    forrtl: error (200): program aborting due to control-C event
    Image              PC                Routine   Line     Source
    libifcoremd.dll    00007FFD987A94C4  Unknown   Unknown  Unknown
    KERNELBASE.dll     00007FFDECA97EDD  Unknown   Unknown  Unknown
    KERNEL32.DLL       00007FFDEEFF1FE4  Unknown   Unknown  Unknown
    ntdll.dll          00007FFDEFBBEFB1  Unknown   Unknown  Unknown

Has anyone else had this issue and solved it? Please help.

5059 commented 6 years ago

Same error here. Has anyone solved it? (Windows 10)

yurinativo commented 6 years ago

Same error...

- Windows 8.1 64-bit
- Python 3.5
- TensorFlow 1.7.0 (GPU)
- NumPy 1.14.2
- CUDA compilation tools, release 9.0, V9.0.176
- cuDNN 7.0.5

It looks like Ctrl+C aborts not only the training loop but the main thread too, so the `except KeyboardInterrupt` block that saves the weights never runs.
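The `forrtl: error (200)` line in the first log comes from the Intel Fortran runtime (`libifcoremd.dll`), which some MKL-linked NumPy/SciPy builds on Windows load. That runtime installs its own Ctrl+C handler, which can end the process before Python gets a chance to raise `KeyboardInterrupt`. A workaround often suggested for this, not tested in this thread, is to set the `FOR_DISABLE_CONSOLE_CTRL_HANDLER` environment variable before NumPy is imported. A minimal sketch, assuming it is placed at the very top of train.py:

```python
import os

# Assumption: the forrtl crash comes from the Intel Fortran runtime's own
# Ctrl+C handler. Setting this variable asks that runtime not to install the
# handler, so Ctrl+C can reach Python as KeyboardInterrupt instead.
# It has to be set before numpy/scipy (and hence the Fortran runtime) load.
os.environ["FOR_DISABLE_CONSOLE_CTRL_HANDLER"] = "1"

# Only import numpy, tensorflow, etc. after this point, for example:
# import numpy
# import tensorflow as tf
```

Setting the variable in the shell before starting Python (`set FOR_DISABLE_CONSOLE_CTRL_HANDLER=1` in cmd.exe) should have the same effect, and leaves the original `except KeyboardInterrupt` saving logic unchanged.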

yurinativo commented 6 years ago

I worked around it. Instead of relying on `try`/`except` to catch the interrupt from Ctrl+C, I changed the code to check the keyboard directly.

Add this at the beginning of train.py: `import keyboard` (the third-party `keyboard` package, installable with `pip install keyboard`).

At the end of `train()`, replace the `try`/`except` block with this:

    try:
        last_batch_idx = 0
        last_batch_time = time.time()
        batch_iter = enumerate(read_batches(batch_size))
        for batch_idx, (batch_xs, batch_ys) in batch_iter:
            print('batch_idx {}'.format(batch_idx))

            # beginning of the change: stop and save while Q is held down
            if keyboard.is_pressed('q'):
                print('Saving weights.npz')
                last_weights = [p.eval() for p in params]
                numpy.savez("weights.npz", *last_weights)
                return last_weights
            # end of the change

            do_batch()  # do_batch() takes no arguments
            if batch_idx % report_steps == 0:
                batch_time = time.time()
                if last_batch_idx != batch_idx:
                    #print("time for 60 batches {}".format(60 * (last_batch_time - batch_time) / (last_batch_idx - batch_idx)))
                    last_batch_idx = batch_idx
                    last_batch_time = batch_time

    except KeyboardInterrupt:
        # still used wherever Ctrl+C does reach Python as KeyboardInterrupt
        print('Saving weights.npz')
        last_weights = [p.eval() for p in params]
        numpy.savez("weights.npz", *last_weights)
        return last_weights

Now you just need to press Q to stop training and save the weights (I have to hold Q down for it to work).
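Holding Q is needed because `keyboard.is_pressed('q')` only reports whether the key is down at the instant it is checked, which happens once per batch. A variation that avoids holding the key is to latch the request in a flag that the loop checks between batches. The sketch below uses a `threading.Event` and hypothetical stand-ins (`train_loop`, `fake_batch`, the `save` callable) rather than deep-anpr's real functions; with the `keyboard` package, registering `keyboard.add_hotkey('q', stop_requested.set)` could be one way to set the flag.

```python
import threading
from typing import Callable, Iterable, List


def train_loop(batches: Iterable[int],
               do_batch: Callable[[int], None],
               stop_requested: threading.Event,
               save: Callable[[], List[float]]) -> List[float]:
    """Hypothetical loop: run batches until done or a stop is requested, then save."""
    for batch in batches:
        # The flag stays set once a key callback (or any other thread) sets it,
        # so a request made during a batch is seen at the next check.
        if stop_requested.is_set():
            break
        do_batch(batch)
    # Save whether the loop finished normally or was stopped early.
    return save()


# Usage with stand-ins: request a stop while batch 2 is running.
stop_requested = threading.Event()
completed: List[int] = []


def fake_batch(i: int) -> None:
    completed.append(i)
    if i == 2:
        stop_requested.set()  # what a hotkey callback would do


weights = train_loop(range(10), fake_batch, stop_requested,
                     lambda: [float(len(completed))])
print(completed, weights)  # [0, 1, 2] [3.0]
```

The current batch always finishes before the save, so the saved weights correspond to a completed training step.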

yurinativo commented 6 years ago

Another alternative is to save the weights periodically. I'm using `report_steps = 50`, so every 50 batches the loop saves the weights, which prevents losing hours of training if the process is killed:

    try:
        last_batch_idx = 0
        last_batch_time = time.time()
        batch_iter = enumerate(read_batches(batch_size))
        for batch_idx, (batch_xs, batch_ys) in batch_iter:
            print('batch_idx {}'.format(batch_idx))
            if keyboard.is_pressed('q'):  # if the Q key is held down
                print('Saving weights.npz')
                last_weights = [p.eval() for p in params]
                numpy.savez("weights.npz", *last_weights)
                return last_weights
            do_batch()  # do_batch() takes no arguments
            if batch_idx % report_steps == 0:
                batch_time = time.time()
                if last_batch_idx != batch_idx:
                    #print("time for 60 batches {}".format(60 * (last_batch_time - batch_time) / (last_batch_idx - batch_idx)))
                    last_batch_idx = batch_idx
                    last_batch_time = batch_time
                    # beginning of the change: checkpoint every report_steps batches
                    print('Saving weights.npz')
                    last_weights = [p.eval() for p in params]
                    numpy.savez("weights.npz", *last_weights)
                    # end of the change

    except KeyboardInterrupt:
        print('Saving weights.npz')
        last_weights = [p.eval() for p in params]
        numpy.savez("weights.npz", *last_weights)
        return last_weights
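Saving every `report_steps` batches means a Ctrl+C or crash can land in the middle of `numpy.savez`, leaving a truncated weights.npz in place of the last good checkpoint. One common precaution, sketched here as an assumption rather than something from this thread, is to write to a temporary file in the same directory and swap it into place with `os.replace` only once it is complete. `atomic_save` is a hypothetical helper name.

```python
import os
import tempfile
from typing import BinaryIO, Callable


def atomic_save(path: str, write: Callable[[BinaryIO], None]) -> None:
    """Hypothetical helper: write via a temp file, then swap it in over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the temp file next to the target so os.replace stays on one
    # filesystem (it cannot move files across filesystems).
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            write(f)
        # Replace the old file only after the new one is completely written
        # (an atomic rename on POSIX filesystems).
        os.replace(tmp_path, path)
    except BaseException:
        # Also covers KeyboardInterrupt: drop the partial temp file and
        # leave the previous checkpoint untouched.
        os.remove(tmp_path)
        raise
```

In train.py it might be called as `atomic_save("weights.npz", lambda f: numpy.savez(f, *last_weights))`, since `numpy.savez` also accepts an open file object instead of a filename.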

5059 commented 6 years ago

Good job, it's really helpful @yurinativo

Pratipkhandelwal commented 5 years ago

It gives this error: `TypeError: do_batch() takes 0 positional arguments but 1 was given`

Any help resolving it? @5059
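The message means `do_batch` was called with one positional argument while, as the error text itself says, its definition takes none. Calling it as `do_batch()`, or adding a parameter to its definition if the argument is really needed, resolves the mismatch. A minimal reproduction with a stand-in function (not deep-anpr's actual `do_batch`):

```python
def do_batch():
    """Stand-in for a do_batch() defined with no parameters."""
    return "ran one batch"


try:
    do_batch(123.0)  # one positional argument: raises TypeError
    message = ""
except TypeError as exc:
    message = str(exc)

print(message)     # do_batch() takes 0 positional arguments but 1 was given
print(do_batch())  # the fix: call it with no arguments
```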

matthewearl / deep-anpr

Saving weights.npz error! #97