soap117 / DeepRule

about the hanging training process #20

Open tairen99 opened 2 years ago

tairen99 commented 2 years ago

Hi Junyu,

Thank you for your good work on the chart extraction.

I followed the README, installed DeepRule, and wanted to test the training code. Everything looks fine at the start:

['cache', 'pie']
loading all datasets... 
using 1 threads
loading from cache file: /home/DeepRule/data/piedata_1008/cache/pie_train2019.pkl
loading annotations into memory...
/home/DeepRule/data/piedata_1008/pie/annotations/instancesPie(1008)_train2019.json
Done (t=1.61s)
creating index...
index created!
loading from cache file: /home/DeepRule/data/piedata_1008/cache/pie_val2019.pkl
loading annotations into memory...
/home/DeepRule/data/piedata_1008/pie/annotations/instancesPie(1008)_val2019.json
Done (t=0.03s)
creating index...
index created!
system config...
{'batch_size': 26,
 'cache_dir': '/home/DeepRule/data/piedata_1008/cache',
 'chunk_sizes': [5, 7, 7, 7],
 'config_dir': './config',
 'data_dir': '/home/DeepRule/data/piedata_1008/',
 'data_rng': <mtrand.RandomState object at 0x7f1b20d20d38>,
 'dataset': 'Pie',
 'decay_rate': 10,
 'display': 5,
 'learning_rate': 0.00025,
 'max_iter': 50000,
 'nnet_rng': <mtrand.RandomState object at 0x7f1b20d20d80>,
 'opt_algo': 'adam',
 'prefetch_size': 5,
 'pretrain': None,
 'result_dir': './results',
 'sampling_function': 'kp_detection',
 'snapshot': 5000,
 'snapshot_name': 'CornerNetPurePie',
 'stepsize': 45000,
 'tar_data_dir': 'cls',
 'test_split': 'testchart',
 'train_split': 'trainchart',
 'val_iter': 100,
 'val_split': 'valchart',
 'weight_decay': False,
 'weight_decay_rate': 1e-05,
 'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5,
 'border': 128,
 'categories': 1,
 'data_aug': True,
 'gaussian_bump': True,
 'gaussian_iou': 0.3,
 'gaussian_radius': -1,
 'input_size': [511, 511],
 'lighting': True,
 'max_per_image': 100,
 'merge_bbox': False,
 'nms_algorithm': 'exp_soft_nms',
 'nms_kernel': 3,
 'nms_threshold': 0.5,
 'output_sizes': [[128, 128]],
 'rand_color': True,
 'rand_crop': True,
 'rand_pushes': False,
 'rand_samples': False,
 'rand_scale_max': 1.4,
 'rand_scale_min': 0.6,
 'rand_scale_step': 0.1,
 'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]),
 'special_crop': False,
 'test_scales': [1],
 'top_k': 100,
 'weight_exp': 8}
len of db: 73075
building model...
module_file: models.CornerNetPurePie
use kp pure pie
total parameters: 198592652
setting learning rate to: 0.00025
training start...
start prefetching data...
shuffling indices...
['read.txt']
0%|                                                                             | 0/50000 [00:00<?, ?it/s]

But for some reason, training hangs without making any progress. I checked the CPU usage and found a zombie process, as shown below: [image]

The GPU usage is shown below: [image]

Since I do not have an Azure account, I commented out the import at line 32 of "/DeepRule/models/CornerNetPurePie.py": # from azureml.core.compute import ComputeTarget

I do not know what the main cause is. Please help. Thank you in advance!

soap117 commented 1 year ago

The OCR is replaceable. You can replace it with a local OCR package such as pytesseract (https://pypi.org/project/pytesseract/). However, you will need to rewrite the ocr_result function.
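
A minimal sketch of what a local replacement could look like, assuming pytesseract and Pillow are installed along with the Tesseract binary. The names ocr_result_local and words_from_tesseract are hypothetical, and the returned structure (a list of dicts with "text" and "bbox") is an assumption; the format that the rest of DeepRule expects from ocr_result is not shown in this thread, so adapt the output accordingly.

```python
def words_from_tesseract(data, min_conf=0.0):
    """Convert the dict returned by pytesseract.image_to_data into word boxes.

    The output layout (text plus [x1, y1, x2, y2]) is an assumption, not
    DeepRule's documented format.
    """
    words = []
    for text, left, top, width, height, conf in zip(
        data["text"], data["left"], data["top"],
        data["width"], data["height"], data["conf"],
    ):
        text = text.strip()
        # Tesseract reports non-word layout entries with empty text and conf -1.
        if not text or float(conf) < min_conf:
            continue
        words.append({"text": text, "bbox": [left, top, left + width, top + height]})
    return words


def ocr_result_local(image_path):
    """Hypothetical local stand-in for ocr_result, using Tesseract."""
    # Imported lazily so the helper above can be used without Tesseract installed.
    import pytesseract
    from PIL import Image

    data = pytesseract.image_to_data(
        Image.open(image_path), output_type=pytesseract.Output.DICT
    )
    return words_from_tesseract(data)
```

Calling ocr_result_local("chart.png") on a chart image (the file name is only an example) would return one entry per recognized word with its bounding box in pixel coordinates.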

sdh5349 commented 1 year ago

Can you tell me in more detail?

kpostnov commented 7 months ago

Hey @tairen99, have you found the reason for this issue? We encountered the same problems.

Edit: By stepping through the execution, we pinpointed the code where the process gets stuck. The issue appears to be image = cv2.resize(image, (new_width, new_height)) on line 30 of sample/bar.py (and the corresponding lines in the other files in that directory). Following the suggestions from this thread, we inserted multiprocessing.set_start_method('spawn', force=True) at the beginning of train_chart.py. Afterwards, everything worked as expected.
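
For anyone applying the same workaround, here is a minimal standalone sketch of the call in isolation. The helper use_spawn_start_method and the toy _worker are hypothetical and not part of DeepRule; in the repository the single set_start_method line would go at the top of train_chart.py, before any worker processes are created. A likely explanation for why this helps is that forked children inherit the parent's library state (including OpenCV's internal threading state), whereas spawned children start from a fresh interpreter, but that is an assumption rather than something confirmed in this thread.

```python
import multiprocessing


def use_spawn_start_method():
    """Hypothetical helper: start all later worker processes via 'spawn'."""
    # force=True overrides a start method that was already chosen elsewhere.
    multiprocessing.set_start_method("spawn", force=True)
    return multiprocessing.get_start_method()


def _worker():
    # Stand-in for a data-prefetching worker.
    print("worker running")


if __name__ == "__main__":
    # Must run before any Process or Pool is created.
    use_spawn_start_method()
    proc = multiprocessing.Process(target=_worker)
    proc.start()
    proc.join()
```

Note that with 'spawn', the worker target and its arguments must be picklable and the process-creating code should sit under an `if __name__ == "__main__":` guard, since each child re-imports the main module.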