Error with multithreading train.py

brayanhenao commented 7 years ago

Hi, as you know, the @mpgen decorator in train.py generates some errors, so I commented it out. The problem is that training now takes a really long time: I'm using the CPU-only TensorFlow build (I don't have an NVIDIA GPU, only AMD). For 54k batches it took around 4 days. Does anyone know how to fix the mpgen error? PS: Excuse my English.

Weiguo2000 commented 7 years ago

I'm not quite sure about the @mpgen problem you mentioned, but it takes several days to run train.py with CPU TensorFlow. A GPU will definitely cut the training down to several hours.

frischzenger commented 7 years ago

You need to give the details of the platform you use: Python 2 or Python 3, Windows or Fedora? I have met this problem on Windows with Python 3. After reading the official Python documentation, I found that on Windows, multiprocessing must be used from the global (module-level) scope, which is different from the Linux environment.

brayanhenao commented 7 years ago

@frischzenger Hi! I'm using Windows 10, Python 3.

brayanhenao commented 7 years ago

@Weiguo2000 This is the problem I'm talking about:

D:\number_plate_recog\deep-anpr-master>train.py
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "BestSplits" device_type: "CPU"') for unknown op: BestSplits
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "CountExtremelyRandomStats" device_type: "CPU"') for unknown op: CountExtremelyRandomStats
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "FinishedNodes" device_type: "CPU"') for unknown op: FinishedNodes
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "GrowTree" device_type: "CPU"') for unknown op: GrowTree
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "ReinterpretStringToFloat" device_type: "CPU"') for unknown op: ReinterpretStringToFloat
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "SampleInputs" device_type: "CPU"') for unknown op: SampleInputs
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "ScatterAddNdim" device_type: "CPU"') for unknown op: ScatterAddNdim
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "TopNInsert" device_type: "CPU"') for unknown op: TopNInsert
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "TopNRemove" device_type: "CPU"') for unknown op: TopNRemove
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "TreePredictions" device_type: "CPU"') for unknown op: TreePredictions
E c:\tf_jenkins\home\workspace\release-win\device\cpu\os\windows\tensorflow\core\framework\op_kernel.cc:943] OpKernel ('op: "UpdateFertileSlots" device_type: "CPU"') for unknown op: UpdateFertileSlots
Traceback (most recent call last):
  File "D:\number_plate_recog\deep-anpr-master\train.py", line 267, in <module>
    initial_weights=initial_weights)
  File "D:\number_plate_recog\deep-anpr-master\train.py", line 239, in train
    for batch_idx, (batch_xs, batch_ys) in batch_iter:
  File "D:\number_plate_recog\deep-anpr-master\train.py", line 103, in wrapped
    proc.start()
  File "C:\Program Files\Python35\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Program Files\Python35\lib\multiprocessing\context.py", line 212, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Program Files\Python35\lib\multiprocessing\context.py", line 313, in _Popen
    return Popen(process_obj)
  File "C:\Program Files\Python35\lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Program Files\Python35\lib\multiprocessing\reduction.py", line 59, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'mpgen.<locals>.main'

D:\number_plate_recog\deep-anpr-master>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Python35\lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "C:\Program Files\Python35\lib\multiprocessing\spawn.py", line 116, in _main
    self = pickle.load(from_parent)
EOFError: Ran out of input
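The `AttributeError` above fits @frischzenger's point. On Windows, `multiprocessing` starts children with the "spawn" method, which pickles the `Process` target and looks it up again by name in a fresh interpreter. A function defined inside another function (here `main` inside `mpgen`) has `<locals>` in its qualified name, so pickle cannot do that lookup. The second traceback, `EOFError: Ran out of input`, appears to be the child process finding nothing to unpickle after the parent's dump failed. The minimal sketch below reproduces only the pickling step with the standard `pickle` module, no processes involved; `module_level_worker`, `make_local_worker` and `is_picklable` are hypothetical names, not code from train.py.

```python
import pickle


def module_level_worker(q):
    # Module-scope function: pickle records it by module and name, so a
    # freshly spawned interpreter can import it again.
    return q


def make_local_worker():
    def main(q):
        # Nested function: its qualified name is
        # "make_local_worker.<locals>.main", which cannot be looked up by name.
        return q
    return main


def is_picklable(obj):
    """Return True if pickle can serialize obj, as "spawn" needs for a target."""
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError):
        # The exception type for unpicklable local objects varies by Python version.
        return False
    return True


print(is_picklable(module_level_worker))  # True
print(is_picklable(make_local_worker()))  # False
```

On Linux the default "fork" start method copies the parent process instead of pickling the target, which would explain why the same decorator works there out of the box.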

Weiguo2000 commented 7 years ago

It looks like the problem you mentioned only occurs on Windows. I didn't have it on either Ubuntu or Mac.

sourabh2k15 commented 7 years ago

Yes, I can back this up: on Windows multiprocessing fails, and commenting out @mpgen is the only way out, but on Linux it works out of the box.
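For anyone who would rather keep background batch generation on Windows than comment out @mpgen, here is a minimal sketch of one possible workaround, assuming the batch generator can be a plain module-level function. It is not the repository's code: `mp_iterate`, `_produce` and `count_batches` are hypothetical names. The worker and the generator live at module scope so "spawn" can pickle them, and the generator is passed in explicitly instead of being decorated, because decorator syntax rebinds the module-level name to the wrapper and the original function then cannot be pickled by name. The entry point sits under `if __name__ == "__main__":` since spawned children re-import the main module, and a `None` sentinel marks the end of the stream, so the generator must never yield `None` itself.

```python
import multiprocessing
import queue


def _produce(q, gen_func, args):
    # Module-level worker, so the "spawn" start method (the default on
    # Windows) can pickle it by name and re-import it in the child.
    try:
        for item in gen_func(*args):
            q.put(item)
    finally:
        q.put(None)  # sentinel: no more items (also sent if gen_func raises)


def mp_iterate(gen_func, *args, maxsize=3):
    """Yield the items of gen_func(*args), generated in a background process.

    gen_func must be a module-level function that never yields None.
    """
    q = multiprocessing.Queue(maxsize)
    proc = multiprocessing.Process(target=_produce, args=(q, gen_func, args))
    proc.start()
    try:
        while True:
            try:
                item = q.get(timeout=1.0)
            except queue.Empty:
                if proc.is_alive():
                    continue
                # Child exited without sending the sentinel (e.g. it could not
                # unpickle its target): read a last flushed item, else raise.
                item = q.get(timeout=1.0)
            if item is None:
                break
            yield item
    finally:
        proc.terminate()  # also stops the child if the consumer quits early
        proc.join()


def count_batches(n):
    # Hypothetical stand-in for the batch generator in train.py.
    for i in range(n):
        yield i, i * i


if __name__ == "__main__":
    # Required with "spawn": each child re-imports this module, and this
    # guard stops it from launching processes of its own.
    for batch_idx, batch in enumerate(mp_iterate(count_batches, 5)):
        print(batch_idx, batch)
```

This sketch runs a single producer process, so it overlaps batch generation with training rather than spreading generation across every core.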

annalenastern commented 6 years ago

Would commenting out @mpgen make the training sequential, as if I were using a CPU? It runs for me like that, though (not sure whether that is slow or fast in comparison): time for 60 batches: 301.733..., and on later batches: 64.23...

sourabh2k15 commented 6 years ago

Yup, it makes it use only one core instead of all the cores available. So even on a CPU this is slower, as it uses only 1 core instead of the 4 (or n) available.

matthewearl / deep-anpr, issue #37