Closed · nmiculinic closed this 6 years ago
For some reason, merging master branch commit 114bc25 breaks the code... investigating.
After investigation I've concluded:
Chiron 0.4 master (commit a6d0284363100d68d3183eb56092986cb0b6d691) has a broken basecalling path:

```
TF_CPP_MIN_LOG_LEVEL=2 python chiron/entry.py call -e fasta --beam 0 --batch_size 25 -i chiron/example_data/ -o /tmp/ch3
```

The following error appears:
```
Traceback (most recent call last):
  File "chiron/entry.py", line 96, in <module>
    main()
  File "chiron/entry.py", line 88, in main
    args.func(args)
  File "chiron/entry.py", line 25, in evaluation
    chiron_eval.run(args)
  File "/home/lpp/Desktop/Chiron/chiron/chiron_eval.py", line 299, in run
    time_dict = unix_time(evaluation)
  File "/home/lpp/Desktop/Chiron/chiron/utils/unix_time.py", line 21, in unix_time
    function(*args, **kwargs)
  File "/home/lpp/Desktop/Chiron/chiron/chiron_eval.py", line 235, in evaluation
    if FLAGS.mode == 'rna':
AttributeError: 'Namespace' object has no attribute 'mode'
```
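This first crash looks like a flag mismatch: chiron_eval.py reads FLAGS.mode, but the argparse Namespace built by entry.py's call subcommand apparently never defines --mode. A minimal hedged sketch of a defensive fix (the helper name and the 'dna' default are my assumptions, not the project's):

```python
from argparse import Namespace

def get_mode(flags):
    # entry.py's subparser may not define --mode, so read it defensively
    # instead of crashing with AttributeError on the Namespace.
    return getattr(flags, 'mode', 'dna')  # default 'dna' is an assumption

# A Namespace built without --mode falls back to the default:
print(get_mode(Namespace(beam=0, batch_size=25)))  # dna
# A Namespace that does define it is passed through:
print(get_mode(Namespace(mode='rna')))             # rna
```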
Running chiron_eval.py directly instead:

```
TF_CPP_MIN_LOG_LEVEL=2 python chiron/chiron_eval.py -e fasta --beam 0 --batch_size 25 -i chiron/example_data/ -o /tmp/ch3
```

a new error pops up:
```
Traceback (most recent call last):
  File "chiron/chiron_eval.py", line 343, in <module>
    run(args)
  File "chiron/chiron_eval.py", line 299, in run
    time_dict = unix_time(evaluation)
  File "/home/lpp/Desktop/Chiron/chiron/utils/unix_time.py", line 21, in unix_time
    function(*args, **kwargs)
  File "chiron/chiron_eval.py", line 211, in evaluation
    saver.restore(sess, tf.train.latest_checkpoint(FLAGS.model))
  File "/home/lpp/Desktop/Chiron/.venv/lib/python3.6/site-packages/tensorflow-1.8.0rc1-py3.6-linux-x86_64.egg/tensorflow/python/training/saver.py", line 1796, in restore
    raise ValueError("Can't load save_path when it is None.")
ValueError: Can't load save_path when it is None.
```
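The second crash is tf.train.latest_checkpoint returning None when FLAGS.model points at a directory with no checkpoint state file; saver.restore then fails with the opaque "save_path is None". A hedged sketch of a clearer guard (the function and error message are hypothetical; latest_checkpoint is passed in so the sketch doesn't depend on TensorFlow):

```python
def restore_latest(saver, sess, model_dir, latest_checkpoint):
    """Restore the newest checkpoint, failing loudly if none exists.

    latest_checkpoint stands in for tf.train.latest_checkpoint, which
    returns None when model_dir contains no checkpoint state file.
    """
    ckpt = latest_checkpoint(model_dir)
    if ckpt is None:
        raise FileNotFoundError(
            "No checkpoint found in %r; pass a trained model directory "
            "via --model" % model_dir)
    saver.restore(sess, ckpt)
    return ckpt
```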
Here I gave up. I've also noticed that master removed tqdm, which is quite useful for seeing how the run is progressing. This PR speeds up basecalling by 2-3x (maybe more; further testing needed) in a heterogeneous computing environment (CPU + GPU).
Thanks a lot for the help! I will check it for merging. We had been trying to parallelize this before; glad you made it work. I have done some benchmarking:
Speed test for the beam search decoder (about 0.194 s per unit of beam width per batch of 3000×512 on 1 CPU):

| beam width | 0 | 1 | 2 | 3 | 5 | 10 |
|---|---|---|---|---|---|---|
| mean (s) | 1.522 | 1.737 | 1.982 | 2.212 | 2.581 | 3.461 |
| std (s) | 0.028 | 0.152 | 0.157 | 0.161 | 0.159 | 0.160 |
Solution: use the GPU to run the neural network and CPUs to run the beam search, with roughly 8 units of beam width per CPU alongside a 1080Ti. An ideal setting would be a 1080Ti plus 4 CPUs at beam width 30.
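A back-of-envelope check of that suggestion, interpreting the benchmark numbers above (0.194 s of CPU decode per unit of beam width per batch, and the beam-0 mean of ~1.52 s taken as the GPU-dominated cost per batch; both interpretations are mine):

```python
decode_cost_per_beam = 0.194  # s of CPU decoding per unit beam width per batch
gpu_time_per_batch = 1.522    # s per batch at beam 0 (assumed ~GPU-only cost)
beam_width = 30

# CPUs needed so decoding keeps pace with logit generation:
cpus_needed = beam_width * decode_cost_per_beam / gpu_time_per_batch
print(round(cpus_needed, 1))  # -> 3.8, i.e. about 4 decoder CPUs
```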
However, the beam search decoder in TensorFlow does not support multithreading: https://github.com/tensorflow/tensorflow/issues/17136 So I am still waiting for TF to enable it, but it's awesome that you made it work; really appreciate it.
It works by running the logit decoding step in parallel with logit generation. The logit decoding step (beam/greedy) is a poor fit for the GPU, so it's performed on the CPU.
In more detail, the previous code first computed logits from the raw signal on the GPU, copied the data to RAM, copied the logits into Python space (a numpy array), copied them back into TensorFlow space, and executed the decoding step on the CPU, all sequentially. I noticed poor CPU and GPU utilization with this approach.
This speedup does the following: it sets up TensorFlow logits and decoding queues holding the results of the raw signal -> logits (GPU-bound) and logits -> decoded (CPU-bound) steps. By decoupling these operations into a pipeline, both the GPU and CPU reach better saturation and utilization. If the logits queue keeps growing (shown as logits_q in the "signal processing" progress bar), you're CPU bound (or don't have enough threads serving the decode queue). Otherwise, you're probably GPU bound, which you can confirm with the usual monitoring tools; that's what I do on my Linux workstation with NVIDIA GPUs.
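The queue pipeline described above can be sketched with plain Python threads (all names are hypothetical; the real code uses TensorFlow queues, and compute_logits/decode stand in for the GPU and CPU stages):

```python
import queue
import threading

def run_pipeline(signals, compute_logits, decode, n_decoders=4):
    """Two-stage pipeline: one producer fills logits_q (GPU-bound stage),
    n_decoders consumers drain it (CPU-bound beam/greedy decode)."""
    logits_q = queue.Queue(maxsize=64)  # grows when decoding is the bottleneck
    results = []
    lock = threading.Lock()

    def producer():
        for s in signals:
            logits_q.put(compute_logits(s))
        for _ in range(n_decoders):     # one shutdown sentinel per decoder
            logits_q.put(None)

    def consumer():
        while True:
            logits = logits_q.get()
            if logits is None:          # sentinel: no more work
                break
            r = decode(logits)
            with lock:
                results.append(r)

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(n_decoders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With cheap stand-in stages, `run_pipeline(range(10), lambda s: s * 2, lambda l: l + 1)` returns the ten decoded values (in arbitrary order, since decoders race).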
Here are the results of tests on a Titan Xp for some reads I had: GPU -> ~23 s per iteration (with this improvement) vs. GPU -> ~96 s per iteration (without).
Test plan: I ran the example data through the basecaller before and after:

```
docker run --rm -it -v $(pwd)/thread:/tmp nmiculinic/chiron:thread chiron call --batch_size 1000 -i /opt/chiron/chiron/example_data -o /tmp
```

and compared the sha512 sums.
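The comparison step might look like this (a self-contained sketch with dummy files; the real runs would checksum the basecaller's actual before/after output directories):

```shell
# Stand-in output directories; in the real test these hold the
# before/after basecalling results.
mkdir -p /tmp/run_before /tmp/run_after
printf '>read1\nACGT\n' > /tmp/run_before/read1.fasta
printf '>read1\nACGT\n' > /tmp/run_after/read1.fasta

# Checksum each run's outputs (cd so the recorded names are relative),
# then diff the checksum lists: no output from diff means identical calls.
(cd /tmp/run_before && sha512sum *.fasta) > /tmp/before.sha512
(cd /tmp/run_after && sha512sum *.fasta) > /tmp/after.sha512
diff /tmp/before.sha512 /tmp/after.sha512 && echo "outputs identical"
```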