usc-isi-i2 / dig-crf

CRF++ extraction for DIG
Apache License 2.0

Performance #6

Open CraigMiloRogers opened 8 years ago

CraigMiloRogers commented 8 years ago

What is the performance of feature setup and CRF processing?

CraigMiloRogers commented 8 years ago

This test was run with the current code (no iterator interface yet), using the adjudicated samples as keyed JSON Lines input. The first test uses one copy of the sample data; the second uses 100 copies of the sample data, concatenated.

387% time ./applyCrfKj.csh
input:  992 sentences, 59520 tokens
output: 1493 phrases
8.479u 0.027s 0:08.59 98.8%     0+0k 0+320io 0pf+0w
388% time ./applyCrfKj-x100.csh
input:  99200 sentences, 5952000 tokens
output: 149300 phrases
972.821u 1.008s 16:19.88 99.3%  0+0k 0+31928io 0pf+0w
972.840u 1.015s 16:19.91 99.3%  0+0k 0+31928io 0pf+0w
389% 

Just under 10 seconds per kilosentence: about 8.7 seconds for the 1x test and about 9.9 seconds for the 100x test. It's interesting that the 100x test was slower per sentence than the shorter test, perhaps because I did other things on the system while waiting for it to complete. Only a single CPU was used by the application code. At this rate, 100 executors should process 100 million ads in under 170 minutes of real time (assuming roughly one sentence per ad, CPU speeds equivalent to the test system, etc.).
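The arithmetic behind that estimate can be checked with a short sketch. The sentence counts and elapsed times are copied from the runs above; the function names are illustrative only, and the projection assumes one sentence per ad, a flat 10 seconds per kilosentence, and perfect scaling across executors.

```python
# Figures copied from the two timing runs above (elapsed wall-clock seconds).
RUNS = {
    "1x": {"sentences": 992, "elapsed_s": 8.59},
    "100x": {"sentences": 99200, "elapsed_s": 16 * 60 + 19.88},
}


def seconds_per_kilosentence(sentences, elapsed_s):
    """Elapsed seconds spent per 1000 input sentences."""
    return elapsed_s / (sentences / 1000.0)


def projected_minutes(total_sentences, executors, s_per_ksent):
    """Wall-clock minutes if the work divides evenly across executors."""
    total_cpu_s = (total_sentences / 1000.0) * s_per_ksent
    return total_cpu_s / executors / 60.0


rates = {name: seconds_per_kilosentence(**run) for name, run in RUNS.items()}

# Assumptions (not measured): one sentence per ad, 10 s per kilosentence.
estimate_min = projected_minutes(100_000_000, executors=100, s_per_ksent=10.0)

for name, rate in rates.items():
    print(f"{name}: {rate:.2f} s per kilosentence")
print(f"100M ads on 100 executors: about {estimate_min:.0f} minutes")
```

If ads average more than one sentence each, the estimate scales up proportionally.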

There was no indication of a memory leak.

CraigMiloRogers commented 8 years ago

Using a Spark-friendly generator model:

390% time ./applyCrfKj.csh
input:  0 sentences, 0 tokens
output: 1493 phrases
8.393u 0.031s 0:08.50 99.0%     0+0k 0+320io 0pf+0w
390% time ./applyCrfKj.csh
input:  0 sentences, 0 tokens
output: 1493 phrases
8.407u 0.025s 0:08.49 99.1%     0+0k 0+312io 0pf+0w
390% time ./applyCrfKj.csh
input:  0 sentences, 0 tokens
output: 1493 phrases
8.391u 0.028s 0:08.51 98.8%     0+0k 0+344io 0pf+0w
390% 

It might be a little faster. Perhaps because I replaced string formatting with string concatenation?
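For readers unfamiliar with the pattern, here is a minimal sketch of what a generator model can look like. It is not the dig-crf code: `tag_sentences` and `toy_tagger` are hypothetical names, and the toy tagger merely stands in for the CRF++ call.

```python
def tag_sentences(keyed_sentences, tagger):
    """Lazily consume (key, tokens) pairs and yield (key, phrases) pairs.

    Nothing is read from the input until the caller asks for a result,
    so the whole data set never has to be held in memory at once.
    """
    for key, tokens in keyed_sentences:
        phrases = tagger(tokens)
        if phrases:  # skip sentences that produced no tagged phrases
            yield key, phrases


def toy_tagger(tokens):
    # Hypothetical stand-in for the CRF++ tagger: "tags" capitalized tokens.
    return [t for t in tokens if t[:1].isupper()]


sample = iter([("ad1", ["blue", "Eyes"]), ("ad2", ["no", "match"])])
results = list(tag_sentences(sample, toy_tagger))
print(results)
```

Because the function takes an iterator and returns an iterator, it has the shape that PySpark's `RDD.mapPartitions` expects, for example `rdd.mapPartitions(lambda part: tag_sentences(part, toy_tagger))`.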

CraigMiloRogers commented 8 years ago

Here are much better times on the 100x test. I wasn't using the system for anything else.

393% time ./applyCrfKj-x100.csh
input:  0 sentences, 0 tokens
output: 149300 phrases
849.288u 0.773s 14:16.46 99.2%  0+0k 0+32408io 0pf+0w
849.306u 0.782s 14:16.49 99.2%  0+0k 0+32408io 0pf+0w
394% !!
time ./applyCrfKj-x100.csh
input:  0 sentences, 0 tokens
output: 149300 phrases
842.149u 0.714s 14:08.92 99.2%  0+0k 0+32096io 0pf+0w
842.166u 0.723s 14:08.95 99.2%  0+0k 0+32096io 0pf+0w
394% !!
time ./applyCrfKj-x100.csh
input:  0 sentences, 0 tokens
output: 149300 phrases
835.464u 0.702s 14:02.15 99.2%  0+0k 0+31944io 0pf+0w
835.478u 0.714s 14:02.18 99.2%  0+0k 0+31944io 0pf+0w
394% 

CraigMiloRogers commented 8 years ago

I added the number of tagged output tokens to the statistics:

395% time ./applyCrfKj.csh
input:  992 sentences, 59520 tokens
output: 1493 phrases, 5316 tokens
8.435u 0.027s 0:08.59 98.3%     0+0k 0+312io 0pf+0w
395% 

CraigMiloRogers commented 8 years ago

Running the 100x test under Spark with 8 local partitions and executors, on my personal workstation, with a single input file and per-partition output files, yields:

12.216u 1.658s 2:37.93 8.7%     0+0k 0+31464io 0pf+0w

The user and system times are low because they report only the driver process. The elapsed time is 2:37.93. Compared with the non-Spark time of 14:02.15, the elapsed-time speedup is over 5x (about 5.3x).
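The speedup figure can be reproduced from the elapsed fields of the two `time` reports. This is a small sketch with an illustrative helper name; the per-partition efficiency simply divides by the 8 local partitions and ignores Spark startup overhead.

```python
def elapsed_to_seconds(text):
    """Convert a csh `time` elapsed field such as '14:02.15' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        # Each colon shifts the running total up by a factor of 60.
        seconds = seconds * 60 + float(part)
    return seconds


serial_s = elapsed_to_seconds("14:02.15")  # best non-Spark 100x run
spark_s = elapsed_to_seconds("2:37.93")    # Spark run, 8 local partitions

speedup = serial_s / spark_s
efficiency = speedup / 8  # fraction of ideal 8x scaling

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
```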

CraigMiloRogers commented 8 years ago

My project MacBook, avatar.isi.edu, also has 8 logical cores. It executes the 100x test under Spark, with 8 local partitions and executors, in 2:43.347 elapsed time.