elizlee opened this issue 2 years ago
@elizlee could you record initial profiling results and speed? And then new speed after your changes? Would be helpful to see how much faster we've gotten, and if it's fast enough.
Processing five documents on CPU:
AMR: sentence-by-sentence; max_length=528 -- 9749.335 seconds
AMR: sentence-by-sentence; max_length=256 -- 5685.850 seconds
AMR: all at once; max_length=528 -- 8154.847 seconds
AMR: all at once; max_length=256 -- 2740.192 seconds
So the fastest configuration works out to roughly 9 minutes per document (2740 s / 5 docs ≈ 548 s). I'm not sure whether the eval has 10,000 docs or 1,000 docs, but at 1,000 docs that's roughly 6.25 days. Is that fast enough? I really don't know, but faster is better; if an error occurs after 5 days of processing, that will be painful.
My question would be: if the GPU doesn't help for some reason, could more cores on the CPU path help? Though the best path would be to find out why the GPU does not help (a quick sanity check is sketched below).
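As a first thing to rule out, here is a minimal sketch, assuming the linking model is an ordinary torch.nn.Module; the check_device helper name is hypothetical. If CUDA is unavailable or the parameters still report "cpu", then --device cuda isn't actually buying us anything.

import torch

def check_device(model: torch.nn.Module) -> None:
    """Report whether CUDA is usable and where the model's weights live."""
    print("CUDA available:", torch.cuda.is_available())
    # After model.to("cuda") every parameter should report a cuda device.
    print("Parameter devices:", {str(p.device) for p in model.parameters()})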
Line profile results, with code hacked to run only the first claim and then exit.
$ pip install line_profiler
+import line_profiler
+@profile
def get_linker_scores(
$ kernprof -lv cdse_covid/semantic_extraction/run_amr_parsing.py --input /nas/gaia/users/joelb/phase3_test_full/cdse_output/topics.zip --output amr.zip --amr-parser-model /nas/gaia/users/joelb/views/cdse-covid/transition-amr-parser --max-tokens 50 --state-dict /nas/gaia/lestat/shared/wikidata_classifier.state_dict --domain covid --device cuda
Using cuda:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
49 @profile
50 def get_linker_scores(
51 event_description: str,
52 use_title: bool,
53 candidates: List[MutableMapping[str, Any]],
54 linking_model: WikidataLinkingClassifier,
55 device: str = CPU,
56 ) -> Any:
57 """Gets predictions from Wikidata linking classification model, given a string and candidate JSONs.
58
59 Returns:
60 A JSON response.
61 """
62 24 48.0 2.0 0.0 i = 0
63 24 31.0 1.3 0.0 scores = []
64 24 23.0 1.0 0.0 candidate_descriptions = []
65 1611 1282.0 0.8 0.0 for candidate in candidates:
66 1587 1711.0 1.1 0.0 description = candidate["description"][0] if candidate["description"] else ""
67 1587 1600.0 1.0 0.0 label = candidate["label"][0] if candidate["label"] else ""
68 1587 1187.0 0.7 0.0 if use_title:
69 description = f"{label} - {description}"
70 1587 1551.0 1.0 0.0 candidate_descriptions.append(description)
71 229 321.0 1.4 0.0 while i * MAX_BATCH_SIZE < len(candidates):
72 205 316.0 1.5 0.0 candidate_batch = candidate_descriptions[i * MAX_BATCH_SIZE : (i + 1) * MAX_BATCH_SIZE]
73 205 658.0 3.2 0.0 with torch.no_grad():
74 logits = (
75 linking_model.infer(event_description, candidate_batch)[0].detach()
76 205 5378315.0 26235.7 18.1 if device == CUDA
77 else linking_model.infer(event_description, candidate_batch)[0].detach().cpu()
78 )
79 205 6342.0 30.9 0.0 candidate_batch_scores = SOFTMAX(logits)[:, 2] # get "entailment" score from model
80 1792 8394.0 4.7 0.0 for candidate_batch_score in candidate_batch_scores:
81 1587 24353704.0 15345.7 81.8 scores.append(candidate_batch_score.item())
82 205 1111.0 5.4 0.0 i += 1
83 24 23.0 1.0 0.0 return {"scores": scores}
The same, but using cpu:
$ kernprof -lv cdse_covid/semantic_extraction/run_amr_parsing.py --input /nas/gaia/users/joelb/phase3_test_full/cdse_output/topics.zip --output amr.zip --amr-parser-model /nas/gaia/users/joelb/views/cdse-covid/transition-amr-parser --max-tokens 50 --state-dict /nas/gaia/lestat/shared/wikidata_classifier.state_dict --domain covid --device cpu
Line # Hits Time Per Hit % Time Line Contents
==============================================================
49 @profile
50 def get_linker_scores(
51 event_description: str,
52 use_title: bool,
53 candidates: List[MutableMapping[str, Any]],
54 linking_model: WikidataLinkingClassifier,
55 device: str = CPU,
56 ) -> Any:
57 """Gets predictions from Wikidata linking classification model, given a string and candidate JSONs.
58
59 Returns:
60 A JSON response.
61 """
62 12 35.0 2.9 0.0 i = 0
63 12 18.0 1.5 0.0 scores = []
64 12 15.0 1.2 0.0 candidate_descriptions = []
65 77 94.0 1.2 0.0 for candidate in candidates:
66 65 103.0 1.6 0.0 description = candidate["description"][0] if candidate["description"] else ""
67 65 91.0 1.4 0.0 label = candidate["label"][0] if candidate["label"] else ""
68 65 73.0 1.1 0.0 if use_title:
69 description = f"{label} - {description}"
70 65 83.0 1.3 0.0 candidate_descriptions.append(description)
71 26 50.0 1.9 0.0 while i * MAX_BATCH_SIZE < len(candidates):
72 14 32.0 2.3 0.0 candidate_batch = candidate_descriptions[i * MAX_BATCH_SIZE : (i + 1) * MAX_BATCH_SIZE]
73 14 186.0 13.3 0.0 with torch.no_grad():
74 logits = (
75 linking_model.infer(event_description, candidate_batch)[0].detach()
76 14 22.0 1.6 0.0 if device == CUDA
77 14 141131061.0 10080790.1 100.0 else linking_model.infer(event_description, candidate_batch)[0].detach().cpu()
78 )
79 14 850.0 60.7 0.0 candidate_batch_scores = SOFTMAX(logits)[:, 2] # get "entailment" score from model
80 79 625.0 7.9 0.0 for candidate_batch_score in candidate_batch_scores:
81 65 271.0 4.2 0.0 scores.append(candidate_batch_score.item())
82 14 75.0 5.4 0.0 i += 1
83 12 17.0 1.4 0.0 return {"scores": scores}
When using the CPU, inference takes 100% of the time. Using the GPU, however, inference takes only 18% of the time, and 82% is taken by a Python loop transferring items one at a time from GPU to CPU. I think we should try to understand that line (81 above, copied here with its surrounding loop):
79 205 6342.0 30.9 0.0 candidate_batch_scores = SOFTMAX(logits)[:, 2] # get "entailment" score from model
80 1792 8394.0 4.7 0.0 for candidate_batch_score in candidate_batch_scores:
81 1587 24353704.0 15345.7 81.8 scores.append(candidate_batch_score.item())
82 205 1111.0 5.4 0.0 i += 1
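If the per-score .item() calls really are the bottleneck, one change worth trying (a sketch only, reusing the names from get_linker_scores above; I haven't verified it against the current output) is a single device-to-host transfer per batch followed by .tolist():

while i * MAX_BATCH_SIZE < len(candidates):
    candidate_batch = candidate_descriptions[i * MAX_BATCH_SIZE : (i + 1) * MAX_BATCH_SIZE]
    with torch.no_grad():
        logits = linking_model.infer(event_description, candidate_batch)[0].detach()
    candidate_batch_scores = SOFTMAX(logits)[:, 2]  # "entailment" score per candidate
    # One GPU-to-CPU copy for the whole batch instead of one synchronizing
    # .item() call per score. .cpu() is a no-op on a CPU tensor, so the
    # device == CUDA branch goes away as well.
    scores.extend(candidate_batch_scores.cpu().tolist())
    i += 1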
To check whether the new changes are correct, I tested whether we get the same output as before:
$ export PYTHONPATH=.
$ git checkout master
# apply patch to break after first claim processed
$ python cdse_covid/semantic_extraction/run_amr_parsing.py \
--input /nas/gaia/users/joelb/phase3_test_full/cdse_output/topics.zip \
--output amr-master.zip \
--amr-parser-model /nas/gaia/users/joelb/views/cdse-covid/transition-amr-parser \
--max-tokens 50 \
--state-dict /nas/gaia/lestat/shared/wikidata_classifier.state_dict \
--domain covid \
--device cuda
$ git checkout restructure-amr-parsing
# apply patch to break after first claim processed
$ python cdse_covid/semantic_extraction/run_amr_parsing.py \
--input /nas/gaia/users/joelb/phase3_test_full/cdse_output/topics.zip \
--output amr-restructure-amr-parsing.zip \
--amr-parser-model /nas/gaia/users/joelb/views/cdse-covid/transition-amr-parser \
--max-tokens 50 \
--state-dict /nas/gaia/lestat/shared/wikidata_classifier.state_dict \
--domain covid \
--device cuda
The output is stored in a Parameters key/value store (a zip archive):
$ ls -l *.zip
-rw-r--r-- 1 joelb saga_users 59656 Apr 29 09:25 amr-master.zip
-rw-r--r-- 1 joelb saga_users 59656 Apr 29 09:04 amr-restructure-amr-parsing.zip
Same size, so that's promising, but a binary compare shows they are not identical:
$ diff amr-master.zip amr-restructure-amr-parsing.zip
Binary files amr-master.zip and amr-restructure-amr-parsing.zip differ
But the size and "key" of each entry are the same:
$ unzip -l amr-master.zip | head
Archive: amr-master.zip
Length Date Time Name
--------- ---------- ----- ----
7068 04-29-2022 09:25 d8aa6397
1283 04-29-2022 09:25 68f089b2
1580 04-29-2022 09:25 3e7748c6
1687 04-29-2022 09:25 78c0b308
1886 04-29-2022 09:25 25ceafda
1094 04-29-2022 09:25 789c9cd4
1405 04-29-2022 09:25 b0d50836
$ diff <(unzip -l amr-master.zip | perl -ne 'print "$1,$2\n" if /^\s*(\d+).+?(\w+?)$/') <(unzip -l amr-restructure-amr-parsing.zip | perl -ne 'print "$1,$2\n" if /^\s*(\d+).+?(\w+?)$/')
No diffs, so I'm assuming we have identical results; the binary difference is presumably just the per-entry timestamps (09:25 vs. 09:04).
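If we want a stronger check than entry sizes and names, here is a quick sketch using the standard-library zipfile module to compare the entry bytes directly (the paths are the ones from the run above):

import zipfile

def archives_match(path_a: str, path_b: str) -> bool:
    """True if both zips contain the same entry names with identical bytes."""
    with zipfile.ZipFile(path_a) as a, zipfile.ZipFile(path_b) as b:
        if sorted(a.namelist()) != sorted(b.namelist()):
            return False
        return all(a.read(name) == b.read(name) for name in a.namelist())

print(archives_match("amr-master.zip", "amr-restructure-amr-parsing.zip"))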
Most of the time is spent on the Wikidata linking. It would be nice to pin down why it is taking so long and whether there is anything we can do about it.