elizlee opened this issue 2 years ago
@elizlee could you record initial profiling results and speed? And then new speed after your changes? Would be helpful to see how much faster we've gotten, and if it's fast enough.
Processing five documents on CPU:
AMR: sentence-by-sentence; max_length=528 -- 9749.335 seconds
AMR: sentence-by-sentence; max_length=256 -- 5685.850 seconds
AMR: all at once; max_length=528 -- 8154.847 seconds
AMR: all at once; max_length=256 -- 2740.192 seconds
So the fastest configuration works out to roughly 9 minutes per document (2740 s / 5 docs ≈ 548 s). I'm not sure whether the eval has 10,000 docs or 1,000 docs, but at 1,000 docs that's roughly 6.25 days. Is that fast enough? I really don't know, but faster is better; if an error occurs after 5 days of processing, that will be painful.
My question would be: if the GPU doesn't help for some reason, could more cores on the CPU path help? Though the best path would be to find out why the GPU does not help (a quick sanity check is sketched below).
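As a first thing to rule out, here is a minimal sketch, assuming the linking model is an ordinary torch.nn.Module; the check_device helper name is hypothetical. If CUDA is unavailable or the parameters still report "cpu", then --device cuda isn't actually buying us anything.

import torch

def check_device(model: torch.nn.Module) -> None:
    """Report whether CUDA is usable and where the model's weights live."""
    print("CUDA available:", torch.cuda.is_available())
    # After model.to("cuda") every parameter should report a cuda device.
    print("Parameter devices:", {str(p.device) for p in model.parameters()})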
Line profile results, with code hacked to run only the first claim and then exit.
$ pip install line_profiler
+import line_profiler
+@profile
def get_linker_scores(
$ kernprof -lv cdse_covid/semantic_extraction/run_amr_parsing.py --input /nas/gaia/users/joelb/phase3_test_full/cdse_output/topics.zip --output amr.zip --amr-parser-model /nas/gaia/users/joelb/views/cdse-covid/transition-amr-parser --max-tokens 50 --state-dict /nas/gaia/lestat/shared/wikidata_classifier.state_dict --domain covid --device cuda
Using cuda:
Line # Hits Time Per Hit % Time Line Contents
==============================================================
49 @profile
50 def get_linker_scores(
51 event_description: str,
52 use_title: bool,
53 candidates: List[MutableMapping[str, Any]],
54 linking_model: WikidataLinkingClassifier,
55 device: str = CPU,
56 ) -> Any:
57 """Gets predictions from Wikidata linking classification model, given a string and candidate JSONs.
58
59 Returns:
60 A JSON response.
61 """
62 24 48.0 2.0 0.0 i = 0
63 24 31.0 1.3 0.0 scores = []
64 24 23.0 1.0 0.0 candidate_descriptions = []
65 1611 1282.0 0.8 0.0 for candidate in candidates:
66 1587 1711.0 1.1 0.0 description = candidate["description"][0] if candidate["description"] else ""
67 1587 1600.0 1.0 0.0 label = candidate["label"][0] if candidate["label"] else ""
68 1587 1187.0 0.7 0.0 if use_title:
69 description = f"{label} - {description}"
70 1587 1551.0 1.0 0.0 candidate_descriptions.append(description)
71 229 321.0 1.4 0.0 while i * MAX_BATCH_SIZE < len(candidates):
72 205 316.0 1.5 0.0 candidate_batch = candidate_descriptions[i * MAX_BATCH_SIZE : (i + 1) * MAX_BATCH_SIZE]
73 205 658.0 3.2 0.0 with torch.no_grad():
74 logits = (
75 linking_model.infer(event_description, candidate_batch)[0].detach()
76 205 5378315.0 26235.7 18.1 if device == CUDA
77 else linking_model.infer(event_description, candidate_batch)[0].detach().cpu()
78 )
79 205 6342.0 30.9 0.0 candidate_batch_scores = SOFTMAX(logits)[:, 2] # get "entailment" score from model
80 1792 8394.0 4.7 0.0 for candidate_batch_score in candidate_batch_scores:
81 1587 24353704.0 15345.7 81.8 scores.append(candidate_batch_score.item())
82 205 1111.0 5.4 0.0 i += 1
83 24 23.0 1.0 0.0 return {"scores": scores}
The same, but using cpu:
$ kernprof -lv cdse_covid/semantic_extraction/run_amr_parsing.py --input /nas/gaia/users/joelb/phase3_test_full/cdse_output/topics.zip --output amr.zip --amr-parser-model /nas/gaia/users/joelb/views/cdse-covid/transition-amr-parser --max-tokens 50 --state-dict /nas/gaia/lestat/shared/wikidata_classifier.state_dict --domain covid --device cpu
Line # Hits Time Per Hit % Time Line Contents
==============================================================
49 @profile
50 def get_linker_scores(
51 event_description: str,
52 use_title: bool,
53 candidates: List[MutableMapping[str, Any]],
54 linking_model: WikidataLinkingClassifier,
55 device: str = CPU,
56 ) -> Any:
57 """Gets predictions from Wikidata linking classification model, given a string and candidate JSONs.
58
59 Returns:
60 A JSON response.
61 """
62 12 35.0 2.9 0.0 i = 0
63 12 18.0 1.5 0.0 scores = []
64 12 15.0 1.2 0.0 candidate_descriptions = []
65 77 94.0 1.2 0.0 for candidate in candidates:
66 65 103.0 1.6 0.0 description = candidate["description"][0] if candidate["description"] else ""
67 65 91.0 1.4 0.0 label = candidate["label"][0] if candidate["label"] else ""
68 65 73.0 1.1 0.0 if use_title:
69 description = f"{label} - {description}"
70 65 83.0 1.3 0.0 candidate_descriptions.append(description)
71 26 50.0 1.9 0.0 while i * MAX_BATCH_SIZE < len(candidates):
72 14 32.0 2.3 0.0 candidate_batch = candidate_descriptions[i * MAX_BATCH_SIZE : (i + 1) * MAX_BATCH_SIZE]
73 14 186.0 13.3 0.0 with torch.no_grad():
74 logits = (
75 linking_model.infer(event_description, candidate_batch)[0].detach()
76 14 22.0 1.6 0.0 if device == CUDA
77 14 141131061.0 10080790.1 100.0 else linking_model.infer(event_description, candidate_batch)[0].detach().cpu()
78 )
79 14 850.0 60.7 0.0 candidate_batch_scores = SOFTMAX(logits)[:, 2] # get "entailment" score from model
80 79 625.0 7.9 0.0 for candidate_batch_score in candidate_batch_scores:
81 65 271.0 4.2 0.0 scores.append(candidate_batch_score.item())
82 14 75.0 5.4 0.0 i += 1
83 12 17.0 1.4 0.0 return {"scores": scores}
When using the CPU, inference takes 100% of the time. Using the GPU, however, inference takes only 18% of the time, and 82% is taken by a Python loop transferring items one at a time from GPU to CPU. I think we should try to understand that line (81 above, copied here with its surrounding loop):
79 205 6342.0 30.9 0.0 candidate_batch_scores = SOFTMAX(logits)[:, 2] # get "entailment" score from model
80 1792 8394.0 4.7 0.0 for candidate_batch_score in candidate_batch_scores:
81 1587 24353704.0 15345.7 81.8 scores.append(candidate_batch_score.item())
82 205 1111.0 5.4 0.0 i += 1
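If the per-score .item() calls really are the bottleneck, one change worth trying (a sketch only, reusing the names from get_linker_scores above; I haven't verified it against the current output) is a single device-to-host transfer per batch followed by .tolist():

while i * MAX_BATCH_SIZE < len(candidates):
    candidate_batch = candidate_descriptions[i * MAX_BATCH_SIZE : (i + 1) * MAX_BATCH_SIZE]
    with torch.no_grad():
        logits = linking_model.infer(event_description, candidate_batch)[0].detach()
    candidate_batch_scores = SOFTMAX(logits)[:, 2]  # "entailment" score per candidate
    # One GPU-to-CPU copy for the whole batch instead of one synchronizing
    # .item() call per score. .cpu() is a no-op on a CPU tensor, so the
    # device == CUDA branch goes away as well.
    scores.extend(candidate_batch_scores.cpu().tolist())
    i += 1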
To check whether the new changes are correct, I tested whether we get the same output as before:
$ export PYTHONPATH=.
$ git checkout master
# apply patch to break after first claim processed
$ python cdse_covid/semantic_extraction/run_amr_parsing.py \
--input /nas/gaia/users/joelb/phase3_test_full/cdse_output/topics.zip \
--output amr-master.zip \
--amr-parser-model /nas/gaia/users/joelb/views/cdse-covid/transition-amr-parser \
--max-tokens 50 \
--state-dict /nas/gaia/lestat/shared/wikidata_classifier.state_dict \
--domain covid \
--device cuda
$ git checkout restructure-amr-parsing
# apply patch to break after first claim processed
$ python cdse_covid/semantic_extraction/run_amr_parsing.py \
--input /nas/gaia/users/joelb/phase3_test_full/cdse_output/topics.zip \
--output amr-restructure-amr-parsing.zip \
--amr-parser-model /nas/gaia/users/joelb/views/cdse-covid/transition-amr-parser \
--max-tokens 50 \
--state-dict /nas/gaia/lestat/shared/wikidata_classifier.state_dict \
--domain covid \
--device cuda
The output is stored in a Parameters key/value store (a zip archive):
$ ls -l *.zip
-rw-r--r-- 1 joelb saga_users 59656 Apr 29 09:25 amr-master.zip
-rw-r--r-- 1 joelb saga_users 59656 Apr 29 09:04 amr-restructure-amr-parsing.zip
Same size, so that's promising, but a binary compare shows they are not identical:
$ diff amr-master.zip amr-restructure-amr-parsing.zip
Binary files amr-master.zip and amr-restructure-amr-parsing.zip differ
But the size and "key" of each entry are the same:
$ unzip -l amr-master.zip | head
Archive: amr-master.zip
Length Date Time Name
--------- ---------- ----- ----
7068 04-29-2022 09:25 d8aa6397
1283 04-29-2022 09:25 68f089b2
1580 04-29-2022 09:25 3e7748c6
1687 04-29-2022 09:25 78c0b308
1886 04-29-2022 09:25 25ceafda
1094 04-29-2022 09:25 789c9cd4
1405 04-29-2022 09:25 b0d50836
$ diff <(unzip -l amr-master.zip | perl -ne 'print "$1,$2\n" if /^\s*(\d+).+?(\w+?)$/') <(unzip -l amr-restructure-amr-parsing.zip | perl -ne 'print "$1,$2\n" if /^\s*(\d+).+?(\w+?)$/')
No diffs, so I'm assuming we have identical results; the binary difference is presumably just the per-entry timestamps (09:25 vs. 09:04).
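If we want a stronger check than entry sizes and names, here is a quick sketch using the standard-library zipfile module to compare the entry bytes directly (the paths are the ones from the run above):

import zipfile

def archives_match(path_a: str, path_b: str) -> bool:
    """True if both zips contain the same entry names with identical bytes."""
    with zipfile.ZipFile(path_a) as a, zipfile.ZipFile(path_b) as b:
        if sorted(a.namelist()) != sorted(b.namelist()):
            return False
        return all(a.read(name) == b.read(name) for name in a.namelist())

print(archives_match("amr-master.zip", "amr-restructure-amr-parsing.zip"))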
Most of the time is spent on the Wikidata linking. It would be nice to pin down why it is taking so long and whether there is anything we can do about it.