Closed yenson-lau closed 2 years ago
Fixed. Lmk if it works for you. Also feel free to PR anything.
Thanks. Unfortunately I couldn't get detect_ner_with_hf_model() to work with GPU. It's definitely not anything on my end. Can you check what's going on?
I'm calling
apply_anonymization("Bob and Amy are eating apples in Jack's home.", device="cuda")["text"]
>> RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
The apply_anonymization function is pretty much the same as the one in your demo code:
def apply_anonymization(
    sentence: str,
    lang_id: str = "en",
    context_window: int = 20,
    anonymize_condition=True,
    tag_type={'IP_ADDRESS', 'KEY', 'ID', 'PHONE', 'USER', 'EMAIL', 'LICENSE_PLATE', 'PERSON'},
    device: str = "cpu",
) -> dict:
    """
    Params:
    ==================
    sentence: str, the sentence to be anonymized
    lang_id: str, the language id of the sentence
    context_window: int, the context window size
    anonymize_condition: function, the anonymization condition
    tag_type: iterable, the tag types of the anonymization. By default: {'IP_ADDRESS', 'KEY', 'ID', 'PHONE', 'USER', 'EMAIL', 'LICENSE_PLATE', 'PERSON'}
    device: cpu or cuda:{device_id}
    """
    # Fall back to every tag type known to the regex rulebase
    if tag_type is None:
        tag_type = regex_rulebase.keys()
    # Keep only the language part of ids such as "en_US"
    lang_id = lang_id.split("_")[0]
    # Regex- and context-based detection (IDs, emails, phone numbers, ...)
    ner_ids = detect_ner_with_regex_and_context(
        sentence=sentence,
        src_lang=lang_id,
        context_window=context_window,
        tag_type=tag_type,
    )
    # Model-based detection; this is the call that fails on GPU
    ner_persons = detect_ner_with_hf_model(
        sentence=sentence,
        src_lang=lang_id,
        device=device,
    )
    # Merge both result lists, drop duplicates, and sort on the second field of each entry
    ner = list(set(ner_ids + ner_persons))
    ner.sort(key=lambda a: a[1])
    if anonymize_condition:
        new_sentence, new_ner, _ = augment_anonymize(sentence, lang_id, ner)
        doc = {'text': new_sentence, 'ner': new_ner, 'orig_text': sentence, 'orig_ner': ner}
    else:
        new_sentence = sentence
        doc = {'text': new_sentence, 'ner': ner}
    return doc
Try it now. It was expecting "cuda:0" or some other device number. I fixed it so it can accept just "cuda". This is in the notebook: https://colab.research.google.com/drive/1olv6IMEP5SkwJb8CFyR2aZV19_9XdlZp#scrollTo=bsf83d-WFoNX
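For anyone hitting the same thing on an older version, here is a minimal sketch of normalizing the device string on the caller's side before passing it in. The helper name normalize_device is hypothetical and is not part of muliwai. Treating a bare "cuda" as the first GPU is an assumption, not necessarily how the fix above was implemented.

```python
def normalize_device(device: str) -> str:
    """Return an explicit device string, e.g. "cuda" -> "cuda:0".

    Hypothetical helper for illustration only; not part of muliwai.
    """
    device = device.strip().lower()
    if device == "cuda":
        # Assumption: a bare "cuda" refers to the first GPU.
        return "cuda:0"
    # Accept "cpu" and explicit "cuda:<index>" strings unchanged.
    if device == "cpu" or (device.startswith("cuda:") and device[5:].isdigit()):
        return device
    raise ValueError(f"Unsupported device string: {device!r}")
```

With this, the failing call could be written as apply_anonymization(sentence, device=normalize_device("cuda")), so the function always receives an explicit device index.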
Yep this works, thanks for making the update!
https://github.com/piisa/muliwai/blob/main/ner_manager.py#L364 leads to invalid indentation
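One way to confirm an indentation problem like this locally is to parse the file with the standard library before importing it. The sketch below uses only ast.parse. The function name find_indentation_error is made up for illustration, and the snippet does not reproduce the contents of ner_manager.py.

```python
import ast
from typing import Optional


def find_indentation_error(source: str, filename: str = "<string>") -> Optional[str]:
    """Return "file:line: message" for the first indentation error, else None.

    Hypothetical helper for illustration; other syntax errors are not caught here.
    """
    try:
        ast.parse(source, filename=filename)
    except IndentationError as err:
        # IndentationError (and its subclass TabError) carries the offending line number.
        return f"{filename}:{err.lineno}: {err.msg}"
    return None
```

To check a file on disk, read it as text and pass the contents in, e.g. find_indentation_error(open(path).read(), path), where path points at a local copy of the module.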