makcedward / nlpaug

Data augmentation for NLP
https://makcedward.github.io/
MIT License
4.41k stars 460 forks

Not able to preserve the words. #344

Open Opperessor opened 6 months ago

Opperessor commented 6 months ago

I want to retain the format of a few entities, but I am unable to do so.

import nlpaug.augmenter.char as nac

text = "My phone number is +44(123)-1451-1231ext02 and my credit card is 1234-1235(2352), my policy number is 1234-12CAR-12 and my account number is 004421234-12"

def keyboard_aug():
    # Entities whose exact format should survive augmentation
    protected_entities = ["+44(123)-1451-1231ext02", "1234-1235(2352)", "1234-12CAR-12", "004421234-12"]
    # Pass the entities as stopwords, hoping the augmenter skips them
    aug = nac.KeyboardAug(stopwords=protected_entities)
    augmented_text = aug.augment(text)
    print("Original text:")
    print(text)
    print("Augmented text:")
    print(augmented_text)

keyboard_aug()

output:
Original text:
My phone number is +44(123)-1451-1231ext02 and my credit card is 1234-1235(2352), my policy number is 1234-12CAR-12 and my account number is 004421234-12
Augmented text:
['My phone numneG is + 44 (123) - 1#R1 - 1231ext02 and my credit cx#d is 1234 - 1235 (2Et2 ), my 9Plicy n7kber is 12$# - 12CAR - 12 and my acc*ujR mumbdr is 00e4W123e - 12']

I am unable to preserve the format of these words. I passed them as stopwords, but I am still facing the issue. I am very new to nlpaug, so a little help would be appreciated.
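
Judging by the augmented output above, each entity seems to be split into smaller tokens (`+`, `44`, `(123)`, ...) before the stopword check runs, so a multi-part string listed in `stopwords` probably never matches a whole token. One possible workaround, sketched below, is to keep the entities away from the augmenter entirely: split the text on the protected strings, augment only the text in between, and join everything back together. The helper `augment_outside_entities` is a hypothetical name, not part of nlpaug, and `str.upper` is only a stand-in for a real augmenter so that the sketch runs without any dependency.

```python
import re

def augment_outside_entities(text, protected, augment):
    """Apply `augment` only to the text between protected entities.

    Hypothetical helper (not an nlpaug API). `augment` is any callable
    that takes a string and returns a string.
    """
    if not protected:
        return augment(text)
    # Longest first, so an entity that is a prefix of another cannot win the match.
    ordered = sorted(protected, key=len, reverse=True)
    pattern = re.compile("(" + "|".join(re.escape(p) for p in ordered) + ")")
    pieces = []
    # With one capturing group, re.split returns free text at even indexes
    # and the matched entities at odd indexes.
    for i, piece in enumerate(pattern.split(text)):
        if i % 2 == 1 or not piece.strip():
            pieces.append(piece)  # entity, empty, or whitespace-only: keep verbatim
            continue
        # Keep the surrounding whitespace ourselves in case the augmenter strips it.
        lead = piece[:len(piece) - len(piece.lstrip())]
        trail = piece[len(piece.rstrip()):]
        pieces.append(lead + augment(piece.strip()) + trail)
    return "".join(pieces)

text = "My phone number is +44(123)-1451-1231ext02 and my credit card is 1234-1235(2352), my policy number is 1234-12CAR-12 and my account number is 004421234-12"
protected = ["+44(123)-1451-1231ext02", "1234-1235(2352)", "1234-12CAR-12", "004421234-12"]

# str.upper is a stand-in augmenter used only for this demonstration.
result = augment_outside_entities(text, protected, str.upper)
print(result)
```

With nlpaug, the stand-in could presumably be replaced by something like `lambda s: aug.augment(s)[0]`, since the output above shows `augment` returning a list. That combination is an untested assumption, and the augmenter may still change spacing around punctuation in the unprotected text.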