royashcenazi / parsigs

Parsigs is an open-source project that aims to extract the relevant dosage information from prescription text without compromising the patient's privacy.
MIT License
24 stars 11 forks

Form can be parsed as plural #1

Open royashcenazi opened 1 year ago

royashcenazi commented 1 year ago

Currently, when the form is parsed from a sig sentence, a plural form is left as is, whereas it should be normalized to the singular.

Example: "take 2 tablets of aderol every day" => StructuredSig(form = "tablets"...)

Possible solution: Calculate the Levenshtein distance between the parsed form and each of the possible outputs {capsule, tablet, drop, syringe, lotion ...}, then pick the closest one.
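
A minimal sketch of that idea, using only the standard library. The names `levenshtein`, `closest_form`, and `KNOWN_FORMS` are hypothetical and not part of parsigs, and the form list is just the partial one from this comment:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute each cost 1)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


# Assumption: the partial list of forms from this issue, not the project's real list.
KNOWN_FORMS = ("capsule", "tablet", "drop", "syringe", "lotion")


def closest_form(parsed_form: str, known_forms=KNOWN_FORMS) -> str:
    """Return the known form with the smallest edit distance to parsed_form."""
    parsed = parsed_form.lower()
    return min(known_forms, key=lambda form: levenshtein(parsed, form))


print(closest_form("tablets"))  # tablet
```

Note that this always returns some known form, even for unrelated input, so a real implementation would probably want a maximum-distance cutoff before accepting the match.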

itay-goldraich commented 1 year ago

I think this issue can be solved using the spaCy library. I'll take this issue.

itay-goldraich commented 1 year ago

I saw that two other pull requests (PRs) were opened for this issue, so I will not continue working on it. If one of the other contributors would like to use my code, I have left it here. It does not require any additional libraries, as it only uses spaCy, which we have already imported. This code worked for me on my local machine (I was going to open a PR right now, but there is no need).

def _plural_to_singular(sig):
    output_words = []
    for word in sig.split():
        doc = nlp(word)
        for token in doc:
            if token.tag_ == 'NNS':  # NNS: Noun, plural
                output_words.append(token.lemma_)
            else:
                output_words.append(word)
    return ' '.join(output_words)