Open titanism opened 2 years ago
:tada: Welcome! :tada:
And thank you for opening your first issue! We will get back to you shortly. :runner: :dash:
Doing a review and will submit a PR to `contractions.json` with changes.
Caught some interesting bugs, like `"what's": "what has/is"` in the JSON (which is obviously a bug).
The other question I wanted to raise is that we should probably handle `’` and `‘` and `'` interchangeably somehow.
Re: missing contractions. Some of the entries in your list are already present in the contractions file. E.g., `wouldn't've`, `mightn't've`.
Re: fancy apostrophe. That should be possible to handle in the `@stdlib/nlp/tokenize` package.
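For illustration, here is a minimal sketch of how curly and straight apostrophes could be treated interchangeably by normalizing them before any contraction lookup. This is not how `@stdlib/nlp/tokenize` or `@stdlib/nlp/expand-contractions` currently work; `normalizeApostrophes`, `expandSimple`, the character set, and the two-entry lookup table are all hypothetical names and data chosen for the example.

```javascript
// Assumed, non-exhaustive set of characters to treat as a straight apostrophe:
// U+2018 (left single quote), U+2019 (right single quote),
// U+02BC (modifier letter apostrophe), U+FF07 (fullwidth apostrophe).
const APOSTROPHE_VARIANTS = /[\u2018\u2019\u02BC\uFF07]/g;

// Replace every apostrophe variant with the ASCII apostrophe (U+0027).
function normalizeApostrophes(text) {
  return text.replace(APOSTROPHE_VARIANTS, "'");
}

// Hypothetical lookup table; NOT the contents of contractions.json.
const CONTRACTIONS = new Map([
  ["don't", 'do not'],
  ["can't", 'cannot']
]);

// Naive whitespace-based expansion, only to show where normalization fits.
function expandSimple(text) {
  return normalizeApostrophes(text)
    .split(' ')
    .map((word) => CONTRACTIONS.get(word.toLowerCase()) || word)
    .join(' ');
}

console.log(expandSimple('I don\u2019t know'));
```

One trade-off of this approach: an opening single quote (`‘`) used as a real quotation mark is also rewritten to `'`, which matches the "interchangeably" request above but loses the distinction between quotes and apostrophes.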
I'm about to submit a PR, one moment @kgryte
See https://github.com/stdlib-js/stdlib/pull/497
cc @kgryte
@titanism One recent update: @Planeshifter added initial support for expanding acronyms (see https://github.com/stdlib-js/stdlib/tree/c624a5eb4bca8f4f3d45e01bcc4eeee41652e3ba/lib/node_modules/%40stdlib/nlp/expand-acronyms). This may help to avoid mixing contraction/acronym concerns.
Description
We're writing as we found your library to be the most tested and fastest for expanding contractions. For context, we're working on https://spamscanner.net and expanding contractions before passing to tokenizers for spam classification.
To clarify, this is with regard to the generated codebase https://github.com/stdlib-js/nlp-expand-contractions from the source at https://github.com/stdlib-js/stdlib/tree/develop/lib/node_modules/%40stdlib/nlp/expand-contractions.
We noticed that your library is missing quite a few English contractions, and could also benefit from contractions from other languages (perhaps behind an option).
While we can open a PR, we wanted to check what your thoughts were on this and how you would want the PR to look (integration-wise; e.g., new options?).
Here is our current compiled list of research and findings: