winkjs / wink-eng-lite-web-model

English lite language model for Web Browsers
MIT License
11 stars 8 forks source link

Replaced NodeJs specific Buffer API GH-5 #6

Closed mrbrianevans closed 2 years ago

mrbrianevans commented 2 years ago

Replaced usage of Buffer.from with custom base64toBuffer function written by @sanjayaksaxena in GH-5.

This enables the module to be used in web browsers without transformation using a tool like browserify.

I have tested the change by putting the code in node_modules of a project which uses the english model. Before the change, there is an error message in the browser console:

Uncaught ReferenceError: Buffer is not defined
    at readModel (read-core-model.js:1:260)
    at Object.loadModel [as core] (load-core-model.js:1:78)
    at load (wink-nlp.js:137:22)
    at nlp (wink-nlp.js:408:3)

After the change, it functions as expected with no error messages.

CLAassistant commented 2 years ago

CLA assistant check
All committers have signed the CLA.

mrbrianevans commented 2 years ago

Fixes #5

mrbrianevans commented 2 years ago

Sorry an oversight on my part. I only tested the "hello world" example and it worked as expected, but should probably test the POS tagging or something to confirm that the language model lexicon has loaded correctly.

On Sun, Feb 6, 2022, 6:17 AM Sanjaya Kumar Saxena @.***> wrote:

@.**** commented on this pull request.

A specific conversion: model.lexicon = new Uint32Array( lexiconAsBuffer.buffer, 0, ( lexiconAsBuffer.length / 4 ) ); and model.xpansions = new Uint32Array( xpansionsAsBuffer.buffer, 0, ( xpansionsAsBuffer.length / 4 ) ); would be necessary as internally it assumes data to be available in Uint32Array format.

Has it worked with winkNLP properly?

— Reply to this email directly, view it on GitHub https://github.com/winkjs/wink-eng-lite-web-model/pull/6#pullrequestreview-873994146, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMVIGTCRD2DWTXFDSID3MM3UZYG6VANCNFSM5NUM4PTA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you authored the thread.Message ID: @.***>

sanjayaksaxena commented 2 years ago

Would recommend to incorporate the change and then test as internally it assumes Uint32Array.

mrbrianevans commented 2 years ago

Okay, I have done the conversion from ArrayBuffer to Uint32Array, and tested it and its working as expected. It can correctly tag parts of speech, so the lexicon must be working.

As an example, this is the output with the changes implemented:

token      p-spaces   prefix  suffix  shape   case    nerHint type     normal/pos 
——————————————————————————————————————————————————————————————————————————————————————— 
  “                 0   “       “       “       0       0       punctuat “ / PUNCT 
  He                0   He      He      Xx      3       0       word     he / PRON 
  is                1   is      is      xx      1       0       word     is / AUX 
  just              1   ju      ust     xxxx    1       0       word     just / ADV 
  what              1   wh      hat     xxxx    1       0       word     what / PRON 
  a                 1   a       a       x       1       0       word     a / DET 
  young             1   yo      ung     xxxx    1       0       word     young / ADJ 
  man               1   ma      man     xxx     1       0       word     man / NOUN 
  ought             1   ou      ght     xxxx    1       0       word     ought / AUX 
  to                1   to      to      xx      1       39      word     to / PART 
  be                1   be      be      xx      1       0       word     be / AUX 
  ,                 0   ,       ,       ,       0       0       punctuat , / PUNCT 
  ”                 0   ”       ”       ”       0       0       punctuat ” / PUNCT 

The rightmost column shows it correctly identified English words POS.

I have also added JSDoc on the function to make it clearer what type it returns.

Is this change a minor or patch version bump?

sanjayaksaxena commented 2 years ago

It is more like a patch unless you have another view?

mrbrianevans commented 2 years ago

No, patch seems good to me.

On Mon, Feb 7, 2022, 10:53 AM Sanjaya Kumar Saxena @.***> wrote:

It is more like a patch unless you have another view?

— Reply to this email directly, view it on GitHub https://github.com/winkjs/wink-eng-lite-web-model/pull/6#issuecomment-1031332762, or unsubscribe https://github.com/notifications/unsubscribe-auth/AMVIGTBCB2GC6R5N6Y2HX3LUZ6QEDANCNFSM5NUM4PTA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you authored the thread.Message ID: @.***>