loretoparisi / fasttext.js

FastText for Node.js
MIT License

Load into Memory #1

myoldusername closed this issue 6 years ago

myoldusername commented 6 years ago

Dear @loretoparisi, I installed your fasttext.js to solve the memory problem that we discussed in https://github.com/facebookresearch/fastText/issues/276#issuecomment-337320249

Now when I run "node fasttext_predict.js", it takes about 5 seconds to load:

"use strict";

(function() {

var DATA_ROOT='./data';

var FastText = require('./fasttext.js/lib/index');
var fastText = new FastText({
    loadModel: DATA_ROOT + '/model_gender.bin' // must specify filename and extension
});

var sample="Bashar Al Masri";
fastText.load()
.then(done => {
    return fastText.predict(sample);
})
.then(labels=> {
    console.log("TEXT:", sample, "\nPREDICT:",labels );
    sample="Hisahm al mjude";
    return fastText.predict(sample);
})
.then(labels=> {
    console.log("TEXT:", sample, "\nPREDICT:",labels );
    fastText.unload();
})
.catch(error => {
    console.error("predict error",error);
});

}).call(this);

It prints the prediction to stdout and then exits because of fastText.unload(). Now I need to call this file from anywhere as "node fasttext_predict.js UserName", passing an argument [UserName], and get the result directly on stdout so that I can read it from the PHP web server. You said the model would be loaded into memory.

It is the same problem as loading the model with the C++ binary: I need it to keep running in the background!
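
Reading the name from the command line could look like the sketch below. sampleFromArgv is a hypothetical helper, not part of fasttext.js, and the commented-out call only mirrors the load() and predict() calls from the script above. Calling the script this way would still reload the model on every run, which is the roughly 5 second delay described above.

```javascript
"use strict";

// Hypothetical helper (not part of fasttext.js): join everything after
// "node fasttext_predict.js" into a single sample string.
function sampleFromArgv(argv) {
    return argv.slice(2).join(" ").trim();
}

// process.argv is [nodePath, scriptPath, ...userArgs]
var sample = sampleFromArgv(process.argv);
if (sample) {
    console.log("TEXT:", sample);
    // fastText.load().then(() => fastText.predict(sample)) would go here,
    // using the same calls as in the script above.
}
```

With this in place, "node fasttext_predict.js Bashar Al Masri" would treat the three words as one sample.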

loretoparisi commented 6 years ago

@myoldusername I have just updated the library with several improvements to the child process run. I have also added a server example that should fit your needs. Please check the README.
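
The general idea of keeping the model in memory behind an HTTP endpoint can be sketched as follows. This is not the server example from the README, only a minimal illustration under assumptions: parseText and createPredictServer are hypothetical names, and predict stands for any function that returns a Promise of labels, as fastText.predict does in the script above.

```javascript
"use strict";
const http = require("http");

// Extract the `text` query parameter from a request URL such as "/?text=bader".
// URLSearchParams also decodes percent-encoded UTF-8 input.
function parseText(requestUrl) {
    const url = new URL(requestUrl, "http://localhost");
    return url.searchParams.get("text") || "";
}

// Build an HTTP server around a `predict(text) -> Promise<labels>` function.
// The model behind `predict` is loaded once, before the server starts,
// so each request only pays for the prediction itself.
function createPredictServer(predict) {
    return http.createServer((req, res) => {
        const text = parseText(req.url);
        predict(text)
            .then(labels => {
                res.writeHead(200, { "Content-Type": "application/json" });
                res.end(JSON.stringify({ predict: labels }));
            })
            .catch(error => {
                res.writeHead(500, { "Content-Type": "application/json" });
                res.end(JSON.stringify({ error: String(error) }));
            });
    });
}

// Assumed wiring, reusing the calls from the script above:
// fastText.load().then(() => {
//     createPredictServer(text => fastText.predict(text)).listen(3030);
// });
```

A PHP backend could then request http://localhost:3030/?text=UserName instead of spawning a new node process for each prediction.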

myoldusername commented 6 years ago

Outstanding... I am out of town, but I will test it today and give you feedback.

You are awesome......

myoldusername commented 6 years ago

It is working as expected, THANK YOU SO MUCH. You made my day!

Can I send you a donation?

myoldusername commented 6 years ago

Sometimes it crashes when I pass the text string if the string is Unicode, like Chinese.

I advise you to add a normalization method that removes all non-letter characters, e.g. all special characters and smiley characters...

loretoparisi commented 6 years ago

@myoldusername yes, this is a good point. There are some minor helper functions in utils, like Util.removeDiacritics, here: https://github.com/loretoparisi/fasttext.js/blob/master/lib/util.js#L238

and the dataset is normalized in FastText.normalize: https://github.com/loretoparisi/fasttext.js/blob/master/lib/index.js#L438

but of course it is different for symbol-based languages, since they must be handled with Unicode, i.e. Unicode conversion and normalization before prediction. Be aware that this normalization must be done on the training set too, i.e. you have to apply the same normalization to the training/test set and to the sample used for inference.

In my backend I do Unicode normalization in Java, but here I would prefer a Node solution. I will look into it!
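
A minimal Node sketch of such a normalization step, using only the built-in String.prototype.normalize, might look like this. normalizeText is a hypothetical helper, not part of fasttext.js, and the choice of NFKC and of which characters to strip is an assumption that would have to match whatever was applied to the training data.

```javascript
"use strict";

// Hypothetical helper (not part of fasttext.js).
// Assumption: the training set was normalized the same way.
function normalizeText(text) {
    return text
        .normalize("NFKC")                        // fold compatibility forms, e.g. full-width letters
        .replace(/[\u200B\uFEFF]/g, "")           // drop zero-width space and byte order mark
        .replace(/[\u0000-\u001F\u007F-\u009F]/g, " ") // control characters become spaces
        .replace(/\s+/g, " ")                     // collapse runs of whitespace
        .trim();
}

console.log(normalizeText("  hello\n\tworld "));
```

Hidden characters picked up by copy and paste, such as zero-width spaces, are removed before the text reaches the model.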

myoldusername commented 6 years ago

Well, I am working with the language classification training set provided by fastText, with all respect to them.

I pass paragraphs in various languages to the localhost URL and it works, but sometimes it suddenly crashes, even with normalized strings. I am not sure why; I will run further tests to see whether my copy-pasted strings contain hidden characters, since Unicode has some nasty stuff, lol.

Regarding the Node solution, I think it would be an awesome idea.

With respect.

Yours

loretoparisi commented 6 years ago

Yes, this can be a very tricky task when dealing with languages that need Unicode. By the way, I'm using the same model too, so I have added the compressed version of the model to the example, plus some environment variables, so that you can run:

cd examples/
export MODEL=./data/lid.176.ftz 
export PORT=9001
node server

and then

http://localhost:9001/?text=%EB%9E%84%EB%9E%84%EB%9D%BC%20%EC%B0%A8%EC%B0%A8%EC%B0%A8%20%EB%9E%84%EB%9E%84%EB%9D%BC\n%EB%9E%84%EB%9E%84%EB%9D%BC%20%EC%B0%A8%EC%B0%A8%EC%B0%A8%20%EC%9E%A5%EC%9C%A4%EC%A0%95%20%ED%8A%B8%EC%9C%84%EC%8A%A4%ED%8A%B8%20%EC%B6%A4%EC%9D%84%20%EC%B6%A5%EC%8B%9C%EB%8B%A4

which is correctly detected as KO:

{
    "response_time": 0.001,
    "predict": [{
            "label": "KO",
            "score": "1"
        },
        {
            "label": "TR",
            "score": "1.95313E-08"
        }
    ]
}

NOTE: My input text was 랄랄라%20차차차%20랄랄라\n랄랄라%20차차차%20장윤정%20트위스트%20춤을%20춥시다, but when you put it in a URL it is automatically percent-encoded, as with the encodeURIComponent method.
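
The encoding can be reproduced in Node with the built-in encodeURIComponent. buildPredictUrl is a hypothetical helper used only for illustration, and the base URL matches the PORT=9001 example above.

```javascript
"use strict";

// Hypothetical helper: build a request URL for the example server,
// percent-encoding the text as UTF-8.
function buildPredictUrl(baseUrl, text) {
    return baseUrl + "/?text=" + encodeURIComponent(text);
}

console.log(buildPredictUrl("http://localhost:9001", "랄랄라 차차차"));
// prints http://localhost:9001/?text=%EB%9E%84%EB%9E%84%EB%9D%BC%20%EC%B0%A8%EC%B0%A8%EC%B0%A8
```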

myoldusername commented 6 years ago

I would like to bring to your attention that sometimes, when I pass a regular string, the Node server freezes for unknown reasons and I have to kill it and restart it.

loretoparisi commented 6 years ago

@myoldusername please post that text and the URL here, copied and pasted from the browser.

myoldusername commented 6 years ago

http://localhost:3030/?text=bader

loretoparisi commented 6 years ago

Hmm, I guess you have some issues in your environment:

$ export PORT=3030
$ export MODEL=./data/lid.176.ftz 
$ node server.js 
model loaded
server is listening on 3030

You then call http://localhost:3030/?text=bader and get:

{
    response_time: 0.002,
    predict: [{
            label: "EN",
            score: "0.125931"
        },
        {
            label: "CA",
            score: "0.0847617"
        }
    ]
}

This should work without any issues:

$ time curl -s "http://localhost:3030/?text=bader"
{
  "response_time": 0,
  "predict": [
    {
      "label": "EN",
      "score": "0.125931"
    },
    {
      "label": "CA",
      "score": "0.0847617"
    }
  ]
}
real    0m0.027s
user    0m0.005s
sys     0m0.006s

And here is some benchmarking as well, calling the endpoint 1, 10 and 100 times sequentially:

$ ab -n 1 "http://localhost:3030/?text=bader"
This is ApacheBench, Version 2.3 <$Revision: 1757674 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:        
Server Hostname:        localhost
Server Port:            3030

Document Path:          /?text=bader
Document Length:        164 bytes

Concurrency Level:      1
Time taken for tests:   0.001 seconds
Complete requests:      1
Failed requests:        0
Total transferred:      271 bytes
HTML transferred:       164 bytes
Requests per second:    712.76 [#/sec] (mean)
Time per request:       1.403 [ms] (mean)
Time per request:       1.403 [ms] (mean, across all concurrent requests)
Transfer rate:          188.63 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     1    1   0.0      1       1
Waiting:        1    1   0.0      1       1
Total:          1    1   0.0      1       1
[loretoparisi@:mbploreto task]$ ab -n 10 "http://localhost:3030/?text=bader"
This is ApacheBench, Version 2.3 <$Revision: 1757674 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:        
Server Hostname:        localhost
Server Port:            3030

Document Path:          /?text=bader
Document Length:        164 bytes

Concurrency Level:      1
Time taken for tests:   0.011 seconds
Complete requests:      10
Failed requests:        4
   (Connect: 0, Receive: 0, Length: 4, Exceptions: 0)
Total transferred:      2726 bytes
HTML transferred:       1656 bytes
Requests per second:    941.00 [#/sec] (mean)
Time per request:       1.063 [ms] (mean)
Time per request:       1.063 [ms] (mean, across all concurrent requests)
Transfer rate:          250.50 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    1   0.5      1       2
Waiting:        0    1   0.3      1       1
Total:          1    1   0.5      1       2

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      2
  95%      2
  98%      2
  99%      2
 100%      2 (longest request)
[loretoparisi@:mbploreto task]$ ab -n 100 "http://localhost:3030/?text=bader"
This is ApacheBench, Version 2.3 <$Revision: 1757674 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done

Server Software:        
Server Hostname:        localhost
Server Port:            3030

Document Path:          /?text=bader
Document Length:        168 bytes

Concurrency Level:      1
Time taken for tests:   0.095 seconds
Complete requests:      100
Failed requests:        73
   (Connect: 0, Receive: 0, Length: 73, Exceptions: 0)
Total transferred:      27208 bytes
HTML transferred:       16508 bytes
Requests per second:    1054.37 [#/sec] (mean)
Time per request:       0.948 [ms] (mean)
Time per request:       0.948 [ms] (mean, across all concurrent requests)
Transfer rate:          280.15 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    1   1.2      0       9
Waiting:        0    1   1.2      0       9
Total:          0    1   1.2      1       9

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      2
  95%      3
  98%      7
  99%      9
 100%      9 (longest request)

loretoparisi commented 6 years ago

I have added some benchmarks here, so I'm closing this issue. Feel free to re-open it if you have any problems.