loretoparisi / fasttext.js

FastText for Node.js
MIT License

Different predictions for the same keyword with the same model #4

Closed: beshoo closed this issue 6 years ago

beshoo commented 6 years ago

Dear author.

Thank you for this wonderful Node.js add-on. I have a strange problem.

When I test predictions directly with fastText, I mean without Node.js, I have no problem.

But when I pass the same keyword to the Node.js server, I get a different label and a different score each time.

wget -qO- http://localhost:3030/?text=beshoo

Each time I send this URL I get a different label.

Regards

loretoparisi commented 6 years ago

Hello @beshoo, could you please try this:

cd fasttext.js/examples
node train
node server

and then point your browser to http://localhost:3000/?text=beshoo. For the example model and dataset you should always get this response, the same as doing

$ curl "http://localhost:3000/?text=beshoo" | json_pp
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   174  100   174    0     0  27835      0 --:--:-- --:--:-- --:--:-- 29000
{
   "response_time" : 0.001,
   "predict" : [
      {
         "score" : "0.5",
         "label" : "BAND"
      },
      {
         "label" : "ORGANIZATION",
         "score" : "0.498047"
      }
   ]
}

as well as

wget -qO- http://localhost:3000/?text=beshoo
{
  "response_time": 0.001,
  "predict": [
    {
      "label": "BAND",
      "score": "0.5"
    },
    {
      "label": "ORGANIZATION",
      "score": "0.498047"
    }
  ]
}
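
For reference, the example server loads the model once at startup and then answers each request with a predict call. Here is a minimal sketch of that flow with the fasttext.js API, based on the README, and assuming the MODEL environment variable and port 3000 used in this thread; the actual examples/server.js may differ in details:

```javascript
// A minimal sketch of the example prediction server, based on the
// fasttext.js README API; the actual examples/server.js may differ.
var http = require('http');
var url = require('url');
var FastText = require('fasttext.js');

var fastText = new FastText({
    loadModel: process.env.MODEL // e.g. data/band_model.bin
});

fastText.load()
    .then(function () {
        http.createServer(function (req, res) {
            var text = url.parse(req.url, true).query.text || '';
            var start = Date.now();
            // The model is loaded once; predict is called per request.
            fastText.predict(text)
                .then(function (labels) {
                    res.writeHead(200, { 'Content-Type': 'application/json' });
                    res.end(JSON.stringify({
                        response_time: (Date.now() - start) / 1000,
                        predict: labels
                    }));
                })
                .catch(function (error) {
                    res.writeHead(500);
                    res.end(JSON.stringify({ error: error.toString() }));
                });
        }).listen(3000);
    })
    .catch(function (error) {
        console.error('model load failed:', error);
    });
```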
beshoo commented 6 years ago

Well, the problem is that I don't have the labeled text data to retrain it.

But here is the bin model; it is a gender classification model:

http://beshoo.com/gender.bin.gz

Try it please...

beshoo commented 6 years ago
root@server [/home/mybeshoo/www]# wget -qO- http://localhost:3300/?text=beshoo
{
  "response_time": 4.909,
  "predict": [
    {
      "label": "FEMALE",
      "score": "0.998047"
    },
    {
      "label": "MALE",
      "score": "1.95313E-08"
    }
  ]
}

Now let's try again:

root@server [/home/mybeshoo/www]# wget -qO- http://localhost:3300/?text=beshoo
{
  "response_time": 1.156,
  "predict": [
    {
      "label": "MALE",
      "score": "0.794922"
    },
    {
      "label": "FEMALE",
      "score": "0.203125"
    }
  ]
}

loretoparisi commented 6 years ago

@beshoo that sounds weird! Thanks, I will take a look at the generated model. In the meantime I have tried a Facebook pre-trained language identification model and it seems okay:

cd examples/
export MODEL=data/lid.176.ftz
http://localhost:3000/?text=das%20is%20schon
{
   "predict" : [
      {
         "score" : "0.745016",
         "label" : "DE"
      },
      {
         "score" : "0.232697",
         "label" : "EN"
      }
   ],
   "response_time" : 0
}

both the quantized model and the uncompressed one:

[loretoparisi@:mbploreto examples]$ ls -lh data/lid.176.ftz 
-rw-r--r--@ 1 loretoparisi  staff   916K 19 Ott 23:50 data/lid.176.ftz
[loretoparisi@:mbploreto examples]$ ls -lh /root/lid176_model.bin 
-rw-r--r--  1 loretoparisi  staff   125M 10 Ott 16:25 /root/lid176_model.bin

Going to check yours then. Are you running on macOS/Linux/Windows? Also, your two outputs show very different response times, "response_time": 4.909 and "response_time": 1.156, which seems strange to me.

beshoo commented 6 years ago

I am on Linux 😎

loretoparisi commented 6 years ago

@beshoo ok thanks, going to test again on both macOS and Linux then.

loretoparisi commented 6 years ago

@beshoo so I did the following.

Created and built a Dockerfile in this repo to check the Linux version. I'm using Ubuntu 16.04 here:

docker build -t fasttext.js .

Tested your model against it:

docker run -v /models/:/models --rm -it -p 3000:3000 -e MODEL=/models/gender.bin fasttext.js node fasttext.js/examples/server.js 
[loretoparisi@:mbploreto fasttext.js]$ curl http://localhost:3000/?text=I%20love%20cars | json_pp
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   169  100   169    0     0  24257      0 --:--:-- --:--:-- --:--:-- 28166
{
   "response_time" : 0,
   "predict" : [
      {
         "label" : "MALE",
         "score" : "0.855469"
      },
      {
         "label" : "FEMALE",
         "score" : "0.142578"
      }
   ]
}
[loretoparisi@:mbploreto fasttext.js]$ curl http://localhost:3000/?text=I%20love%20dressing | json_pp
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   173  100   173    0     0  22718      0 --:--:-- --:--:-- --:--:-- 24714
{
   "predict" : [
      {
         "score" : "0.535156",
         "label" : "FEMALE"
      },
      {
         "score" : "0.462891",
         "label" : "MALE"
      }
   ],
   "response_time" : 0.001
}

and the same text multiple times as well:

[loretoparisi@:mbploreto fasttext.js]$ for ((n=0;n<10;n++)); do curl http://localhost:3000/?text=I%20love%20dressing; done
{
  "response_time": 0.001,
  "predict": [
    {
      "label": "FEMALE",
      "score": "0.535156"
    },
    {
      "label": "MALE",
      "score": "0.462891"
    }
  ]
}
[... the same response ("FEMALE": 0.535156, "MALE": 0.462891) is returned for all 10 requests]

and in your example:

[loretoparisi@:mbploreto fasttext.js]$ for ((n=0;n<5;n++)); do curl http://localhost:3000/?text=beshoo; done
{
  "response_time": 0.001,
  "predict": [
    {
      "label": "FEMALE",
      "score": "0.736328"
    },
    {
      "label": "MALE",
      "score": "0.261719"
    }
  ]
}
[... the same response ("FEMALE": 0.736328, "MALE": 0.261719) is returned for all 5 requests]

Everything seems to work OK. Are you running any proxy in front of your web server listening on 3030? Also, which version of Linux are you running?

beshoo commented 6 years ago

No proxy at all. Linux Server release 6.9.

Btw, it is something that comes and goes. I am testing on my server now and it works, the same result comes back, but I noticed something:

When the service returned the different labels, I mean the error scenario, pressing enter took about 3 seconds to return the output, even though the service was online. Do you think it's some kind of DDoS? Not sure...

But believe me, it's happening...

Now when I hit enter, the output returns in less than a second.

beshoo commented 6 years ago

I tried to benchmark the service via ab (ApacheBench), but it returns the correct result.

I am not sure.
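
One way to chase an intermittent issue like this is to hit the endpoint repeatedly from Node itself and flag any change of the top label or an unusually slow response. A hypothetical check script using only Node core modules; the URL, run count, and slowness threshold are placeholders to adjust:

```javascript
// Hypothetical repro script: sequential requests against the prediction
// server, logging whenever the top label changes or a request is slow.
var http = require('http');

var URL = 'http://localhost:3030/?text=beshoo'; // adjust to your server
var RUNS = 50;
var lastLabel = null;

function hit(n) {
    if (n >= RUNS) return;
    var start = Date.now();
    http.get(URL, function (res) {
        var body = '';
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () {
            var elapsed = Date.now() - start;
            var top = JSON.parse(body).predict[0];
            if (lastLabel !== null && top.label !== lastLabel) {
                console.log('#' + n, 'LABEL CHANGED:', lastLabel, '->', top.label, top.score);
            }
            if (elapsed > 1000) {
                console.log('#' + n, 'SLOW RESPONSE:', elapsed + 'ms');
            }
            lastLabel = top.label;
            hit(n + 1);
        });
    }).on('error', function (err) {
        console.error('#' + n, 'request failed:', err.message);
    });
}

hit(0);
```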

loretoparisi commented 6 years ago

I'm closing this issue, feel free to reopen it if you have additional questions or further issues.