pommedeterresautee / fastrtext

R wrapper for fastText
https://pommedeterresautee.github.io/fastrtext/

loading time for supervised learning with pretrainedVector param #25

Open gabrielwong1991 opened 6 years ago

gabrielwong1991 commented 6 years ago

Hi, I found that in supervised learning, if I include the -pretrainedVector parameter, the load time for this command is painfully slow.

Without it, it is very quick. My pretrained vector file is 1.5 GB in size. I am not sure whether this is the cause, but is the pretrained vector file read into R memory? Checking my drive, I can see it is read at only around 400 kb/s.

Thanks

pommedeterresautee commented 6 years ago

When loading external vectors, fastText initializes a large matrix and stores each word's vector in it; that's why it may be slow. Usually, pretrained vectors help only if your dataset is small.
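To make the cost concrete, here is a minimal Python sketch of what reading a textual `.vec` file into one large matrix involves. It is not fastText's actual C++ code, only an illustration under assumptions: the function name `load_vec` and the sample values are hypothetical, and the file is assumed to follow the usual layout of a header line with the word count and dimension, followed by one word and its values per line.

```python
import io
from array import array


def load_vec(stream):
    """Parse a textual .vec stream into (words, flat float32 matrix, dim)."""
    header = stream.readline().split()
    _n_words, dim = int(header[0]), int(header[1])
    words = []
    matrix = array("f")  # flat row-major storage, one row of `dim` floats per word
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        words.append(parts[0])
        # Every value is converted from text to float, one by one.
        matrix.extend(float(x) for x in parts[1:])
    return words, matrix, dim


# Tiny in-memory stand-in for a real .vec file (hypothetical values).
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")
words, matrix, dim = load_vec(sample)
row_world = matrix[dim:2 * dim]  # the row that belongs to "world"
```

Each value has to be parsed from text, so a file of several gigabytes means a very large number of string-to-float conversions before training can start.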

86mm86 commented 5 years ago

Same problem. gabrielwong1991, could you please share the size of the dataset on which you trained the supervised model (using the pretrained vectors) and how long it took? I have a training dataset of about 23,000 rows and I am using the English Wikipedia pretrained vectors (6 GB). After more than 12 hours it is still running, even though the message "Progress: 100.0% ..." has already appeared!

Thanks

pommedeterresautee commented 5 years ago

Does it work with the client version of fastText?

86mm86 commented 5 years ago

Do you mean using the original fastText C++ library from the command prompt (I have a Windows company laptop)? I currently do not have it installed. To install fastText I would need to complete the installation of Microsoft Visual C++ 2017, which is asking for a system restart. That in turn would kill my job, which is still running after 18 hours, and I would prefer to avoid that.

Do you think such a long waiting time is reasonable (training seems to be completed, at least according to the Progress message), or should I just proceed to install the original fastText library and try it out?

pommedeterresautee commented 5 years ago

It's not OK to wait that long. I have tested the library on Windows with small data and it works, but I have not run many tests on Windows. I would first try the command prompt version.
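For anyone wanting to reproduce the run with the command-line tool, a rough Python sketch of the equivalent invocation might look like this. The file names `train.txt`, `model` and `wiki.en.vec` and the helper names are placeholders, not taken from this thread. Note that the fastText CLI spells the flag `-pretrainedVectors`, and `-dim` has to match the dimension of the pretrained vectors (300 is assumed here).

```python
import shutil
import subprocess


def build_command(train_file, model_prefix, vectors_file, dim):
    """Build the argument list for a supervised fastText run with pretrained vectors."""
    return [
        "fasttext", "supervised",
        "-input", train_file,
        "-output", model_prefix,
        "-pretrainedVectors", vectors_file,
        "-dim", str(dim),  # must equal the dimension of the pretrained vectors
    ]


def run_if_available(cmd):
    """Run the command only if a fasttext binary is on the PATH; return True if it ran."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True


# Placeholder file names; replace them with real paths before running.
cmd = build_command("train.txt", "model", "wiki.en.vec", 300)
# run_if_available(cmd)  # uncomment once the files above exist
```

If the command-line run finishes in a reasonable time on the same data, that points at the wrapper or the Windows build rather than at fastText itself.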