rmenegaux / fastDNA

Segmentation fault with hs and ns loss functions #3

Closed chrisLanderson closed 4 years ago

chrisLanderson commented 4 years ago

Hello,

I noticed that whenever I try to use the hs or ns loss functions, I get a segmentation fault right after the data is read in and before training begins (the training progress bar never appears). I can reproduce this error with the test data provided in fastDNA/test/train. I tried compiling on a few different systems and with different versions of the gcc compiler, but the error persists. I don't get this error with the hs or ns loss functions when I run fasttext on its own, albeit with different data. Any idea what is causing this issue?

Thanks, Chris

rmenegaux commented 4 years ago

Hi Chris,

fastDNA only supports supervised learning for now, so I'm not too surprised it doesn't work with ns or hs yet. It shouldn't be too much of a change, though, so I will get it done by the end of the week.

Romain

chrisLanderson commented 4 years ago

Hi Romain,

If that is something you are interested in implementing, that would be great, but no pressure. I think the hierarchical softmax could speed training when dealing with a lot of different classes.

Thanks for your hard work on this project, I think it's quite valuable.

Thanks, Chris

rmenegaux commented 4 years ago

I've pushed a feature for the hierarchical softmax loss, so you should be able to use it if you pull from master. Note that the count I use for each label is the total length (in base pairs) of the training sequences for that label, which corresponds to the frequency at which the label is seen during training.
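
To make the label-count remark concrete, here is a minimal Python sketch, not fastDNA's actual C++ code. It assumes the hierarchical softmax tree is a Huffman tree built over those counts, as in fastText. The function names `label_counts_in_bp` and `huffman_depths` are hypothetical and exist only for this illustration.

```python
import heapq
from collections import Counter


def label_counts_in_bp(sequences, labels):
    """Sum the length (in base pairs) of the training sequences per label."""
    counts = Counter()
    for seq, label in zip(sequences, labels):
        counts[label] += len(seq)
    return counts


def huffman_depths(counts):
    """Return the depth of each label in a Huffman tree built on the counts.

    Labels with larger counts end up closer to the root, so they need
    fewer binary decisions per training example.
    """
    # Heap entries are (count, unique id, labels in this subtree).
    # The unique id breaks ties so the label lists are never compared.
    heap = [(c, i, [lab]) for i, (lab, c) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    depth = {lab: 0 for lab in counts}
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, labs1 = heapq.heappop(heap)
        c2, _, labs2 = heapq.heappop(heap)
        # Merging two subtrees pushes every label in them one level down.
        for lab in labs1 + labs2:
            depth[lab] += 1
        heapq.heappush(heap, (c1 + c2, next_id, labs1 + labs2))
        next_id += 1
    return depth


if __name__ == "__main__":
    seqs = ["ACGT" * 10, "AC", "ACG", "ACGTA"]
    labs = ["A", "B", "C", "D"]
    counts = label_counts_in_bp(seqs, labs)
    print(counts)
    print(huffman_depths(counts))
```

With counts like these, the label backed by the most base pairs sits nearest the root, which is why hierarchical softmax can reduce the per-example cost when there are many classes.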

I won't add the ns loss right now, since it would require keeping count of the k-mers in the training set. We'll keep this for a later feature perhaps :)
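
For illustration only, the k-mer bookkeeping mentioned above could look roughly like the Python sketch below. This is not fastDNA code. The function names `count_kmers` and `sample_negatives` are hypothetical, and weighting negatives by the square root of their counts is an assumption borrowed from how fastText builds its negative-sampling table.

```python
import random
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping k-mer across the training sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def sample_negatives(counts, target, n, rng):
    """Draw n negative k-mers, excluding the target.

    Assumption: weights are count ** 0.5, so frequent k-mers are drawn
    more often, but less than proportionally to their raw counts.
    """
    candidates = [kmer for kmer in counts if kmer != target]
    weights = [counts[kmer] ** 0.5 for kmer in candidates]
    return rng.choices(candidates, weights=weights, k=n)


if __name__ == "__main__":
    kmer_counts = count_kmers(["ACGTAC", "CGTACG"], k=2)
    print(kmer_counts)
    print(sample_negatives(kmer_counts, target="AC", n=3, rng=random.Random(0)))
```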

chrisLanderson commented 4 years ago

Thanks a lot for your work implementing this, really appreciate it. It appears to be working for me, so I'll close the issue.

Chris