pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

Feature Request: NegativeSampling and HierarchicalSoftmax loss functions #634

Open hardikp opened 7 years ago

hardikp commented 7 years ago

It's a bit difficult to write a SkipGram word2vec model without these functions.

Not entirely sure, but the Chainer implementations for NegativeSampling and HierarchicalSoftmax could be easily ported to pytorch.

I think this could be a useful addition to pytorch. Also, if you need some help with this, I might be able to find some time and submit a PR.
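For reference, a negative-sampling loss for skip-gram can be sketched in a few lines of existing PyTorch ops. This is a hypothetical illustration, not a proposed API: the class name `NegativeSamplingLoss` and the uniform negative sampler are assumptions (word2vec proper samples from a unigram^0.75 distribution).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NegativeSamplingLoss(nn.Module):
    """Hypothetical sketch of skip-gram negative sampling.

    Maximizes log sigmoid(u_o . v_c) for true (center, context) pairs and
    log sigmoid(-u_k . v_c) for k randomly drawn negative words.
    """
    def __init__(self, vocab_size, embed_dim, num_negatives=5):
        super().__init__()
        self.in_embed = nn.Embedding(vocab_size, embed_dim)   # center-word vectors
        self.out_embed = nn.Embedding(vocab_size, embed_dim)  # context-word vectors
        self.num_negatives = num_negatives
        self.vocab_size = vocab_size

    def forward(self, center, context):
        # center, context: LongTensors of shape (batch,)
        v = self.in_embed(center)                       # (batch, dim)
        u_pos = self.out_embed(context)                 # (batch, dim)
        # Assumption: uniform negatives for simplicity; word2vec uses a
        # precomputed unigram^0.75 sampling table instead.
        neg = torch.randint(0, self.vocab_size,
                            (center.size(0), self.num_negatives),
                            device=center.device)
        u_neg = self.out_embed(neg)                     # (batch, k, dim)
        pos_score = (v * u_pos).sum(dim=1)              # (batch,)
        neg_score = torch.bmm(u_neg, v.unsqueeze(2)).squeeze(2)  # (batch, k)
        loss = -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score).sum(dim=1))
        return loss.mean()
```

The key point is that the cost is O(k) per pair rather than O(vocab_size), which is what makes large-vocabulary word2vec training tractable.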

cc @albanD @mruberry @jbschlosser @walterddr @kshitij12345 @saketh-are @ezyang @ngimel

apaszke commented 7 years ago

Yeah, they seem useful. PRs welcome! 🙂

prcastro commented 7 years ago

I'm having brutal performance issues training a W2V model without NegativeSampling. Would be very nice to have this feature in pytorch 🔥 .

bkj commented 7 years ago

+1

amnezzia commented 7 years ago

@soumith @apaszke I think I have a solution. Not sure it is the best one, but I can create a PR. I am aware of https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md; is there something else I need to know? Also, I assume I should add it to https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/loss.py, is that correct?

isaykatsman commented 6 years ago

@amnezzia Any updates? I think I'd like to work on this soon.

ezyang commented 5 years ago

@pietern: fairseq has their own custom implementation of HierarchicalSoftmax. The question is whether this should live in core or in domain libraries (probably domain libraries at least; specifically torchtext). HierarchicalSoftmax is most applicable to problems with a very large number of classes.
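To make the "lots of classes" point concrete, here is a hypothetical sketch (not fairseq's or any library's actual implementation) of hierarchical softmax over a complete binary tree: each word is a leaf, each internal node owns one logistic classifier, and log p(word | h) is a sum of ~log2(V) terms instead of a normalization over all V classes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HierarchicalSoftmaxLoss(nn.Module):
    """Hypothetical sketch: hierarchical softmax over a complete binary tree.

    Reaching any of the vocab_size leaves takes depth = ceil(log2(V)) binary
    decisions, so the forward pass costs O(log V) rather than O(V).
    """
    def __init__(self, vocab_size, hidden_dim):
        super().__init__()
        self.depth = (vocab_size - 1).bit_length()
        # One weight vector per internal node, heap-ordered: children of
        # node i are 2i+1 (left) and 2i+2 (right).
        self.node_weight = nn.Embedding(2 ** self.depth - 1, hidden_dim)

    def forward(self, hidden, target):
        # hidden: (batch, hidden_dim); target: (batch,) leaf indices
        loss = hidden.new_zeros(target.size(0))
        node = torch.zeros_like(target)  # start every path at the root
        for level in range(self.depth - 1, -1, -1):
            direction = (target >> level) & 1      # read path bits MSB-first
            w = self.node_weight(node)             # (batch, hidden_dim)
            logit = (w * hidden).sum(dim=1)
            sign = direction.float() * 2 - 1       # right: +1, left: -1
            loss = loss - F.logsigmoid(sign * logit)
            node = node * 2 + 1 + direction        # descend to the child
        return loss.mean()
```

Because sigmoid(x) + sigmoid(-x) = 1 at every internal node, the implied probabilities over all leaves sum to one by construction, with no explicit normalization term.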

ezyang commented 4 years ago

I looked in fairseq and couldn't find any occurrence of HierarchicalSoftmax in their library.

gchanan commented 4 years ago

We accept the feature request for HierarchicalSoftmax.

We'll also accept the negative sampling loss (i.e. what is linked in Chainer), but don't plan to add anything beyond that at this time.

cooljackal commented 2 years ago

Are there any example scripts to test potential implementations of NegativeSampling and HierarchicalSoftmax?

cooljackal commented 2 years ago

I started working on a feature branch of a fork to see if I can add a version of HierarchicalSoftmax. I am adding it as HSoftmax for brevity and following the caffe2 convention. I am thinking of using a Huffman tree built using the priority queue method.
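The priority-queue Huffman construction mentioned above can be sketched with the standard library alone. This is an illustrative version, not the code in the branch or in caffe2: repeatedly merge the two least-frequent nodes, then read each word's code off its root-to-leaf path (frequent words end up with short codes, which is what gives hierarchical softmax its speedup in practice).

```python
import heapq
from itertools import count

def build_huffman_codes(freqs):
    """Hypothetical sketch: build Huffman codes from word frequencies.

    freqs: dict mapping word -> count.
    Returns a dict mapping word -> '0'/'1' code string (left/right path).
    """
    tiebreak = count()  # keeps heap entries comparable when counts tie
    heap = [(f, next(tiebreak), w) for w, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least-frequent subtrees into one internal node.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):        # internal node: recurse both ways
            walk(node[0], code + "0")
            walk(node[1], code + "1")
        else:                              # leaf: record the word's code
            codes[node] = code or "0"      # single-word vocabulary edge case
    walk(heap[0][2], "")
    return codes
```

The resulting codes are prefix-free, so each word's code doubles as its sequence of left/right decisions through the tree's internal-node classifiers.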

cooljackal commented 2 years ago

Started working on PR #82207 to add HierarchicalSoftmax functionality.