piskvorky / gensim

Topic Modelling for Humans
https://radimrehurek.com/gensim
GNU Lesser General Public License v2.1
15.55k stars 4.37k forks source link

Gensim's FastText model reads in unsupported modes from Facebook's FastText #3179

Open mpenkov opened 3 years ago

mpenkov commented 3 years ago

In gensim/models/fasttext.py:

    model = FastText(
        vector_size=m.dim,
        vector_size=m.dim,
        window=m.ws,
        window=m.ws,
        epochs=m.epoch,
        epochs=m.epoch,
        negative=m.neg,
        negative=m.neg,
        # FIXME: these next 2 lines read in unsupported FB FT modes (loss=3 softmax or loss=4 onevsall,
        # or model=3 supervised) possibly creating inconsistent gensim model likely to fail later. Displaying
        # clear error/warning with explanatory message would be far better - even if there might be some reason
        # to continue with the load - such as providing read-only access to word-vectors trained those ways. (See:
        # https://github.com/facebookresearch/fastText/blob/2cc7f54ac034ae320a9af784b8145c50cc68965c/src/args.h#L19
        # for FB FT mode definitions.)
        hs=int(m.loss == 1),
        hs=int(m.loss == 1),
        sg=int(m.model == 2),
        sg=int(m.model == 2),
        bucket=m.bucket,
        bucket=m.bucket,
        min_count=m.min_count,
        min_count=m.min_count,
        sample=m.t,
        sample=m.t,
        min_n=m.minn,
        min_n=m.minn,
        max_n=m.maxn,
        max_n=m.maxn,
    )
amin110314 commented 3 years ago

may I work on this?

mpenkov commented 3 years ago

Sure.

kitrak-rev commented 3 years ago

Since there is no commits for a month, can I take up this task? (I am just starting with open source contribution, any references or redirection is welcomed)

Tewatia5355 commented 3 years ago

3222 I have tried to add error handling before loading another fasttext model to gensim/fasttext.

Please review and tell me will it work? image

karan121bhukar commented 3 years ago

@mpenkov this is my 1st contribution to this project, can you take a look at PR #3223 and let me know if it's good to go?

ryandelano commented 9 months ago

Hi, is this issue still available to be worked on? Would love to get started with contributions on this project!

NexusSRC commented 1 month ago

would love to start contributing to this repo by starting with this issue, can it be assigned to me?

gojomo commented 1 month ago

You don't need to be assigned something to propose a fix, in the form of a Github PR.

This is a very low-stakes matter - less-used (fairly obscure) FastText modes, with few people expecting them to work in Gensim. It's unclear that the gap here has ever actually inconvenienced or confused anyone – it's just a matter of being clear/thorough in describing what actually works, "just in case".

So improvements should stay safe/minimal/easy-to-review/low-risk - for example, it might just be some extra clarity in related doc-comments (to note not all modes fully supported). Maybe, also, some warning sent in user-visible output (with same conventions as similar warnings elsewhere in Gensim) if/when unsuppored modes are loaded - if it can be done simply.

NexusSRC commented 1 month ago

Okay, will look forward for more impactful problems to solve then

gojomo commented 1 month ago

This would still be a fine first issue, for which improvements wquld be welcome! My comment was meant to help describe the kind of (careful, minimal, straightforward) contribution that would make sense for this sort of "loose ends" issue.

NexusSRC commented 1 week ago

Sure makes sense