piskvorky / gensim

Topic Modelling for Humans
https://radimrehurek.com/gensim
GNU Lesser General Public License v2.1
15.68k stars 4.38k forks

cleanup of vestigial `Morfessor`, `Varembed`, `gensim.models.wrapper(s)` references? #3342

Closed pabs3 closed 2 years ago

pabs3 commented 2 years ago

Problem description

I noticed that usage of Morfessor has mostly been removed, but vestigial references to it remain in a few files. I would just submit a pull request for this, but I'm not sure whether the files that reference Morfessor (especially .travis.yml) should be removed outright or kept and updated. I then found that some of the files referencing Morfessor (such as Varembed.ipynb) also reference gensim.models.wrappers, which has likewise been removed. gensim.models.wrappers (and the singular gensim.models.wrapper) appear mostly in notebooks and comments, but also in the code of the TestDtmModel test, which uses DtmModel; DtmModel has also been removed yet is still referenced in several places. Finally, the word "wrapper" appears in many places (mostly in the notebooks) that seem to refer to the now-removed gensim.models.wrappers code.

Steps/code/corpus to reproduce

$ git grep -i morfessor -- :^CHANGELOG.md
.travis.yml:      - TEST_DEPENDS="pytest mock cython nmslib pyemd testfixtures Morfessor==2.0.2a4 python-levenshtein==0.12.0 visdom==0.1.8.9 scikit-learn"
docs/notebooks/Varembed.ipynb:      [match inside a stored traceback from ~/git/gensim/gensim/models/wrappers/varembed.py, in load_varembed_format]
docs/notebooks/Varembed.ipynb:    "This loads a varembed model into Gensim. Also if you want to load with morphemes added into the varembed vectors, you just need to also provide the path to the trained morfessor model binary as an argument. This works as an optional parameter, if not provided, it would just load the varembed vectors without morphemes."
docs/notebooks/Varembed.ipynb:    "morfessor_file = '../../gensim/test/test_data/varembed_leecorpus_morfessor.bin'\n",
docs/notebooks/Varembed.ipynb:    "model_with_morphemes = varembed.VarEmbed.load_varembed_format(vectors=vector_file, morfessor_model=morfessor_file)"
Binary file gensim/test/test_data/varembed_morfessor.bin matches
$ git find | grep -i morfessor
./gensim/test/test_data/varembed_morfessor.bin
$ git grep -i varembed -- :^CHANGELOG.md
docs/notebooks/Varembed.ipynb:    "# VarEmbed Tutorial\n",
docs/notebooks/Varembed.ipynb:    "Varembed is a word embedding model incorporating morphological information, capturing shared sub-word features. Unlike previous work that constructs word embeddings directly from morphemes, varembed combines morphological and distributional information in a unified probabilistic framework. Varembed thus yields improvements on intrinsic word similarity evaluations. Check out the original paper, [arXiv:1608.01056](https://arxiv.org/abs/1608.01056) accepted in [EMNLP 2016](http://www.emnlp2016.net/accepted-papers.html).\n",
docs/notebooks/Varembed.ipynb:    "Varembed is now integrated into [Gensim](http://radimrehurek.com/gensim/) providing ability to load already trained varembed models into gensim with additional functionalities over word vectors already present in gensim.\n",
docs/notebooks/Varembed.ipynb:    "In this tutorial you will learn how to train, load and evaluate varembed model on your data.\n",
docs/notebooks/Varembed.ipynb:    "The authors provide their code to train a varembed model. Checkout the repository [MorphologicalPriorsForWordEmbeddings](https://github.com/rguthrie3/MorphologicalPriorsForWordEmbeddings) for to train a varembed model. You'll need to use that code if you want to train a model. \n",
docs/notebooks/Varembed.ipynb:    "# Load Varembed Model\n",
docs/notebooks/Varembed.ipynb:    "Now that you have an already trained varembed model, you can easily load the varembed word vectors directly into Gensim. <br>\n",
docs/notebooks/Varembed.ipynb:    "For that, you need to provide the path to the word vectors pickle file generated after you train the model and run the script to [package varembed embeddings](https://github.com/rguthrie3/MorphologicalPriorsForWordEmbeddings/blob/master/package_embeddings.py) provided in the [varembed source code repository](https://github.com/rguthrie3/MorphologicalPriorsForWordEmbeddings).\n",
docs/notebooks/Varembed.ipynb:    "We'll use a varembed model trained on [Lee Corpus](https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/test/test_data/lee.cor) as the vocabulary, which is already available in gensim.\n",
docs/notebooks/Varembed.ipynb:     "evalue": "[Errno 2] No such file or directory: '../../gensim/test/test_data/varembed_leecorpus_vectors.pkl'",
docs/notebooks/Varembed.ipynb:      [match inside a stored traceback: model = varembed.VarEmbed.load_varembed_format(vectors=vector_file)]
docs/notebooks/Varembed.ipynb:      [match inside a stored traceback from ~/git/gensim/gensim/models/wrappers/varembed.py, in load_varembed_format]
docs/notebooks/Varembed.ipynb:      "FileNotFoundError: [Errno 2] No such file or directory: '../../gensim/test/test_data/varembed_leecorpus_vectors.pkl'"
docs/notebooks/Varembed.ipynb:    "from gensim.models.wrappers import varembed\n",
docs/notebooks/Varembed.ipynb:    "vector_file = '../../gensim/test/test_data/varembed_leecorpus_vectors.pkl'\n",
docs/notebooks/Varembed.ipynb:    "model = varembed.VarEmbed.load_varembed_format(vectors=vector_file)"
docs/notebooks/Varembed.ipynb:    "This loads a varembed model into Gensim. Also if you want to load with morphemes added into the varembed vectors, you just need to also provide the path to the trained morfessor model binary as an argument. This works as an optional parameter, if not provided, it would just load the varembed vectors without morphemes."
docs/notebooks/Varembed.ipynb:    "morfessor_file = '../../gensim/test/test_data/varembed_leecorpus_morfessor.bin'\n",
docs/notebooks/Varembed.ipynb:    "model_with_morphemes = varembed.VarEmbed.load_varembed_format(vectors=vector_file, morfessor_model=morfessor_file)"
docs/notebooks/Varembed.ipynb:    "This helps load trained varembed models into Gensim. Now you can use this for any of the Keyed Vector functionalities, like 'most_similar', 'similarity' and so on, already provided in gensim. \n"
docs/notebooks/Varembed.ipynb:    "In this tutorial, we learnt how to load already trained varembed models vectors into gensim and easily use and evaluate it. That's it!\n",
docs/notebooks/Varembed.ipynb:    "* [Varembed Source Code](https://github.com/rguthrie3/MorphologicalPriorsForWordEmbeddings)\n",
gensim/models/keyedvectors.py::class:`~gensim.models.wrappers.varembed.VarEmbed` etc), they can be represented by a standalone structure,
$ git find | grep -i varembed
./docs/notebooks/Varembed.ipynb
./gensim/test/test_data/varembed_lee_subcorpus.cor
./gensim/test/test_data/varembed_morfessor.bin
./gensim/test/test_data/varembed_vectors.pkl
$ git --no-pager grep -i gensim.models.wrapper -- :^CHANGELOG.md
docs/notebooks/FastText_Tutorial.ipynb:    "from gensim.models.wrappers.fasttext import FastText as FT_wrapper\n",
docs/notebooks/Varembed.ipynb:      [match inside a stored traceback from ~/git/gensim/gensim/models/wrappers/varembed.py, in load_varembed_format]
docs/notebooks/Varembed.ipynb:    "from gensim.models.wrappers import varembed\n",
docs/notebooks/WordRank_wrapper_quickstart.ipynb:      [match inside a stored traceback from ~/git/gensim/gensim/models/wrappers/wordrank.py, in train]
docs/notebooks/WordRank_wrapper_quickstart.ipynb:    "from gensim.models.wrappers import Wordrank\n",
docs/notebooks/Wordrank_comparisons.ipynb:    "from gensim.models.wrappers import Wordrank\n",
docs/notebooks/dtm_example.ipynb:    "from gensim.models.wrappers.dtmmodel import DtmModel\n",
docs/notebooks/ldaseqmodel.ipynb:    "from gensim.models.wrappers.dtmmodel import DtmModel\n",
docs/notebooks/topic_coherence_tutorial.ipynb:    "from gensim.models.wrappers import LdaVowpalWabbit, LdaMallet\n",
docs/notebooks/topic_coherence_tutorial.ipynb:      [matches inside stored tracebacks from ~/git/gensim/gensim/models/wrappers/ldavowpalwabbit.py, in __init__, train and _run_vw_command]
gensim/models/callbacks.py:                  of its wrappers, such as :class:`~gensim.models.wrappers.ldamallet.LdaMallet` or
gensim/models/callbacks.py:                  :class:`~gensim.models.wrappers.ldavowpalwabbit.LdaVowpalWabbit`.
gensim/models/callbacks.py:            :class:`~gensim.models.wrappers.ldamallet.LdaMallet` or
gensim/models/callbacks.py:            :class:`~gensim.models.wrapper.ldavowpalwabbit.LdaVowpalWabbit`.
gensim/models/coherencemodel.py:            :class:`~gensim.models.ldamulticore.LdaMulticore`, :class:`~gensim.models.wrappers.ldamallet.LdaMallet` and
gensim/models/coherencemodel.py:            :class:`~gensim.models.wrappers.ldavowpalwabbit.LdaVowpalWabbit`.
gensim/models/keyedvectors.py::class:`~gensim.models.wrappers.varembed.VarEmbed` etc), they can be represented by a standalone structure,
Binary file gensim/test/test_data/fasttext_old matches
Binary file gensim/test/test_data/fasttext_old_sep matches
Binary file gensim/test/test_data/ft_model_2.3.0 matches
Binary file gensim/test/test_data/lee_fasttext matches
gensim/test/test_dtm.py:            model = gensim.models.wrappers.DtmModel(
gensim/test/test_dtm.py:            model = gensim.models.wrappers.DtmModel(
gensim/test/test_dtm.py:                gensim.models.wrappers.DtmModel(
gensim/utils.py:    Backported from Python 2.7 with a few minor modifications. Widely used for :mod:`gensim.models.wrappers`.
$ git grep -i DtmModel -- :^CHANGELOG.md
docs/notebooks/dtm_example.ipynb:    "from gensim.models.wrappers.dtmmodel import DtmModel\n",
docs/notebooks/dtm_example.ipynb:    "model = DtmModel(dtm_path, corpus, time_seq, num_topics=2,\n",
docs/notebooks/dtm_example.ipynb:    "To run it in this mode, we now call `DtmModel` again, but with the `model` parameter set as `fixed`. \n",
docs/notebooks/dtm_example.ipynb:    "model = DtmModel(dtm_path, corpus, time_seq, num_topics=2,\n",
docs/notebooks/ldaseqmodel.ipynb:      [match inside a stored traceback: dtm_model = DtmModel.load('dtm_news')]
docs/notebooks/ldaseqmodel.ipynb:    "from gensim.models.wrappers.dtmmodel import DtmModel\n",
docs/notebooks/ldaseqmodel.ipynb:    "# dtm_model = DtmModel(dtm_path, corpus, time_slice, num_topics=5, id2word=dictionary, initialize_lda=True)\n",
docs/notebooks/ldaseqmodel.ipynb:    "dtm_model = DtmModel.load('dtm_news')"
gensim/test/test_dtm.py:class TestDtmModel(unittest.TestCase):
gensim/test/test_dtm.py:            model = gensim.models.wrappers.DtmModel(
gensim/test/test_dtm.py:            model = gensim.models.wrappers.DtmModel(
gensim/test/test_dtm.py:                gensim.models.wrappers.DtmModel(
$ git --no-pager grep -i wrapper -- :^CHANGELOG.md
<lots>
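For reference, the individual greps above can be bundled into a single sweep. The sketch below is not part of the original report: the throwaway repository it builds merely stands in for the gensim checkout, and the file names and contents are illustrative only. It uses the same `git grep -i ... -- :^CHANGELOG.md` pattern (case-insensitive, CHANGELOG.md excluded) as the commands above.

```python
import pathlib
import subprocess
import tempfile

# Symbols whose implementations were removed from gensim but may still be
# referenced somewhere in the tree.
SYMBOLS = ["morfessor", "varembed", "DtmModel", r"gensim\.models\.wrappers"]

def stale_references(repo, symbol):
    """Return tracked files matching `symbol`, excluding CHANGELOG.md."""
    result = subprocess.run(
        ["git", "grep", "-il", symbol, "--", ":^CHANGELOG.md"],
        cwd=repo, capture_output=True, text=True,
    )
    # git grep exits 1 on "no matches", so don't check the return code.
    return result.stdout.split()

# A throwaway repository standing in for the gensim checkout.
repo = tempfile.mkdtemp()
subprocess.run(["git", "init", "-q", repo], check=True)
pathlib.Path(repo, "docs").mkdir()
pathlib.Path(repo, "docs", "Varembed.ipynb").write_text(
    "from gensim.models.wrappers import varembed\n")
pathlib.Path(repo, "test_dtm.py").write_text(
    "model = gensim.models.wrappers.DtmModel(...)\n")
pathlib.Path(repo, "CHANGELOG.md").write_text("* Removed Morfessor wrapper\n")
subprocess.run(["git", "-C", repo, "add", "-A"], check=True)

for symbol in SYMBOLS:
    print(symbol, "->", stale_references(repo, symbol) or "(no matches)")
```

In the real repo you would run `stale_references` from the checkout root; any non-empty result is a file still mentioning a purged module.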

Versions

This is from the gensim develop branch.

pabs3 commented 2 years ago

I also noted that tox.ini got deleted, but some things still reference tox.

piskvorky commented 2 years ago

That's right – tox is gone. We – and by we I mean really @mpenkov mostly – have been simplifying the CI & testing pipeline, as well as pruning dependencies.

So:

Either way, a further cleanup and de-referencing the purged modules and dependencies will be welcome!

pabs3 commented 2 years ago

Looks like the invalid tests in test_dtm.py are skipped because DTM_PATH is not set.
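For context, an environment-variable gate of that kind works roughly like the sketch below. This is not gensim's actual test code; the class and variable names just mirror the ones mentioned in this issue (`TestDtmModel`, `DTM_PATH`).

```python
import os
import unittest

# The DTM tests shell out to an external binary located via the DTM_PATH
# environment variable; when it is unset, the tests are skipped entirely,
# which is how stale gensim.models.wrappers.DtmModel calls can go unnoticed.
DTM_PATH = os.environ.get("DTM_PATH")

class TestDtmModel(unittest.TestCase):
    @unittest.skipIf(DTM_PATH is None, "DTM_PATH is not set, skipping DTM tests")
    def test_dtm(self):
        # Would call the removed gensim.models.wrappers.DtmModel here,
        # so this body could never run successfully on current gensim.
        raise NotImplementedError("DtmModel was removed from gensim")

if __name__ == "__main__":
    unittest.main()
```

Because the skip condition is evaluated before the test body, the reference to the removed class is never exercised on machines without `DTM_PATH` set.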

pabs3 commented 2 years ago

I take it you dropped Travis CI too?

pabs3 commented 2 years ago

Filed a new pull request to clean up the tox/Morfessor/wrapper references. I looked more closely and it seems that Travis CI is still used, so I have left it in place. The notebooks are probably best left to others to clean up slowly over time, so I didn't touch those.

https://github.com/RaRe-Technologies/gensim/pull/3345

-- bye, pabs

https://bonedaddy.net/pabs3/

piskvorky commented 2 years ago

Yes, we migrated away from Travis, although IIRC @mpenkov re-introduced it recently (and temporarily), on account of GitHub Actions not yet properly supporting the Apple ARM architecture. @mpenkov is that right?

mpenkov commented 2 years ago

Yes, that is mostly correct. That Travis build was a contribution from another author.