Closed solresol closed 8 months ago
The sandbox appears to be unavailable or down.
I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.
language_plane_finder.py
✓ https://github.com/solresol/wordplanes/commit/c99fad6257a7c8a32c7b441d2fec44e8ff68f940
Create language_plane_finder.py with contents:
• Start by importing the necessary modules and classes. This includes `argparse` for command-line argument parsing, `numpy` for numerical operations, `random` for subsampling, `sqlite3` for database operations, and the `GloveLookup` class from `wordnet_vocab.py` and the `Plane` class from `plane.py`.
• Implement command-line argument parsing using `argparse.ArgumentParser`. The script should take six arguments: `--sqlite-database`, `--table`, `--dimensionality`, `--subsample`, `--random-seed`, and `--part-of-speech`. The `--subsample` and `--part-of-speech` arguments should have default values of `None`.
• Create a `GloveLookup` object using the `--sqlite-database`, `--table`, and `--dimensionality` arguments.
• Call the `yield_word_senses` function from `wordnet_vocab.py` with the `--part-of-speech` argument and collect the word senses it yields into a list. If the `--subsample` argument is not `None`, randomly subsample the list to the specified size using the `random.sample` function, seeding the random number generator with the `--random-seed` argument first.
• Implement three nested `for` loops to iterate over the vocabulary list. For each combination of three words where word1 < word2 < word3, use the `GloveLookup` object to get the numpy arrays for each word and create a `Plane` object.
• For each other word in the vocabulary, use the `GloveLookup` object to get the numpy array for the word and call the `Plane` object's `distance_to_plane` method. Store the distances in a list or numpy array.
• Inside the triple `for` loop, calculate the summary statistics for the list of distances using `numpy` functions: minimum, 0.1th percentile, 1st percentile, 25th percentile, median, mean, 75th percentile, 99th percentile, 99.9th percentile, maximum, and standard deviation. Print out word1, word2, word3, and the summary statistics.
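The argument parsing, seeded subsampling, and ordered-triple steps of the plan above can be sketched roughly as follows. This is a minimal illustration, not the committed file: the argument types, which arguments are required, and the helper names `build_parser`, `subsample_vocabulary`, and `ordered_triples` are all assumptions, and word senses are treated as plain strings. It uses `itertools.combinations` on a sorted list in place of three hand-written nested loops, which yields the same word1 < word2 < word3 triples.

```python
import argparse
import itertools
import random


def build_parser():
    """Parser for the six arguments named in the plan (types are assumed)."""
    parser = argparse.ArgumentParser(
        description="Fit planes through word triples and report distances."
    )
    parser.add_argument("--sqlite-database", required=True)
    parser.add_argument("--table", required=True)
    parser.add_argument("--dimensionality", type=int, required=True)
    parser.add_argument("--subsample", type=int, default=None)
    parser.add_argument("--random-seed", type=int, default=None)
    parser.add_argument("--part-of-speech", default=None)
    return parser


def subsample_vocabulary(words, subsample=None, random_seed=None):
    """Deduplicate and sort the words, then optionally draw a seeded sample."""
    vocabulary = sorted(set(words))
    if subsample is not None and subsample < len(vocabulary):
        # A dedicated Random instance keeps the seed local to this call.
        rng = random.Random(random_seed)
        vocabulary = sorted(rng.sample(vocabulary, subsample))
    return vocabulary


def ordered_triples(vocabulary):
    """Yield every (word1, word2, word3) with word1 < word2 < word3."""
    return itertools.combinations(sorted(vocabulary), 3)
```

With a vocabulary of n words this produces n(n-1)(n-2)/6 triples, so the `--subsample` argument is what keeps the run time manageable.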
I have finished reviewing the code for completeness. I did not find errors for `sweep/create_language_plane_finderpy`.
Details
`language_plane_finder.py` takes six CLI arguments.

First it creates a `wordnet_vocab.GloveLookup` object using the first three CLI arguments.

Then it calls `wordnet_vocab.yield_word_senses` with the part-of-speech argument. This returns a generator; from it, the script creates a list of [subsample] elements (chosen randomly, seeded by [random seed]).

It then has three nested `for` loops iterating over that list, making sure that word1 < word2 < word3. It uses the `GloveLookup` object to get the numpy arrays (points in embedding space) for each of these words, and creates a `plane.Plane` object constructed from point1, point2 and point3.

It iterates over all the other vocabulary (i.e. any word that's not word1, word2 or word3), turns each word into a point and calls the `plane.Plane` object's `distance_to_plane` method. The distances are stored in a list or numpy array.

Inside the triple `for` loop, it calculates the summary statistics for those distances: minimum, 0.1th percentile, 1st percentile, 25th percentile, median, mean, 75th percentile, 99th percentile, 99.9th percentile, maximum, and standard deviation. It prints out word1, word2, word3 and the summary statistics.

Checklist
- [X] Create `language_plane_finder.py` ✓ https://github.com/solresol/wordplanes/commit/c99fad6257a7c8a32c7b441d2fec44e8ff68f940
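
For reference, the distance and summary-statistics steps described in this issue could look something like the sketch below. The repository's `plane.Plane` is not shown here, so `SketchPlane` is a hypothetical stand-in that only mirrors the constructor and `distance_to_plane` method named above; its internals (an orthonormal basis from a QR decomposition, then the norm of the residual) are an assumption, as is the `summarise` helper. Standard deviation uses NumPy's default population formula (`ddof=0`).

```python
import numpy as np


class SketchPlane:
    """Hypothetical stand-in for plane.Plane: a 2-D plane through three points."""

    def __init__(self, point1, point2, point3):
        self.origin = np.asarray(point1, dtype=float)
        # Columns are the two direction vectors spanning the plane.
        directions = np.stack(
            [
                np.asarray(point2, dtype=float) - self.origin,
                np.asarray(point3, dtype=float) - self.origin,
            ],
            axis=1,
        )
        # Reduced QR gives an orthonormal basis for that span.
        # Collinear points are a degenerate case this sketch does not handle.
        self.basis, _ = np.linalg.qr(directions)

    def distance_to_plane(self, point):
        """Euclidean distance from the point to its projection onto the plane."""
        offset = np.asarray(point, dtype=float) - self.origin
        in_plane = self.basis @ (self.basis.T @ offset)
        return float(np.linalg.norm(offset - in_plane))


def summarise(distances):
    """Summary statistics listed in the issue, keyed by name."""
    d = np.asarray(distances, dtype=float)
    p0_1, p1, p25, p75, p99, p99_9 = np.percentile(d, [0.1, 1, 25, 75, 99, 99.9])
    return {
        "min": float(d.min()),
        "p0.1": float(p0_1),
        "p1": float(p1),
        "p25": float(p25),
        "median": float(np.median(d)),
        "mean": float(d.mean()),
        "p75": float(p75),
        "p99": float(p99),
        "p99.9": float(p99_9),
        "max": float(d.max()),
        "std": float(d.std()),  # population standard deviation (ddof=0)
    }
```

Because the projection works in any number of dimensions, the same code applies to embedding vectors of whatever size `--dimensionality` specifies.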