solresol / wordplanes

Are many of our word vectors actually expressible as planes in 2D space?

Sweep: Create language_plane_finder.py #66

Closed solresol closed 8 months ago

solresol commented 9 months ago

Details

language_plane_finder.py takes six CLI arguments:

First it creates a wordnet_vocab.GloveLookup object using the first three CLI arguments.

Then it calls wordnet_vocab.yield_word_senses with the part of speech argument. This returns a generator; from it, the script builds a list of [subsample] elements, chosen randomly and seeded by [random seed].
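A minimal sketch of the seeded subsampling step, in case it helps. The function name `subsample_vocab` is hypothetical, and it assumes the generator yields sortable items such as strings; what wordnet_vocab.yield_word_senses actually yields isn't stated in this issue.

```python
import random


def subsample_vocab(word_iter, subsample, random_seed):
    """Draw a reproducible random subsample from an iterable of words.

    `word_iter` stands in for the generator returned by
    wordnet_vocab.yield_word_senses (assumption: items are sortable).
    """
    words = list(word_iter)           # a generator must be materialised first
    rng = random.Random(random_seed)  # private RNG, leaves global state alone
    k = min(subsample, len(words))    # avoid ValueError on a small vocabulary
    # Sorting makes the later word1 < word2 < word3 ordering easy to enforce.
    return sorted(rng.sample(words, k))
```

Using a private `random.Random` instance rather than `random.seed` keeps the result reproducible even if other code also draws random numbers.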

It then has three nested for loops iterating over that list.

i.e.

for word1 in vocab:
  for word2 in vocab:
    for word3 in vocab:

It makes sure that word1 < word2 < word3. It uses the GloveLookup object to get the numpy arrays (point in embedding space) for each of these words, and creates a plane.Plane object constructed from point1, point2 and point3.
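One possible way to get the word1 < word2 < word3 ordering without visiting every permutation is `itertools.combinations` over a sorted vocabulary. This is only a sketch: the function names are hypothetical, and it assumes the vocabulary items are comparable and unique.

```python
from itertools import combinations


def ordered_triples(vocab):
    """Yield each (word1, word2, word3) with word1 < word2 < word3 once."""
    # combinations() of a sorted sequence emits tuples in ascending order.
    return combinations(sorted(set(vocab)), 3)


def ordered_triples_nested(vocab):
    """The literal triple loop from the description, for comparison."""
    for word1 in vocab:
        for word2 in vocab:
            for word3 in vocab:
                if word1 < word2 < word3:
                    yield (word1, word2, word3)
```

Both produce the same set of triples for a vocabulary without duplicates; the `combinations` version just skips the rejected iterations.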

It iterates over all the other vocabulary (i.e. any word that's not word1, word2 or word3), turns each word into a point, and calls the plane.Plane object's distance_to_plane method on it. Store the results in a list or numpy array.
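For reference, here is one way the distance from a point to the plane through three points could be computed in embedding space. This is not the repository's plane.Plane code, which isn't shown in this issue; the names `distance_to_plane_sketch` and `distances_to_others` are hypothetical, and the sketch assumes the three points are not collinear.

```python
import numpy as np


def distance_to_plane_sketch(p1, p2, p3, q):
    """Distance from q to the 2-D affine plane through p1, p2 and p3.

    Assumption: p1, p2, p3 are not collinear, so the two edge vectors
    really do span a plane.
    """
    edges = np.stack([p2 - p1, p3 - p1], axis=1)  # shape (d, 2)
    basis, _ = np.linalg.qr(edges)                # orthonormal columns, (d, 2)
    offset = q - p1
    # Remove the in-plane component; what is left is perpendicular to it.
    residual = offset - basis @ (basis.T @ offset)
    return float(np.linalg.norm(residual))


def distances_to_others(points, triple):
    """Distances for every word in `points` that is not one of `triple`.

    `points` is a dict mapping word -> numpy vector (a stand-in for
    lookups through the GloveLookup object).
    """
    p1, p2, p3 = (points[w] for w in triple)
    return np.array([
        distance_to_plane_sketch(p1, p2, p3, vec)
        for word, vec in points.items()
        if word not in triple
    ])
```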

Calculate the summary statistics for that list (inside the triple for loop): minimum, 0.1th percentile, 1st percentile, 25th percentile, median, mean, 75th percentile, 99th percentile, 99.9th percentile, max, standard deviation. Print out word1, word2, word3 and the summary statistics.
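The statistics listed above could be gathered with numpy roughly as follows. The function name `summarise` and the dictionary keys are hypothetical, and the standard deviation here is numpy's default population form (`ddof=0`), since the issue doesn't say which one is wanted.

```python
import numpy as np

# Percentiles requested in the issue, in ascending order.
PERCENTILES = [0.1, 1, 25, 50, 75, 99, 99.9]


def summarise(distances):
    """Return the summary statistics in the order listed in the issue."""
    d = np.asarray(distances, dtype=float)
    p01, p1, p25, median, p75, p99, p999 = np.percentile(d, PERCENTILES)
    return {
        "min": float(d.min()),
        "p0.1": float(p01),
        "p1": float(p1),
        "p25": float(p25),
        "median": float(median),
        "mean": float(d.mean()),
        "p75": float(p75),
        "p99": float(p99),
        "p99.9": float(p999),
        "max": float(d.max()),
        "std": float(d.std()),  # assumption: population std (ddof=0)
    }
```

Inside the triple loop, a line such as `print(word1, word2, word3, *summarise(distances).values())` would then emit one row per plane.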

Checklist

- [X] Create `language_plane_finder.py` ✓ https://github.com/solresol/wordplanes/commit/c99fad6257a7c8a32c7b441d2fec44e8ff68f940

ifost-autodev[bot] commented 9 months ago

🚀 Here's the PR! #67


GitHub Actions failed

The sandbox appears to be unavailable or down.


Step 1: 🔎 Searching

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I think are relevant, in decreasing order of relevance. If some file is missing from here, you can mention the path in the ticket description.

- https://github.com/solresol/wordplanes/blob/55eda2249731fe8b1a4fce01e3d00a7a983453a3/wordnet_vocab.py#L3-L44
- https://github.com/solresol/wordplanes/blob/55eda2249731fe8b1a4fce01e3d00a7a983453a3/plane.py#L3-L14
- https://github.com/solresol/wordplanes/blob/55eda2249731fe8b1a4fce01e3d00a7a983453a3/embeddings2sqlite.py#L4-L33

Step 2: ⌨️ Coding


Step 3: 🔁 Code Review

I have finished reviewing the code for completeness. I did not find errors for sweep/create_language_plane_finderpy.

