oriolgalceran opened 2 weeks ago
Hi, that's just the way the thesaurus is, I guess. As far as I know there's no relational metadata such as antonyms or hypernyms, and there doesn't seem to be any real maintenance or ownership of the work these days. All I could find was a file containing adjacency lists in textual form (words.txt, ~24 MB), which is what the runtime files are built on.
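To work with an adjacency-list file like that directly, a minimal parsing sketch is below. It assumes each line has the layout of the original Moby `mthesaur.txt`, "headword,synonym1,synonym2,...", which words.txt may not follow exactly. `parse_thesaurus` is a hypothetical helper and the two sample lines are made up for illustration.

```python
def parse_thesaurus(lines):
    """Build {headword: [synonyms...]} from comma-separated adjacency lines.

    Assumption: each line looks like "headword,syn1,syn2,..." (the layout
    of the original Moby mthesaur.txt); words.txt may differ.
    """
    graph = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        head, *synonyms = [field.strip() for field in line.split(",")]
        # Case is preserved, so "Machine" and "machine" stay distinct keys.
        graph.setdefault(head, []).extend(s for s in synonyms if s)
    return graph


# Made-up sample lines, not real Moby entries.
sample = [
    "cold,chilly,frigid,warm",
    "computer,IBM machine,information machine,calculator",
]
thesaurus = parse_thesaurus(sample)
print(thesaurus["computer"])
```

Because the function accepts any iterable of strings, an open file handle (e.g. `with open("words.txt") as f: parse_thesaurus(f)`) should work the same way as the in-memory sample.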
You might want to check WordNet: it's part of the same public domain umbrella as Project Gutenberg (or vice versa) and contains the kind of metadata you're interested in. It doesn't have every term in Moby, but last I checked it had more than I expected, so it might help you here.
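Since Moby has no antonym metadata of its own, one way to separate antonyms is to flag any listed "synonym" that another source marks as an antonym of the headword. The sketch below assumes you already have such pairs; WordNet is one possible source (for example via NLTK's `Lemma.antonyms()`), but the `ANTONYMS` set here is hypothetical and `split_antonyms` is an illustrative helper, not part of any library.

```python
def split_antonyms(headword, synonyms, antonym_pairs):
    """Partition a Moby-style synonym list into (kept, flagged_antonyms).

    antonym_pairs: a set of frozensets {a, b}, assumed to come from an
    external source such as WordNet (Moby itself has no such relation).
    Comparison is case-insensitive.
    """
    kept, flagged = [], []
    for word in synonyms:
        if frozenset((headword.lower(), word.lower())) in antonym_pairs:
            flagged.append(word)
        else:
            kept.append(word)
    return kept, flagged


# Hypothetical antonym pairs standing in for data pulled from WordNet.
ANTONYMS = {frozenset(("cold", "warm")), frozenset(("good", "bad"))}
print(split_antonyms("cold", ["chilly", "frigid", "warm"], ANTONYMS))
```

Storing pairs as frozensets makes the lookup symmetric, so "cold/warm" and "warm/cold" match the same entry. Coverage is limited to whatever pairs the external source provides; unlisted antonyms pass through unflagged.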
The alternative would be something in the machine learning space, which would be a really heavy lift from the get-go unless you already have some familiarity with it. I think WordNet is your best bet. -Adriaan
On Sun, Nov 10, 2024 at 7:01 AM oriolgalceran @.***> wrote:
Why is the thesaurus full of antonyms (good-bad, cold-warm)? I understand this comes from the original dataset, but I'm wondering if there is any explanation (or any way to separate them!)
Thanks!
— Reply to this email directly, view it on GitHub https://github.com/words/moby/issues/21, or unsubscribe https://github.com/notifications/unsubscribe-auth/AVFKR6K7CZ7M7CURCTNOWPLZ75YMDAVCNFSM6AAAAABRQJL3UWVHI2DSMVQWIX3LMV43ASLTON2WKOZSGY2DOMZSGUYTSNA . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Thanks! Just so you see where my issue comes from: I built the game www.synonuity.com and I'm having real trouble finding a reliable (and open) synonym dataset in English. I looked at WordNet way back when I built it and decided against using it, though I can't really remember why. I'm going to have another look at it. Thanks for your reply!
I have a further question: in the entry for "computer" on https://moby-thesaurus.org/computer, "machine" is included as a synonym. However, when I look in words.txt it's not there. Why is that? I've seen this with other words too.
But "machine" is included in words.txt in the form of multi-word phrases such as "IBM machine" and "information machine", which are in the synset for "computer". Don't forget that case matters as well; that can trip you up sometimes.
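A quick way to check this for any word is to look for it both as a standalone entry and as a whole word inside multi-word phrases, ignoring case. This is a minimal sketch; `find_occurrences` is a hypothetical helper and the entry list is made up rather than copied from words.txt.

```python
import re


def find_occurrences(term, synonyms):
    """Report where `term` appears in one synonym list, ignoring case.

    Returns (exact, in_phrases): entries equal to `term`, and other entries
    that contain `term` as a whole word (so "machinery" does not count).
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    exact = [s for s in synonyms if s.lower() == term.lower()]
    in_phrases = [s for s in synonyms
                  if s.lower() != term.lower() and pattern.search(s)]
    return exact, in_phrases


# Made-up entry list for illustration only.
entry = ["IBM machine", "Information Machine", "machinery", "calculator"]
print(find_occurrences("machine", entry))
```

The `\b` word boundaries keep partial matches like "machinery" or "machines" out of the results, while `re.IGNORECASE` catches differently capitalized phrases that a plain case-sensitive search would miss.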
WordNet is definitely your best shot; sourcing words from Moby seems like the minimum-effort way to get you going. I'd be interested in your results.
On Sun, Nov 10, 2024 at 10:28 AM oriolgalceran @.***> wrote:
I have a further question: in the entry for "computer" on the site https://moby-thesaurus.org/computer "machine" is included as a synonym. However, when I go on words.txt it's not there... why is that? I've seen this on other words too