margaret / python-datamuse

(Deprecated - please use https://github.com/gmarmstrong/python-datamuse) Python wrapper for the Datamuse API
MIT License

limited number of words #1

Closed pinakinathc closed 7 years ago

pinakinathc commented 7 years ago

On the main webpage we can get many words, i.e. more than 100 words related to a particular word, but using this API we get far fewer words.

margaret commented 7 years ago

Which one of the query examples from docs is giving you that many results? I tried all the links under "What is it good for?" on the site docs, but none of them return more than 100 results for me. If you can provide an example query I will look into it.

pinakinathc commented 7 years ago

curl "https://api.datamuse.com/words?ml=ringing+in+the+ears&max=4" | python -mjson.tool

Using this command you can fetch a larger number of records. I wrote a Python script using

import os
os.system('curl "https://api.datamuse.com/words?ml=text+to+be+fetched&max=1000" | python -mjson.tool > output.json')

This fetches all the data and stores it in a file in JSON format, which can later be used by any code as:

import json
# Load the results that the curl command saved to output.json
with open('output.json') as f:
    data = json.load(f)

The most interesting thing about it is this: if you use '1000' for max in the os.system() call above and the website has, say, 777 words, it will return only 777 words. This means you can choose the maximum number of words you want and fetch the data.
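The same request can probably be made without shelling out to curl. The following is only a minimal sketch using the Python standard library; the names build_query_url and fetch_words are made up for illustration and are not part of python-datamuse, and the cap of 1000 is taken from the max parameter described in the Datamuse docs.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "https://api.datamuse.com/words"
MAX_LIMIT = 1000  # upper bound the Datamuse docs give for the max parameter


def build_query_url(phrase, max_results=100):
    """Build a "means like" (ml) query URL (hypothetical helper)."""
    # Clamp the requested count to the documented limit.
    capped = min(max_results, MAX_LIMIT)
    # urlencode turns spaces into '+', matching the curl example.
    return API_URL + "?" + urlencode({"ml": phrase, "max": capped})


def fetch_words(phrase, max_results=100):
    """Fetch and decode the JSON list of results (requires network access)."""
    with urlopen(build_query_url(phrase, max_results)) as response:
        return json.load(response)
```

Calling fetch_words("ringing in the ears", max_results=1000) should return a list with at most 1000 entries, and fewer if the API has fewer matches.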

Thanks :)

margaret commented 7 years ago

I had another look at their docs and found that I missed this bit about the max parameter:

max | Maximum number of results to return, not to exceed 1000. (default: 100)

That's why it was truncating it. I'm not sure why it doesn't truncate the response to curl queries, though 🤔.

I've added a method set_max_default that will allow you to set that to a higher number. If you always want to get max=1000 you can either call api.set_max_default(1000) before working, or instantiate the api like Datamuse(max_results=1000).
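For readers curious how a default like this can work, here is a rough sketch of the pattern. It is a stand-in written for illustration, not the actual python-datamuse code: the class WordQueryBuilder and its build_url method are hypothetical, and only the set_max_default name and the max_results argument mirror what is described above.

```python
from urllib.parse import urlencode


class WordQueryBuilder:
    """Hypothetical stand-in showing a configurable default for max."""

    API_URL = "https://api.datamuse.com/words"
    MAX_LIMIT = 1000  # documented upper bound for the max parameter

    def __init__(self, max_results=100):
        self.set_max_default(max_results)

    def set_max_default(self, max_results):
        """Store the max value used when a query does not specify one."""
        if not 0 < max_results <= self.MAX_LIMIT:
            raise ValueError("max_results must be between 1 and 1000")
        self.max_results = max_results

    def build_url(self, **params):
        """Build a query URL; a per-call max overrides the stored default."""
        params.setdefault("max", self.max_results)
        return self.API_URL + "?" + urlencode(params)
```

With this shape, WordQueryBuilder(max_results=1000) and a later set_max_default(1000) call have the same effect on subsequent queries, while an explicit max in a single call still wins.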

Thanks for bringing the issue to my attention.