steadyfish / ogdindiar

R package to access data from Open Government Data Platform - India

API or package problem? #15

Open maelle opened 8 years ago

maelle commented 8 years ago

Hi,

I tried to download these data from the website https://data.gov.in/catalog/indian-railways-train-time-table-0

and got this error

fetch_data("b46200c1-ca9a-4bbe-92f8-b5039cc25a12")
Error in function (type, msg, asError = TRUE) : 
  Unknown SSL protocol error in connection to data.gov.in:443

Do you get the same error for this dataset?

steadyfish commented 8 years ago

Hi @masalmon,

Did you get this error while trying to download any other data?

This Indian Railways dataset seems quite big, and the way the data.gov.in API is set up, it returns only 100 records per API call. The fetch_data() function was written to make as many API calls as needed to download the entire dataset. To keep that in check, I have added a max_obs parameter (default 500) to fetch_data(). It limits the number of API calls being made (500 / 100 = 5 calls, in this case). This could perhaps resolve the error you are getting.
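Roughly, the capping works like the sketch below. This is only an illustration of the idea, not the package's actual code: fake_page() is a made-up stand-in for one API call against a pretend 1234-row dataset, and fetch_capped() is a hypothetical name.

```r
# Hypothetical stand-in for a single API call: returns rows
# offset+1 .. offset+limit of a pretend 1234-row dataset.
# Not part of ogdindiar or of the data.gov.in API.
fake_page <- function(offset, limit, n_total = 1234) {
  if (offset >= n_total) return(data.frame(id = integer(0)))
  data.frame(id = seq(offset + 1, min(offset + limit, n_total)))
}

# Request pages of `page_size` rows until `max_obs` rows are collected
# or the source runs out of rows (signalled by an empty or short page).
fetch_capped <- function(get_page, max_obs = 500, page_size = 100) {
  pages <- list()
  n_rows <- 0
  n_calls <- 0
  while (n_rows < max_obs) {
    limit <- min(page_size, max_obs - n_rows)
    page <- get_page(offset = n_rows, limit = limit)
    n_calls <- n_calls + 1
    if (nrow(page) == 0) break
    pages[[length(pages) + 1]] <- page
    n_rows <- n_rows + nrow(page)
    if (nrow(page) < limit) break  # short page: nothing left to fetch
  }
  list(data = do.call(rbind, pages), calls = n_calls)
}

res <- fetch_capped(fake_page)   # defaults: max_obs = 500, page_size = 100
nrow(res$data)                   # 500
res$calls                        # 5
```

With a max_obs larger than the dataset, the loop stops at the first short page instead of running to the cap.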

Could you try again after re-installing this package?

maelle commented 8 years ago

Thanks for being so reactive! :thumbsup:

100 is a very small limit. :disappointed: With the API of the OpenAQ platform, for which I've written an R package, I've been luckier than you: the limit is 1000, they do paging, and you can get the total number of measurements, so you know how many calls you need to make. In your case, you have to do it "blindly" because the API doesn't return the number of lines in the original file, what a pity!
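To illustrate the difference: when the total is known, all page offsets can be planned up front. A minimal sketch, assuming plain offset-based paging; page_offsets() is a hypothetical helper, not a function from any package.

```r
# Hypothetical helper: starting offsets of every page when the total
# number of rows is known in advance (offset-based paging assumed).
page_offsets <- function(n_total, page_size) {
  if (n_total <= 0) return(numeric(0))
  seq(0, n_total - 1, by = page_size)
}

offsets <- page_offsets(2500, 1000)
offsets           # 0 1000 2000
length(offsets)   # 3 calls, known before the first request is sent
```

Without a total, the only option is to keep requesting pages until one comes back empty or shorter than the limit.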

No, I didn't get the error with the other datasets I had tried. They were much smaller.

I have installed the new version and I got this error

lala <- fetch_data("b46200c1-ca9a-4bbe-92f8-b5039cc25a12", max_obs=70000)
Error in function (type, msg, asError = TRUE)  : 
  Unknown SSL protocol error in connection to data.gov.in:443

Then I did it a second time and got a new error

lala <- fetch_data("b46200c1-ca9a-4bbe-92f8-b5039cc25a12", max_obs=70000)
Error in function (type, msg, asError = TRUE)  : 
  SSL read: error:00000000:lib(0):func(0):reason(0), errno 10053

I tried with a limit closer to the number of lines in the timetable (69007)

lala <- fetch_data("b46200c1-ca9a-4bbe-92f8-b5039cc25a12", max_obs=69010)
Error in function (type, msg, asError = TRUE)  : 
  Failed to connect to data.gov.in port 443: Timed out
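For scale, a back-of-the-envelope count of the calls involved, assuming the limit of 100 records per call mentioned above and nothing about how fetch_data() pages internally. calls_needed() is a made-up helper, not part of the package.

```r
# Hypothetical helper: number of API calls needed to retrieve n_rows
# when each call returns at most page_size records.
calls_needed <- function(n_rows, page_size = 100) {
  ceiling(n_rows / page_size)
}

calls_needed(69007)                    # 691 calls for the full timetable
calls_needed(70000)                    # 700 calls at most for max_obs = 70000
calls_needed(69007, page_size = 1000)  # 70 calls if the limit were 1000
```

Several hundred consecutive HTTPS requests leave plenty of room for a dropped connection or a timeout along the way.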

Is the data too big? It's quite a limitation of the API, ah! But I guess I could still use it if I used the filter argument and queried only the things that interest me (like all trains from Hyderabad). In this case, I really wanted the whole thing.

I have two suggestions (because some tables will be bigger than 100 or 500 lines without being as big as the train timetable :smile: ):

Thanks again for your help and your package!

steadyfish commented 8 years ago

Sure @masalmon, I'll incorporate your suggestions. :)

maelle commented 8 years ago

Cool, thank you!

I was also thinking that your package needs use cases. The open data platform is a goldmine! Maybe in the next few weeks I'll do something with the trains (e.g. querying all trains from a city and making a map of them). I'm sure it would motivate people to use the package and the data. :smile: And then you could add cool pictures/GIFs made from the data to the README as a teaser. :laughing:

steadyfish commented 8 years ago

Sounds good, thanks! :)