Closed wdoerner closed 10 years ago
Great list, Will! In order:
.txt
(not sure about .xls
—CSV saved with that extension is really just a bug).http://www.agencyname.gov/csv-to-api/
works (that is, it's at a valid URL, you don't get a 404), then you should be able to pass a URL like http://www.agencyname.gov/csv-to-api/?source=http://www.agencyname.gov/path/to/file.csv
. Note that source
does need to be a complete URL, so that the program will know where to find it. I'll figure out how to make the documentation more clear on this point.I'll expand this into a series of issues now, to handle them each individually.
The export filename doesn't have an extension.
Wait. I just thought this through. And now I'm confused. An API doesn't provide filenames, it just returns JSON. There's no file being transmitted, only data as a part of an HTTP response. I think everything is working properly here, you're just doing something with it that isn't normally done with APIs. :)
Thanks for the quick replies.
For a 6th request, I know it's bad practice with the csv-to-api format to not have the first row be variable names but, well, it happens. The problem is that the first row is automatically assigned to be the variable name keys. Could an argument be added like "novarnames=true" (default being false so it is optional)? When it's true, a quick foreach loop could assign names like var1-varx or col1-colx if the array is of size x. That would still allow to sort and filter over column names for files that lack variable names (using var1, var2, etc.). The obvious fix is to have standardized files but files are sometimes posted without variable names and this would give the end user more control.
Thanks again for the help!
I have a lucky number 7th request and then I'll let things rest as I try out more stuff. I don't want to abuse the help and time of others on here but think this would be a useful addition when a CSV has many (unwanted) fields.
Can a "keep" or "drop" function be added? I think the PHP term is "array_splice". It would need to support multiple variables like "keep=[var1,var2,var3]" or "drop=[var4,var5]". I write the code jointly because if it were "keep=var1&keep=var2" I am afraid that the first command might be executed and leave us without var2 (or vice versa).
http://stackoverflow.com/questions/16564650/best-way-to-delete-column-from-multidimensional-array http://stackoverflow.com/questions/3702573/remove-columns-from-the-subarrays-of-a-two-dimensional-array
Every time I make that GET request with http://wwww.agencyname.gov/csv-to-api/..., it doesn't recognize the API and just downloads the source file.
Interesting. I suspect that this is two problems in play, but I don't know for sure. I speculate that your web server isn't figured to recognize JSON as a file type, and is just treating it as generic binary data, delivering it to the browser to figure out. And then your browser, I further speculate, has no capacity to figure out that it's JSON (a reasonable guess for IE 8), and it also just treats it as generic binary data. Do you happen to know what web server and version of that web server that your agency's website is running? That would help to get to the bottom of this. We might be able to force the use of some more descriptive headers or something, to work around this problem.
I know it's bad practice with the csv-to-api format to not have the first row be variable names but, well, it happens. The problem is that the first row is automatically assigned to be the variable name keys.
Hmm. Interesting problem. Your solution seems wholly reasonable. I've created #20 to address this.
I don't want to abuse the help and time of others on here
The great thing about the process here on GitHub is that there's no such thing as that. :) That is, any ideas for modifications that you or anybody else have can be posted here, and when somebody gets to them, great! It's my experience that it's best to make such suggestions for features or fixing bugs as soon as one encounters them—otherwise, we have a way of forgetting the change that we'd envisioned, or becoming accustomed to the bugs. So keep it up! :)
Can a "keep" or "drop" function be added?
Could you describe the functionality to me? Is the idea that keep=name
would drop every column except name
? And that drop=address
would drop the address
column, but keep all of the others? Do I have that right?
Thanks for replying to the other comments and creating the other issues to address them.
Do you happen to know what web server and version of that web server...?
Unfortunately, no. The GET works fine in Firefox and Chrome when I use the labs.data.gov in the URL. It appears that IE 8.0 is the problem here because it wants to download an XML file or displays the JSON in a single line. I think this error is with either the earlier version of the browser or the default behavior for a file type. Either way, I cannot fix that problem because of IT restrictions.
The other issue that I think I can address is when I replace the labs.data.gov part with www.agencyname.gov and it downloads the source
file instead of executing the format change, filter, or sort. I believe I might be able to create a folder on the server (again, don't have permissions so I have to wait on it) because an alias did not work. The last option is a redirect that I will try out. If none of those work, then I will add another comment in here. I feel like it is at least useful to write down these steps in case others see this post later and to organize myself on the approach.
Could you describe the functionality to me? Is the idea that keep=name would drop every column except name? And that drop=address would drop the address column, but keep all of the others? Do I have that right?
Correct. The idea is to reduce the array size or only show certain columns/fields. In statistical software I am used to, the keep
and drop
commands are the common syntax that have the same goal but approach the task differently. A series of drop
commands would do the same thing as a single keep
.
There are times when you might have many columns and keep=name
would avoid displaying all of them. Likewise, drop=address
could tighten up the display of a dataset where address was not desired (if there are many fields, it's just quicker to do a drop
than to type in all the fields for a keep
). A flexible function would be one that reads in multiple elements (maybe enclosing with square brackets or quotes) if a user wants to keep
or drop
more than a single field, like keep=[address,city,state]
. Another approach would be a joint command like keep=address&keep=city
but that performs them both at the same time instead of executing either keep=address
or keep=city
(I've found that two sort
commands would execute only one or the other but not both so I make the distinction here for clarity).
I think this error is with either the earlier version of the browser or the default behavior for a file type.
Interesting! Well, the good news is that it probably doesn't matter. By which I mean that the purpose of CSV to API is for software to be able to access data via APIs, rather than web browsers. :) So while this is surely frustrating when trying to debug the data output in IE 8, it looks like everything's working OK (except IE 8, of course!). Still, I'm going to check on the output, to see if we can force a more specific header that might allow this to work better.
Re: keep
and drop
, that makes sense to me. I've just opened #21 to that end. Thanks for the explanation!
Great, simple API! I've been trying it out and noticed a few things that could be improved (I'm not a programmer so I can only contribute on the user end). My top 5 things that I hope somebody can tweak or address:
1) There is a file size limit. I know it won't work with a source file with over 90,000 rows. Not sure about less. 2) The sort is limited to a single variable. Can we add two or more? A good example of use is sorting over year and month, which isn't possible and can't be corrected by an end user. 3) The export filename doesn't have an extension. One of the computers I use is set to download certain files automatically and it downloads csv-to-api without a default .json or anything else (if the format variable is different). 4) The source file is restricted to a CSV. Can we do a TXT also that is in CSV format? I have some files that are saved identical to a CSV but have a TXT or XLS extension and cannot be read in. 5) Is there a way to query as http://www.agencyname.gov/csv-to-api?source=file.csv? Right now, the first part starts with http://labs.data.gov. This might be my lack of understanding but I copied the csv-to-api.php file over to our server and it doesn't work with agencyname.gov. There isn't any mention of this on the "Source Code and Documentation" link and it would be helpful to understand for folks who are starting out.
Thanks! ~ Will