Open stuartlynn opened 8 years ago
I love this idea!
We're doing a lot of ETL work right now in https://github.com/cartodb/bigmetadata, and one of the outputs we support is tables on Carto.
I'd love to do a one-off PoC of hosting on Carto, mainly thinking about the downstream side (getting the data from Carto).
If that looks good, we could think about handling the upstream side as part of our open source ETL, which would mean easier reproducibility and execution.
This would be really cool, and it would definitely make it easier to show functionality. I know I reach for south.dbf out of familiarity... and I don't even know what many of the columns mean, because there's no metadata on it.
Most of the time I spent writing cenpy went into figuring out how to automatically scrape and format metadata about census products. Once that's written, it should just be a matter of writing a simple query builder for the public SQL API, right? I had designs on wrapping up common queries with cenpy and adding them to pysal, but just haven't had the time. If there's interest in building an examples.fetch function around Carto, I'm game to contribute/review.
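For concreteness, a query builder could start as small as this sketch (the table and column names are just placeholders, and a real builder would also need identifier quoting and WHERE support):

```python
def build_query(table, columns=None, limit=None):
    """Assemble a SELECT statement suitable for the Carto SQL API.

    Only shows the shape of the idea: pick columns, name a table,
    optionally cap the row count.
    """
    cols = ", ".join(columns) if columns else "*"
    query = "SELECT {} FROM {}".format(cols, table)
    if limit is not None:
        query += " LIMIT {:d}".format(limit)
    return query

# Hypothetical table/columns:
build_query("south", columns=["fips", "hr90"], limit=10)
# "SELECT fips, hr90 FROM south LIMIT 10"
```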
Should the internals be maintained by pysal, though? Maybe a separate, conditional dependency makes more sense, like cartodb-python? I may, in drier times, be wary of the long-run maintenance costs, given how dynamic y'all's systems are :grinning:
I mean, is it as simple as pointing pd.read_table to fixed targets?
Would be another nice way to collaborate between the two projects.
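Roughly, yes, if I understand the SQL API right: it can return CSV, so a URL like the one built below could be handed straight to pandas. The account and table names here are hypothetical, and the endpoint shape is my reading of the public SQL API:

```python
from urllib.parse import urlencode

def carto_csv_url(account, query):
    """Build a Carto SQL API URL that returns the result set as CSV.

    With format=csv the response is plain CSV, so the URL can go
    straight into pandas, e.g. pd.read_csv(carto_csv_url(...)).
    """
    return "https://{}.carto.com/api/v2/sql?{}".format(
        account, urlencode({"q": query, "format": "csv"})
    )

# Hypothetical account and table:
url = carto_csv_url("example-account", "SELECT * FROM south LIMIT 100")
```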
I agree with @ljwolf that a conditional dependency approach makes the most sense as an initial exploration. We could think about a couple of options:

1. Use the implementation to have pysal extend its example datasets to include a Carto dataset, if the conditional import is there.
2. Collaborate on building a new dataset that lives at Carto, and use that to test out an implementation.
There are probably other routes to explore as well, but these are some initial thoughts.
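For the conditional-import route, something along these lines would keep the dependency optional (the `cartodb` module name is an assumption; cartodb-python's actual import path may differ):

```python
try:
    import cartodb  # optional dependency, only needed for remote examples
    HAS_CARTO = True
except ImportError:
    HAS_CARTO = False

def fetch_from_carto(table):
    """Hypothetical helper: fail with a clear message when the optional
    dependency is missing, instead of failing at pysal import time."""
    if not HAS_CARTO:
        raise ImportError(
            "fetching remote example datasets requires cartodb-python"
        )
    ...  # actual fetch logic would go here
```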
I like the idea of a conditional dependency -- I think this code would be better maintained outside of mainline pysal.
How's this for a roadmap:
A PR to interop with Carto Observatory would be fantastic.
Opening this ticket to explore an idea that @ljwolf and I had chatted briefly about. For the example datasets that are used in pysal, could these be maintained externally and just pulled by the library when required and cached locally? It's really easy to pull a Carto table directly into a pandas dataframe using our SQL API, so it might be a natural fit to store some of those data sources in Carto.
This would be similar to the approach scikit-learn takes with grabbing example datasets.
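Sketched with hypothetical names (`fetch_example` and the cache location are made up for illustration), the fetch-and-cache pattern is small:

```python
import os
import urllib.request

# Hypothetical cache location:
DEFAULT_CACHE = os.path.join(os.path.expanduser("~"), ".pysal_examples")

def fetch_example(name, url, cache_dir=DEFAULT_CACHE):
    """Download an example dataset on first use and cache it locally.

    Subsequent calls return the cached copy without touching the network.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, name)
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path
```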