datatalking opened 3 years ago
Good questions. :smile:
For points 1 and 3:
We haven't imposed any limits on # of calls each day, and there isn't any cost (from us) for transfer. From our internal stats, we're not really getting anywhere near usage that would mean we have to introduce limits nor costs. If/when that starts happening, then we'll look into the best way of handling it.
In the meantime, we're upgrading the backend servers anyway to be much more powerful, so that'll push out any potential need for call limits further.
With point 2, it's currently hard coded to 512MB. Uploading a 512MB database can take an awful long time, but the option is there if people want to for some reason. On that note, if someone wants a larger database uploaded, the new servers will be able to handle it (prob up to about 10GB). The current backend servers though would struggle, so we'll leave it as 512MB for now. :wink:
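A hard-coded upload cap like the 512MB one described above usually boils down to a simple byte comparison, possibly with a per-user override. A minimal sketch (the constant and function names here are my own illustration, not DBHub.io's actual code):

```python
# Hypothetical sketch of a hard-coded upload size cap, similar in spirit
# to the 512MB limit described above. Names are illustrative only.
MAX_UPLOAD_BYTES = 512 * 1024 * 1024  # 512MB

def upload_allowed(size_bytes: int, whitelisted: bool = False) -> bool:
    """Reject uploads over the cap, unless the user has been whitelisted."""
    if whitelisted:
        return True
    return size_bytes <= MAX_UPLOAD_BYTES
```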
For point 4: to become a contributor, you're probably best off starting to get the hang of documenting stuff using the wiki (it's public access, even for writing). We haven't really put enough time and effort into documenting stuff, so it's likely a case of making an initial start, learning from that, and iterating on it.
Does that make sense? :smile:
That aside, where does your general interest in this stuff lie, and what skills are you already strong on? Personally, I've found that documenting stuff works pretty well if it ties into an area of interest. Whereas trying to force myself to document stuff that's kind of boring... keeps on getting put off. :wink:
Most of the areas I know are DevOps or software, or the algorithmic/statistical side of a process.
I spent 12 years in mechanical engineering making working drawings of everything from concrete formwork for pouring foundations and bridges, to precision machining, to jigs for military aircraft parts.
After 9/11 I switched to finance, since I've been told I over-analyze everything, and I loved all of the data gathering. I spent 7+ years doing Monte Carlo multi-chain and hidden Markov analysis for risk tolerance and growth projections. I've been slowly building an algorithmic trading analytical tool and run a stealth startup.
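For readers unfamiliar with the technique mentioned above, a Monte Carlo growth projection can be sketched in a few lines: simulate many possible multi-year return paths and look at the distribution of outcomes. This is a toy version with made-up parameters, not the actual tooling described:

```python
import random
import statistics

def project_growth(start: float, years: int, mean_return: float,
                   volatility: float, runs: int, seed: int = 42) -> list[float]:
    """Toy Monte Carlo projection: simulate `runs` portfolio paths using
    normally distributed annual returns, returning the end values."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        value = start
        for _ in range(years):
            value *= 1 + rng.gauss(mean_return, volatility)
        results.append(value)
    return results

# Example: $10k over 10 years at 7% mean return, 15% volatility.
outcomes = project_growth(10_000, years=10, mean_return=0.07,
                          volatility=0.15, runs=1_000)
median_outcome = statistics.median(outcomes)
```

The median here lands near the deterministic 10,000 × 1.07¹⁰ figure, while the spread of `outcomes` shows the risk-tolerance side of the analysis.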
Somewhere in the documentation/automation arena. I'm currently wrapping up an undergrad in data analytics, and I use a ton of Python, pandas, and SQL for ingesting, cleaning, sorting, and organizing files.
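The ingest/clean/load workflow described above is often only a few lines of code. Here's a minimal stdlib-only sketch (using `sqlite3` directly instead of pandas so it's self-contained; the CSV feed and column names are invented for illustration):

```python
import csv
import io
import sqlite3

# Hypothetical CSV feed; column names are illustrative only.
raw = io.StringIO("fund,date,nav\nABC,2024-01-02,101.5\nABC,2024-01-03,\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (fund TEXT, date TEXT, nav REAL)")

rows = []
for rec in csv.DictReader(raw):
    if not rec["nav"]:  # cleaning step: drop rows with missing prices
        continue
    rows.append((rec["fund"], rec["date"], float(rec["nav"])))

conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", rows)
count = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
```

With pandas the same thing is roughly `read_csv` → `dropna` → `to_sql`, but the shape of the pipeline is identical.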
I'm good at zooming out to see the macro issues and then zooming in to follow each micro process from start to finish over and over.
Curious to know if the database limit is still 512MB or was it bumped to something higher?
@captn3m0 It's still 512MB by default, but it can be switched off for named users (eg admin staff, so far).
What did you have in mind?
Btw - check your email, if you haven't recently. Emailed you a few times last night about stuff (eg the SQLite zipfile module), but not sure if you're getting them. :smile:
> What did you have in mind?
I maintain a dataset of Indian Mutual Funds with historical pricing information that goes back to 2006. It's ~250MB compressed (zstd), but inflates to ~935MB, and grows to 2GB after index addition. It grows by roughly 50-100MB a year.
Would be nice to have it on DBHub, for tracking changes over time more easily.
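As an aside, the "grows to 2GB after index addition" effect is easy to demonstrate: SQLite stores index b-trees inside the same database file, so creating an index directly increases the file size. A small illustration with synthetic data (not the actual dataset):

```python
import os
import sqlite3
import tempfile

# Throwaway on-disk database with some synthetic pricing rows.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE nav (fund_id INTEGER, date TEXT, price REAL)")
conn.executemany("INSERT INTO nav VALUES (?, ?, ?)",
                 [(i % 100, f"2024-01-{i % 28 + 1:02d}", i * 0.5)
                  for i in range(50_000)])
conn.commit()
size_before = os.path.getsize(path)

# Adding an index stores extra b-tree pages in the same file.
conn.execute("CREATE INDEX idx_nav_fund_date ON nav (fund_id, date)")
conn.commit()
conn.close()
size_after = os.path.getsize(path)
```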
Thanks for your mails, replied there already!
No worries, that sounds workable. How often does it get updated?
Once a day.
Ahhh. At the moment our backend still stores every snapshot as a complete, independent SQLite database file. So that's really more like 2GB * 365 days (per year), until we get around to changing the backend storage to only do differences in some way.
If you do it as a "Live" database however (no historical snapshots though), then the on-disk size would only be that latest size. That'd be way more workable for us in the short/medium term.
Thoughts?
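The storage trade-off above is straightforward arithmetic, but worth spelling out. Using the ~2GB figure mentioned:

```python
# Back-of-envelope storage comparison (figures in GB, from the thread).
db_size = 2               # one full snapshot, with indexes (~2GB)
snapshots_per_year = 365  # one upload per day

# Current backend: every daily snapshot stored as a complete SQLite file.
snapshot_storage = db_size * snapshots_per_year  # GB per year

# "Live" database: only the latest copy kept on disk.
live_storage = db_size  # GB, regardless of update frequency
```

That's 730GB/year versus a flat 2GB, which is why the live-database route is the workable one until the backend stores diffs.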
I guess on the plus side of DBHub.io vs the flatgithub.com approach is performance. We're using decently specced servers and reasonable (heh) Go code, so working with the data via the web interface is fairly workable.
On the negative side though, we haven't (yet) hooked up the column filtering part of our data grid layout, so it's not yet possible to type in a search term to filter stuff.
That's likely not a big task in itself, and shouldn't be too far off. Still probably a few weeks away, unless @MKleusberg wants to prioritize it sooner (?). :wink:
Live database would work. The boring changes (pricing data) are versioned inside the database, so they can be tracked with queries. The other changes I want to track (Metadata changes, such as names, or IDs) - I can track elsewhere for now.
All good. I'll whitelist you on DBHub.io now.
Did you want the `captn3m0` username, or the `nemo` one (if that gets re-created properly), or both, etc? :smile:
Both would be nice.
No worries, will do. :smile:
k, I've added your `captn3m0` username to the whitelist. The `nemo` user can be done once the user is in our system. eg try "signing up" with `nemo` again, and see if Auth0 likes it this time.
I've just learned about your project, and perhaps this is discussed elsewhere, but after reviewing the docs I didn't see where the limits of the API are documented.
Questions: