ru-stat / data-team-ru-stat

Team to tackle dataflow in Russian economic statistics (macro, corporate, banking)

Comments are open to Intro - part 1. Case for machine-readable data in economic analysis. #5

Closed epogrebnyak closed 7 years ago

epogrebnyak commented 7 years ago

In economic analysis there is a range of computing tasks, from simple spreadsheet models (e.g. a discounted cash flow model to value a company) to econometric models (e.g. exchange rate forecasting), wholly or partially done in Excel.

Excel is deeply rooted in economic analysis, but this is changing. In Excel it is hard to make your graphs or models truly reproducible and transparent. In my view, economic analysis, say in a bank, will soon be 'robotised', like many other jobs. A new-generation analyst will command a collection of apps, not Excel spreadsheets.

For a non-Excel workflow you need the following (see also here):

Building a new workflow (important disclaimer: I have not achieved this yet) can go top-down (new tools, old data) or bottom-up (new data source first). Because of the quality of machine-readable data sources, I went the 'start from the bottom' way, which will be covered in the second part of the introduction.

Any questions here:

You may write in Russian; I will summarise later.

Rotzke commented 7 years ago

The projects I was involved in were much more sophisticated, with two main flows:

I was thinking about the project structure in terms of the README and the Cookiecutter layout; this task allocation model fits fine.

You didn't clearly say what the final desired form of the project is, @epogrebnyak, but I assume that a web application would be a great choice. We could make a backend part (raw data gathering and storage with our own scraping framework/array, plus MongoDB for keeping an archive, so that scientists like you could easily get access to it), plus a frontend on, let's say, Django, which @MrBorusLee is proficient with, and Redux NoSQL for quick access to current, up-to-date data. Also, a REST API would be great for the convenience of both the team and clients.

As a concrete foundation for this, I would propose Amazon AWS, as mentioned before: it has a free tier for a whole year, which lets us perform tests and adjustments before full product deployment.
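
To make the proposed flow a bit more concrete, here is a minimal sketch of the three pieces (gathering/parsing, archive, API) in plain Python. Everything in it is an assumption for illustration: the `Datapoint` schema, the `Archive` class (an in-memory stand-in for the MongoDB archive), `parse_raw` (a stand-in for the scraping step) and `api_get_series` (a stand-in for a REST endpoint) are hypothetical names, and the sample values are placeholders, not real statistics.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Datapoint:
    """One observation of a time series (hypothetical schema)."""
    name: str     # variable name
    date: str     # ISO date of the observation
    value: float


class Archive:
    """In-memory stand-in for the archive store (MongoDB in the proposal)."""

    def __init__(self):
        self._rows = []

    def save(self, datapoints):
        self._rows.extend(datapoints)

    def find(self, name):
        return [d for d in self._rows if d.name == name]


def parse_raw(raw_text):
    """Stand-in for scraping/parsing: turn 'name,date,value' lines into datapoints."""
    datapoints = []
    for line in raw_text.strip().splitlines():
        name, date, value = line.split(",")
        datapoints.append(Datapoint(name.strip(), date.strip(), float(value)))
    return datapoints


def api_get_series(archive, name):
    """Stand-in for a REST endpoint: return one series as a JSON string."""
    return json.dumps([asdict(d) for d in archive.find(name)])


# Placeholder input, for illustration only (not real data).
RAW = """
series_a, 2017-01-31, 1.0
series_a, 2017-02-28, 2.0
series_b, 2017-01-31, 3.0
"""

archive = Archive()
archive.save(parse_raw(RAW))
response = api_get_series(archive, "series_a")
print(response)
```

The point of the sketch is only the separation of concerns: parsing can change without touching storage, and clients see nothing but the JSON returned by the API layer.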

Also, it would be cool to put this issue on the agenda of the next online meeting, because I think it needs a live discussion.

Thanks for your attention!

epogrebnyak commented 7 years ago

I think the next post in the introduction should really be about an end-user specification and an architecture proposal for open economic datasets, before everyone gets really bored with all talk and no programming.

Just to give some insight, the idea is simple:

Should I do a requirements write-up next?

Rotzke commented 7 years ago

@epogrebnyak don't forget about your devil - first talk and then programming :) You are right, Eugene, I think you could put this structure into the next issue so people could propose their solutions for each section in the comments, and then use this as a starting agenda for the meeting. Right?

epogrebnyak commented 7 years ago

I think it is a good idea to schedule a live session to discuss requirements.

It involves two questions actually:

  1. end-user requirements (no programming terms, just what the end user, an analyst or admin, wants)
  2. architecture (solutions to the requirements, basically the building blocks)

(1) is often neglected, for better (the user is happy enough with whatever he gets) or for worse ("why exactly are we building this this way?"), so my role is to emphasise this first part, and @Rotzke can probably organise the discussion on the second part (design).

As for requirements, it would be great to analyse them against at least parts of this great checklist from Code Complete: http://www.matthewjmiller.net/files/cc2e_checklists.pdf, pages 5-6. Perhaps the design checklist is useful as well.

Other links on requirements, quite long reads, but still:

On top of that, I'd say prototyping is very important: let's build a small system first, learn from it and then add to it.

Rotzke commented 7 years ago

It is a great idea @epogrebnyak :) Also: