yulqen / bcompiler-engine

MOVED: A Python library to alleviate the pain of using Excel spreadsheets to collect data from your stakeholders.
https://git.sr.ht/~yulqen/bcompiler-engine
MIT License

Turn data keys into variables #19

Closed: banillie closed this issue 4 years ago

banillie commented 4 years ago

Is there a way for the dictionary keys created via project_data_from_master, or in the datamaps API, to be stored automatically as variables? That would save having to cut and paste exact string names to return values.

So, for example, rather than having to write master.data['project name'] you could write master.data[project_name] (where project_name is a variable containing the value 'project name'). Following on from this, instead of having to write master.data['project name']['key name'] you could write master.data[project_name][key_name] (where key_name is a variable containing 'key name').

The reason I'm asking is that it can be quite cumbersome to find, and make sure of, the exact project name and key name each time; having them stored as variables would be much quicker.

I imagine that they would need to be imported into the code via one or two statements.

Maybe this is something that could go into the datamaps api?

If you look at my analysis/data.py file (https://github.com/banillie/projectlibrary/blob/master/analysis/data.py), from around line 90 onwards I have hard-coded project names as variables. But this is clearly not an efficient way of working, as it is hard-coded and requires each variable to be imported individually.

Given that the keys under master.data['project name'] number around 1,500, I don't think hard-coding could be used to turn key names into variables.

I've had a look at how this might be done, but not sure I've found much of use.

https://stackoverflow.com/questions/5036700/how-can-you-dynamically-create-variables-via-a-while-loop

https://stackoverflow.com/questions/19122345/to-convert-string-to-variable-name

I'm not sure whether what I'm proposing here goes against how Python is structured, and so is a big no-no. If there is a solution, the string names would need some cleaning to turn them into variables, so that they were all lower case, had _ between words, and had symbols such as / removed, but that would be quite straightforward, I guess.
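The cleaning step described above can be sketched in a few lines. This is a minimal, hypothetical sketch: `to_identifier` and `key_namespace` are names made up here, not part of datamaps or bcompiler-engine. Rather than creating real global variables, it maps each cleaned name to the original key string on a `types.SimpleNamespace`, so a key can be written as an attribute.

```python
import keyword
import re
from types import SimpleNamespace


def to_identifier(name):
    """Clean a master key such as 'BICC approval point' into 'bicc_approval_point'."""
    cleaned = re.sub(r"[^0-9A-Za-z\s_]", "", name)  # drop symbols such as '/'
    cleaned = re.sub(r"\s+", "_", cleaned.strip()).lower()  # runs of whitespace -> '_'
    if cleaned[:1].isdigit():
        cleaned = "_" + cleaned  # identifiers cannot start with a digit
    if keyword.iskeyword(cleaned):
        cleaned += "_"  # e.g. 'class' -> 'class_'
    return cleaned


def key_namespace(keys):
    """Map cleaned names to the original key strings, refusing clashes."""
    mapping = {}
    for key in keys:
        ident = to_identifier(key)
        if not ident.isidentifier():
            raise ValueError(f"cannot turn {key!r} into a variable-style name")
        if ident in mapping and mapping[ident] != key:
            raise ValueError(f"{key!r} and {mapping[ident]!r} both clean to {ident!r}")
        mapping[ident] = key
    return SimpleNamespace(**mapping)
```

Assuming master.data is keyed as in the examples above, you could then write `master.data[project_name][keys.bicc_approval_point]`, where `keys = key_namespace(...)`. Two keys that clean to the same name are rejected rather than silently overwritten, which matters with around 1,500 keys.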

yulqen commented 4 years ago

The datamaps API actually has data.projects which lists all the project titles in a master - but it's not very well documented (under Filtering Project Data at https://bcompiler.readthedocs.io/en/latest/api.html#). Does that help?

For example, in https://github.com/banillie/projectlibrary/blob/master/analysis/data.py you could loop through all your data sets (q3_1920, q2_1920...) at https://github.com/banillie/projectlibrary/blob/82cd3e51a3df140daf102a2db3d29e3d545c7107/analysis/data.py#L20, calling q3_1920.projects for example, and add the names to a set() (so that you don't get duplicates). See below for an example.

That would give you a list of all the projects across those masters, which you could store in a single variable in https://github.com/banillie/projectlibrary/blob/master/analysis/data.py. Instead of importing individual variables, you would then import just that list of project names. For example, you import these objects in development/cost_v_schedule.py in analysis_engine, and then use them at https://github.com/banillie/analysis_engine/blob/065e89084c625c95ad63b5d255f396e8be77d229/development/cost_v_schedule.py#L254, where the list is passed to cost_v_schedule_chart.

How to create a set of project names from master data

This would go somewhere near the bottom of https://github.com/banillie/projectlibrary/blob/master/analysis/data.py.

project_names = set()

project_data_lst = [q3_1920, q2_1920]  # ... plus the rest of the master objects you have already created
for master in project_data_lst:
    for name in master.projects:  # each master's list of project titles
        project_names.add(name)  # the set silently drops duplicates

project_names = sorted(project_names)  # a sorted list you can then import wherever you need it in `analysis_engine`

I may be missing some other requirement you have for separate importable variables for all these strings, but this is a starter for 10. One big assumption here is that you are content that the project names in the masters are correct, but we could come up with some code elsewhere that deals with that.

banillie commented 4 years ago

Hi Matt,

Would that solution store the project names as variables themselves, i.e. a12 = 'A12'? TBH, not having the project names as variables is not that hard to live with, but having to type an exact string match for the keys under master.data[project_name] is more problematic.

If you look at this code from analysis_engine, you have to build a list of string names to pull out the necessary project values. This means going into the Excel workbook, finding the relevant key, and then copying and pasting the exact string name into the list. If there were some way for the key names to be stored automatically as their own variables, e.g. bicc_approval_point = 'BICC approval point', it would be much quicker to build the list.

https://github.com/banillie/analysis_engine/blob/master/data_mgmt/simple_data_query.py

What do you think? Cheers, Will.

yulqen commented 4 years ago

I presume you mean the list of keys at https://github.com/banillie/analysis_engine/blob/368f92469f51c95a048c09a2a4f91fefd974ba56/data_mgmt/simple_data_query.py#L53 ?

I do see that, depending on what data you want to extract at this point, this list could be very long. But this is far simpler and more understandable than having some code somewhere that tries to generate variable names automatically. It's possible to generate variable names in Python, but it's complicated; bear in mind that the whole point of a variable is for the programmer to label an object intentionally. It just isn't worth doing here and doesn't really make sense.

I'm not entirely convinced I get your pain point here, but until I sit down with you and we go through this together, I might be tempted to suggest that you're coming up against some fairly routine issues that are entirely understandable given how this project is evolving. You're now talking about a lot of master data piling up, stored in a very rudimentary format. This is why we have databases: for example, to take away some of the pain of referring to rows of data. In a proper relational database, we index a row on its unique id (usually an integer) rather than on a long string. It's just the upshot of using the "master" as the database, maybe.
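To make the indexing point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`project`, `datum`, `data_key`, `data_value`) and the nested-dict input shape `{quarter: {project_name: {key: value}}}` are assumptions for illustration, not how datamaps stores masters. Each long project title is stored once and referred to everywhere else by a short integer id.

```python
import sqlite3


def build_db(masters):
    """Load {quarter: {project_name: {key: value}}} (a hypothetical shape) into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE project (
            id   INTEGER PRIMARY KEY,   -- short integer id used for lookups
            name TEXT NOT NULL UNIQUE   -- the long title is stored only once
        );
        CREATE TABLE datum (
            project_id INTEGER NOT NULL REFERENCES project(id),
            quarter    TEXT NOT NULL,
            data_key   TEXT NOT NULL,
            data_value TEXT,
            PRIMARY KEY (project_id, quarter, data_key)
        );
        """
    )
    for quarter, projects in masters.items():
        for name, data in projects.items():
            # Insert the project once; later quarters reuse the same id.
            conn.execute("INSERT OR IGNORE INTO project (name) VALUES (?)", (name,))
            (project_id,) = conn.execute(
                "SELECT id FROM project WHERE name = ?", (name,)
            ).fetchone()
            conn.executemany(
                "INSERT INTO datum VALUES (?, ?, ?, ?)",
                [
                    (project_id, quarter, k, None if v is None else str(v))
                    for k, v in data.items()
                ],
            )
    conn.commit()
    return conn


def get_value(conn, project_name, quarter, key):
    """Look up one value by joining on the integer project id."""
    row = conn.execute(
        """SELECT d.data_value FROM datum d JOIN project p ON p.id = d.project_id
           WHERE p.name = ? AND d.quarter = ? AND d.data_key = ?""",
        (project_name, quarter, key),
    ).fetchone()
    return None if row is None else row[0]
```

Every value is stored as text here to keep the sketch short; a real schema would probably want typed columns for dates and costs.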

There are things you can do to make this easier. One option is to keep the keys you're interested in in a text file in your project folder, and then write a function that reads that file and populates a list automatically.

Have a read up on reading files:

with open("master_key_register.txt", "r") as f:  # the file is closed automatically
    keys = f.read().splitlines()  # a list of keys, without trailing newlines

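Wrapping that in a function, as suggested above, might look like the following minimal sketch. `read_key_register` and `select_keys` are hypothetical names, and the sketch assumes master.data[project_name] behaves like a dict of key to value, as in the examples earlier in this thread. Checking every key up front reports all unmatched names in one go, which helps with the exact-string-match problem.

```python
from pathlib import Path


def read_key_register(path):
    """Return one key per non-blank line of a text file."""
    text = Path(path).read_text(encoding="utf-8")
    # Assumption: master keys have no leading/trailing spaces, so stripping is safe.
    return [line.strip() for line in text.splitlines() if line.strip()]


def select_keys(project_data, keys):
    """Pull the listed keys out of one project's data, reporting every miss at once."""
    missing = [k for k in keys if k not in project_data]
    if missing:
        raise KeyError(f"keys not found in master: {missing}")
    return {k: project_data[k] for k in keys}
```

Usage might be `values = select_keys(master.data["A12"], read_key_register("master_key_register.txt"))`.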
banillie commented 4 years ago

Yes that all makes sense. I thought that this might be a bit of an unnecessary rabbit hole to go down. Will take a look at reading files. Cheers, Will.

yulqen commented 4 years ago

Happy to talk through any ideas you have re managing the key list. There is a lot you can do: save important keys in a file as above; do the same but with a CSV file; autogenerate the list of keys from a master (I could do this in the API, but it would be easy to do in your code) and store them in a file; store each quarter's data in a class object (Section 9.3 onwards at https://docs.python.org/3/tutorial/classes.html); store all master data in a SQLite database (https://sqlite.org/index.html). The list goes on. We should discuss over coffee next time!
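Two of those options, a class object per quarter and an autogenerated key list saved as a CSV file, can be combined in one small sketch using the standard `dataclasses` and `csv` modules. `QuarterData`, `all_keys`, `write_key_register` and `read_key_register_csv` are hypothetical names, not part of the datamaps API, and the `data` attribute assumes the same `{project_name: {key: value}}` shape as master.data.

```python
import csv
from dataclasses import dataclass, field


@dataclass
class QuarterData:
    """One quarter's master, e.g. label='q3_1920' (a hypothetical wrapper)."""

    label: str
    data: dict = field(default_factory=dict)  # {project_name: {key: value}}

    @property
    def projects(self):
        """Sorted project titles in this quarter."""
        return sorted(self.data)

    def all_keys(self):
        """Every key used by any project in this quarter, sorted."""
        keys = set()
        for project_data in self.data.values():
            keys.update(project_data)
        return sorted(keys)

    def write_key_register(self, path):
        """Autogenerate the key list and save it as a one-column CSV file."""
        with open(path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(["key"])  # header row
            writer.writerows([k] for k in self.all_keys())


def read_key_register_csv(path):
    """Read the key column back into a list."""
    with open(path, newline="", encoding="utf-8") as fh:
        return [row["key"] for row in csv.DictReader(fh)]
```

Using the `csv` module rather than plain lines means a key that contains a comma is quoted correctly on the way out and read back intact.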

banillie commented 4 years ago

Thanks Matt, will look into those links. Cheers.