lo-co / atm-py

Set of python packages for processing atmospheric data.
MIT License
0 stars 7 forks source link

clean up that mess #3

Closed hagne closed 8 years ago

hagne commented 8 years ago

Everyone! :-) give some suggestions about how to rearrange the repository!!!!

hagne commented 8 years ago

Lets try that mind map tool: https://www.mindmup.com/#m:a14a3d6860444d0133d0fa657bdfa6a28f

lo-co commented 8 years ago

Wow...this is pretty old now. Still want to try mindmap?

lo-co commented 8 years ago

Bump with new address... (sorry, don't really how to use this thing yet)

https://www.mindmup.com/#m:a1b616f4d096eb013322ac19bf3939d828

GavinMcMeeking commented 8 years ago

Are you guys trying to organize the project a bit better? Only thing have so far is maybe putting "aerosols", "mie" and "rayleigh" all in one directory "models" or "theory" or something like that to bundle theoretical/modeling stuff together?

lo-co commented 8 years ago

Did you see the mindmup?

Yes, we are trying to reorganize the project. Right now, we don't really have a philosophy regarding the organization. Hagen and I have mostly just discussed it. There is now a branch (NewStructure) of the master where I am going to start rearranging. The repository as it stands is not really safe to fork yet given the terrible condition it is in (too much flux). I think that when

  1. the structure is more concrete, and
  2. we have mechanisms in place for insuring that the code that is pulled into the master is in good shape

we will be in a good position. Here, we are just discussing the organization. Basically, we want this to be a package that could be registered with conda, pypi or other package distributors. So, to that end, the top level directory for distribution will be atmPy. And that is where this discussion is concerned - what should the structure of the files be in this root directory.

GavinMcMeeking commented 8 years ago

Gotcha, maybe best thing for me to meet with you guys for lunch as you suggested and get a general feel for the project?

lo-co commented 8 years ago

Sounds good...

That being said, I just want to move forward, so if I get time, I am going to move ahead on the branch NewStructure when I get time...

hagne commented 8 years ago

I like your basic structure.

I extended your mindmap a little bit : https://www.mindmup.com/#m:a1402601009bc20133d67429457eab367d

lo-co commented 8 years ago

Sorry, this is all over the place and I will likely contradict myself at some point below, but there is a lot to process here. I have a proposal though: if the routine(s) are related to processing data from a non-OTS set of software from an instrument, then THIS SHOULD NOT BE IN THE LIBRARY. In this, I would include non-OTS related instrument routines also such as the POPS. This would include a lot of Hagen stuff such as the Piccolo, Manta Payload, etc, etc. These routines are usually one-off routines and just create a lot of cruft for the user to sort through. On top of this, the routines often contain nothing useful except how to get the data into pandas or some other format. These routines should be the possession of the specific user or group who has need of these.

Instrument libraries that should be retained are those that

What say you?

btw - you can start to see how some of this is beginning to take shape in the NewStructure branch.

hagne commented 8 years ago

What means OTS?

lo-co commented 8 years ago

off the shelf

hagne commented 8 years ago

Totally aggree. Piccolo has nothing to do with aerosols anyway. We might want to use some of it in the example section to help people to parse there data, in particular when it comes to creating the time-stamps things can get rather tricky. Any comments on the mindmap: https://www.mindmup.com/#m:a155ce22409c500133b853081f2f88ed14 In particular, what do you think about the tools foulder. Some of the stuff in there needs to go somewhere else, like conversion. The idea is a tool-kit for very general helper functions?!? Maybe we should hide it better, because it is no used directly? This is actually a more general thing. I often outsource functions to improve readebilty and put them in a separate module which I call tools.py. Any comment on that practice?

lo-co commented 8 years ago

Ya...I kind of sorta got into the tools folder and then realized that there was a lot of routines you were using for instrument specific applications. I had started a longer post that commented on these tools, but then ditched it when it became overwhelming. Regarding the tools folder:

lo-co commented 8 years ago

And regarding the mindmup - I am not sure how I feel about this suggestion. atm in the atmPy implies that this is atmospheric. Wouldn't general atmospheric routines such as lapse rate, etc. just go in the main folder?

hagne commented 8 years ago

Regarding tools: I will go through it and clean out anything that is used only by me. However, Its less than you might think, e.g. I am defining plotting routines that use the plt_tools (same with the pandas tools).

Regarding the atmosphere folder: I thought the same thing. However, putting it in the main folder? I thought we might not want any modules in the main folder, just other folders? But it's a good point, let's put it in the main folder and see how it feels ;-)

lo-co commented 8 years ago

You might be right about this. Maybe we need a "general" folder?

Regarding plotting and pandas - do we want to lock the user into a particular way of expressing the data returned by the routines? pandas might be good but your plotting routines assume that the user will use matplotlib as the plotting tools; is that what you want?

We might want to go over this folder in person together. Don't get rid of anything on your end with the idea that you are restructuring - this could create some merge headaches down the line. Just let me know what you definitely can get rid of and I will.

hagne commented 8 years ago

pandas is essential to the sizedistribution module. Everyone should use pandas anyway ;-)

Locking people to matplotlib ... good question. However, these plotting functions are just soooooo useful. I would prefer not to get rid of them ... at least not all.

lo-co commented 8 years ago

Yeah...but they are essential to you. Is the average user going to find them essential?

Regarding pandas - I think using it is fine. I just think the routine you have in there is not useful to everyone or is easily reproduced in private libraries...

hagne commented 8 years ago

I think we are mixing things up here, lets talk about this another time. Maybe we are getting to much into the detail here for now. I think we have an idea about the general Structure and maybe should go ahead and reorganize things into the folder. We can then focus on content of particular modules. What do you think?

lo-co commented 8 years ago

I think that is incorrect - when thinking about structure you are inherently thinking about organization. We are attempting to make the library more coherent. To that end, we need to figure out 1) what belongs and 2) where it belongs. If these plotting and pandas routines are essential to the function of your stuff, then we need to figure out what stuff they are essential to so that we can place them exactly where they need to be. Stuffing everything into a generic tools folder isn't sufficient. These tools are not used broadly across the library and tools just becomes the defacto junk drawer of the library. If you want people to use this, then things need to be organized.

hagne commented 8 years ago

1) There belong to where ever I use plotting and pandas. Right now that means everywhere below /aerosols. Right now I am starting to work on radiation therefore I will need them there too. => These functions need to be close to the trunk of our tree

2) I do not expect any end user to use these tools. I would welcome other developers to use them though, although it is unlikely unless someone adopts to my style. It is likely that they will only be used by modules programmed by me. => Make them less visible? /_tools?

3) I will improve organization in the future. However, its supposed to be a toolbox, which are inherently messy? Or is there another concept than labeling a draw and chucking everything in it that kinda matches. I could make subfolders, or just comment more in the code?!?

lo-co commented 8 years ago

Hmmmm....I thought I replied to this, but apparently I didn't. I will address these points randomly:

I do not expect any end user to use these tools.

This gets to a point I made above - if you don't expect users to use them, then why are they there? This library is not for developers but for users, no?

However, its supposed to be a toolbox, which are inherently messy?

My thought is that things that are intended for use by others shouldn't be "messy". If it is messy, then the directory hasn't been given enough thought.

There belong to where ever I use plotting and pandas.

I don't understand this. Why do you need them? What do they do that is integral to the functionality we are trying to provide?

hagne commented 8 years ago

I do not expect any end user to use these tools.

This gets to a point I made above - if you don't expect users to use them, then why are they there? This >library is not for developers but for users, no?

They are there because they are used by modules that are used by users.

There belong to where ever I use plotting and pandas.

I don't understand this. Why do you need them? What do they do that is integral to the functionality >we are trying to provide?

They are also used by modules that are used by users. Almost all data in modules written by me is based on pandas DataFrames and Panels, so I came across similar issues in different modules. Do reduce redundancy I am collecting these tools and reuse them.

lo-co commented 8 years ago

OK...At some point I will look at the details. I think that the tools folder will be revisited when I can actually look at how it's being used. I will have much more detailed rec's then.

lo-co commented 8 years ago

So, this is what is messed up with the tools folder. I just discovered that you are duping some of my code. You have stokes_number and stopping_distance in the tools folder. These should be removed in put in the aerosol physics folder as far as I am concerned. flow_rate2flow_velocity should be in a general tools folder and would be better suited in the General folder. This is kind of a mess...

hagne commented 8 years ago

sure thing!

hagne commented 8 years ago

was just checking out the the new structure branch. I know you are not a fan of very descriptive names and I am a fan of over descriptive names. What about a compromise?

in aerosols:

The last couple of dayes I worked on simplifying sizedistribution and actually splitting it up and making some of the class attributes available out side the class, so it can be used in a more functional form (as you suggested). I am fare from ready though, so it is probably still rather confusing. Anyway I wanted to play around with updating my changes to the structure changes you made in new_structure. BTW, the name of my branch was an accident and renaming it sounded kind of annoying. the name of the branch which has my changes is ht_sb_freeze.

lo-co commented 8 years ago

Hmmmm....I haven't checked in for some time...I am testing some things to see if they make sense. I will take a look at this again.

Regarding instr - really?? Name one distribution that has a directory called instr that contains instructions.

Don't try to do anything related to this branch - I think that it is going to be tough merging things. I am trying to refactor appropriately where I can.

hagne commented 8 years ago

Regarding instr - really?? Name one distribution that has a directory called instr that contains instructions. I really do not understand your approach of that? Why not use real words like many other distributions. Just go though numpy and matplotlib. Sure they use some wired stuff, but alot are whole words. My opinion is that more descriptive words are increasing the learning curve. Maybe we should come up with a rule. Like:

  • do abbreviate if longer than xy
  • do not abbreviate if shorter than xy

I suggest that because I have the feeling this topic will come up more often in the future if we don't come up with a rule ;-)

Don't try to do anything related to this branch - I think that it is going to be tough merging things. I am trying to refactor appropriately where I can.

I think we should merge more rather than less, this makes us more familiar with how to handle problems. Also, I want to go ahead with things and would like to avoid to big clashes.

hagne commented 8 years ago

The pythonic mantra is readability. The more I google the more I come accross the opinions: "abbreviations should be avoided at all cost"

GavinMcMeeking commented 8 years ago

So silly compromise solution, instead of instrument or instru what if you called instru "device" instead. It has readability, but still maintains a clear tie to the contents?

hagne commented 8 years ago

sounds good to me!

lo-co commented 8 years ago

The more I google the more I come accross the opinions: "abbreviations should be avoided at all cost"

This statement clashes with every repo. When is the last time you saw "utilities" or "library" or "input-output" written out in a module name. Take a look at naming conventions in popular repos. SciPy does not call the folder fast_fourier_transform_package - it is fftpack. I think it is not that you don't want to shorten, it's that you don't want to shorten in a manner that obstructsreadablity.

From PEP-0008 on **module names:

Modules should have short, all-lowercase names

From Python Coding Conventions general comments about naming:

Avoid using names that are too general or too wordy. Strike a good balance between the two.

and example provided

  • Bad: data_structure, my_list, info_map, dictionary_for_the_purpose_of_storing_data_representing_word_definitions
  • Good: user_profile, menu_options, word_definitions

Long names can (and do) obstruct readability

lo-co commented 8 years ago

The instr discussion seems to be bikeshedding. instrument is not a problem. The bigger issue is class/function definitions like convert2numberconcentration or ExtinctionCoeffVerticlProfile. These are overly verbose (IMO) and can be shortened in a manner that does not obscure their intention.

Nevertheless, this is a discussion for another issue that was called out by Hagen not for the structure but simply for naming conventions. I suggest that this discussion be shifted to another issue that focuses on these. Right now, I am more concerned with the structure as well as testing than I am about naming. I think naming will become more of an issue as we swing around and start examining what the package actually contains. For now, I am going to leave the naming argument...

hagne commented 8 years ago

I knew you would pick the these abbreviation which had been put into stone a thousand years ago and which are used in every programming language. If we come accross these things we of course do not write them in full length!

But as you quote further down. There is now reason to abbreviate: user_profile, menu_options, word_definitions which are longer then instrument or sizedistribution

hagne commented 8 years ago

ok! If its about the structure only, why not finalizing the mindmap and putting it into action?!?

lo-co commented 8 years ago

Last thing to say about naming: you have chosen three of the bad examples provided. These are overly vague names and even if they weren't, there would not be any meaningful way to shorten them. I would suggest that things like num and conc and atm are meaningful to parties that use these packages. Surely you didn't struggle with what the library numpy was intended to do, did you? And we can all agree that when we see a package that has *py in it, we know that it is likely a python distribution? You are using these every day and don't seem to struggle with it; why do we have to get verbose when we start writing code?

lo-co commented 8 years ago

What do you mean finalizing the mindmup? I am hoping that with NewStructure, I can implement a vision that includes testing as well as some of the cruft removal and restructuring.

hagne commented 8 years ago

But how can I implement my changes if you don't want me to merge?

hagne commented 8 years ago

I thought, if the mindmap is set, it takes an hour to just do the changes.

hagne commented 8 years ago

I suggest we do multiple steps. 1) we create foulders according to the mindmap and copy the files into the orders 2) merge 3) fixing one file, split it up make it nice ... 4) merge 5) next file ... at the end of each step me make sure its still working.

lo-co commented 8 years ago

Maybe if you want to do this then we tear the whole thing down and start adding everything back in. What you suggest above sounds...exhausting... I am pretty certain that what you suggest is going to take about a year. The refactor tool in PyCharms makes moving a breeze. Why do we need to go file by file?

Are you still changing your stuff that is in the library now?

hagne commented 8 years ago

I would use the refractor for each of these steps, thats why I thing its not problem to do it in the way I suggested it. In particular step 1) would be a no brainer. And yes I am still working on the libraries almost every day. If you want I can take my current working branch and reorganize it according to our proposed folder structure this weekend. Since nobody else is using our stuff anyway, I can than just pull it back into master and we both start testing our stuff and start fixing thinks. We are in a particularly nice situation right now, since our stuff is not too much entangled. So you can fix your stuff and I fix my stuff. Once we are done with that, we start talking about the next wave of changes. What do you think?

lo-co commented 8 years ago

OK. I am now working off master and you will see an update in another 10 min. Pull. No code was deleted or changed but a lot was moved.

lo-co commented 8 years ago

Done. Added a folder that contains files that I think should be removed. You may disagree. Let's discuss.

Also, will be adding tests. Let me know if we can close this.

hagne commented 8 years ago

what ever.

hagne commented 8 years ago

please just change instr to instruments and phys to physics and we then close it.

hagne commented 8 years ago

and rad to radiation

hagne commented 8 years ago

everything is lower case what about making General to general

lo-co commented 8 years ago

I did. Did you not see it? Feel free to change the general folder.

Can you get rid of GUIs folder?