numenta / nupic-legacy

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.
http://numenta.org/
GNU Affero General Public License v3.0

Document the proposed directory structure for nupic.core and nupic #591

Closed rhyolight closed 10 years ago

rhyolight commented 10 years ago

As a part of the nupic.core extraction, we need to create some documentation of the proposed directory structure after the extraction is complete. This should include both repositories, as well as details about where the nupic.core dependency exists within nupic.

subutai commented 10 years ago

@david-ragazzi If I remember correctly you took a crack at laying out the directory structure before. Do you want to propose something?

Cheers, Subutai

david-ragazzi commented 10 years ago

Yes @subutai !

I think we should have a default directory structure for each repo, so that the Nupic repositories follow the most common convention in open-source projects.

PR #499 has the details of an example for the Nupic repository:

But this is the structure that I suggested later. :-) The only difference is the location of the folders. Just to compare:

  • $NUPIC becomes $REPOSITORY/Source
  • $NTA becomes $REPOSITORY/Release
  • /tmp/builddir could stay the same, and Build_System (the IDE solution file or Makefiles generated by CMake) would be put outside /Source (which would leave the /Source folder even cleaner).

On first configuration, CMake would create default values for $NUPIC and $NTA based on the source directory from which CMake was called. After that, the user would be free to change the release location just by changing the $NTA value (thanks @breznak for this observation!).

Below is a screenshot of this model: [screenshot, 2013-12-16 at 9:35:49 PM]

In the above case, $NTA is set to "~/Desktop/nupic-master/Release". But if a user wants to change to another location, they just need to reconfigure $NTA.

@scottpurdy commented something similar in other messages when he suggested a "src" folder containing only code. I don't remember where... hehe

Summary of the discussion

Proposed structure so far:

nupic.core
    |-- LICENSE.TXT
    |-- README.md
    |-- build
    |    |-- ALL_GENERATED_FILES_GO_HERE
    |    |-- release
    |    |    |-- bin
    |    |    |-- include
    |    |    |-- lib
    |    |-- scripts
    |    |    |-- BUILD_SCRIPTS_OR_IDE_GENERATED_BY_CMAKE
    |-- doc
    |    |-- BOTH_GENERATED_PLUS_MANUALLY_WRITTEN_DOCS
    |-- external
    |    |-- MIMIC_NUPIC_FOR_NOW_STOP_GAP_MEASURE
    |-- include
    |    |-- NUPIC_INCLUDE_FILES_REPRESENTING_EXTERNAL_API
    |-- src
    |    |-- CMAKELISTS.TXT
    |    |-- main
    |    |    |-- SOURCE_FILES_RELATED_TO_PROJECT
    |    |-- test
    |    |    |-- SOURCE_FILES_RELATED_TO_AUTOMATED_TESTS
    |    |-- examples
    |    |    |-- SOURCE_FILES_RELATED_HELLO_WORLDS_AND_FULL_EXAMPLES
    |    |    |-- some_example
    |    |    |-- app1
    |    |    |-- app2
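
For anyone who wants to try the layout locally, here is a minimal sketch that creates the proposed skeleton in a scratch directory. The folder names come from the tree above; the scratch location (`mktemp -d`) is only an assumption made for illustration, and none of this is an official nupic.core script.

```shell
#!/bin/sh
# Sketch: create the proposed nupic.core skeleton in a scratch directory.
# Folder names follow the proposed tree; the scratch location is an assumption.
set -eu

root="$(mktemp -d)/nupic.core"

# Tracked folders: docs, externals, public headers, sources.
mkdir -p "$root/doc" "$root/external" "$root/include"
mkdir -p "$root/src/main" "$root/src/test" "$root/src/examples"

# Generated folders: everything produced by CMake or the compiler lives under build/.
mkdir -p "$root/build/scripts"
mkdir -p "$root/build/release/bin" "$root/build/release/include" "$root/build/release/lib"

# List the resulting directories, relative to the skeleton root.
(cd "$root" && find . -type d | sort)
```

Because all generated output sits under build/, removing that one folder is enough to return to a clean tree.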

In the Travis file of the nupic.core repository, the process is something like:

# from the repository root, create and enter the scripts folder, i.e. the folder that will hold the Makefiles generated by CMake
mkdir -p build/scripts
cd build/scripts
# call CMake on the source folder, passing "build/release" as the release (install) directory
cmake ../../src -DPROJECT_BUILD_RELEASE_DIR="$PWD/../release"
# call Make to build the project
# binary files will be located in the "build/release" folder
make

scottpurdy commented 10 years ago

I like @david-ragazzi's suggestions. I am more familiar with C++ code living inside a src directory in the root of the repo and building into a build directory before being copied to the install location (make builds into build/bin, build/lib, etc. dirs and make install copies to installation locations). But I am open to what people think will be most obvious for newbies joining the project.

I looked for a C++ project on Github and the first one I found was https://github.com/rethinkdb/rethinkdb It has code in src and builds to build/release. So kind of similar to David's suggestion and kind of similar to what I am used to seeing. And that is just one data point.

david-ragazzi commented 10 years ago

@scottpurdy I liked the project structure in your suggested link.

Just to avoid the same confusion as in PR #499: the "build_system" folder is not the same as the "build" folder. "build_system" just stores the files generated by CMake (i.e. IDE solution or Makefiles), while "build" is the compiled output produced from the files in "build_system". Anyway, we could rename "build_system" to "gen" or "generated" or something similar, to avoid this confusion of terms...

rhyolight commented 10 years ago

I never liked the name build_system, so I would prefer a rename as well. generated sounds good to me, but is there any name that is standard for a directory that contains these types of generated files for C++ projects?

david-ragazzi commented 10 years ago

@rhyolight I have no idea.. but I believe there's no default name...

Some suggestions:

What do you think?

Ah.. this folder name is only for internal use (i.e. the Travis build), so any other name wouldn't cause any problem with the CMake file. The user is free to choose any name. However, I believe we should all adopt this convention anyway, to avoid future confusion over names on the mailing list. Just a suggestion..

deanhorak commented 10 years ago

I've seen "work", "temp" or "scratchpad" used variously for such transient directories. I'm not aware of any standard name however.

fergalbyrne commented 10 years ago

I'd suggest keeping all the directory names lowercase, and src, build (with build/libs, build/bin) etc. are more conventional than the other suggestions. Everyone will simply understand those names.

subutai commented 10 years ago

I like @fergalbyrne 's suggestions. With python packages that require C++ compilation, the convention is also to put everything under build. So, build/lib, build/bin, etc. They also put generated files in there. So build/temp would contain the generated files. This way the user just has one directory to delete if they want to manually clean everything.

david-ragazzi commented 10 years ago

I like @subutai's idea of putting everything related to the build in build. Although I think build/temp is not very intuitive; maybe build/scripts would be better. Something like:

This way we combine @subutai, @scottpurdy and @fergalbyrne suggestions in a single structure.

subutai commented 10 years ago

Sounds good to me.

deanhorak commented 10 years ago

Yes, everything generated should definitely be placed in a subdirectory under "build". Within the build directory most projects have various directories indicating what gets placed in each rather than a catchall "temp" directory. For instance, Chromium has "master", "scripts", "site-config", "slave", "test", etc all within the build directory.

sjmackenzie commented 10 years ago

git clone nupic.core
mkdir build
cd build
cmake ../nupic.core

Therefore build does not need to be part of the git repo at all. My suggestion is to keep the standard src, include, doc, etc. format and forget about build, since we are moving away from autotools. Having build as part of the repo is an autotools mentality, I believe.

Completely separating build from the repo means we never have to worry about committing generated artifacts into the repo by mistake.

As we will eventually be building nupic.core separately from nupic we don't need to worry about a build dir. Though this approach works equally well during the stop-gap period in that nupic.core can exist as part of nupic's directory structure (ie nta) ie:

git clone nupic
cd nupic
git submodule init
git submodule update  # this pulls `nupic.core` into `nta` and is abstracted out in `build.sh`
cd ..
mkdir build
cd build
cmake ../nupic

CMake would plough into nupic, and the generated files for nta/nupic.core would also be put into build without any fuss at all.

So we could, for example, have this directory structure:

(mkdir) numenta
----------> (git clone) nupic.python
----------> (git clone) nupic.python-test-feature1
----------> (git clone) nupic.core
----------> (git clone) nupic.core-test-feature3
----------> (mkdir) builds
------------------> (mkdir) nupic.python
------------------> (mkdir) nupic.python-test-feature1
------------------> (mkdir) nupic.core
------------------> (mkdir) nupic.core-test-feature3

Say you wanted to build numenta/builds/nupic.core one would:

cd numenta/builds/nupic.core
cmake ../../nupic.core
make

Sometimes one just wants a clean git clone for testing. This approach just keeps things clean.
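
The workspace layout described above can be sketched as follows. This is only an illustration under stated assumptions: the checkouts are stand-in empty folders rather than real `git clone` results, the workspace lives in a scratch directory, and the final line prints the commands instead of running them so the sketch works without CMake installed.

```shell
#!/bin/sh
# Sketch: one out-of-source build directory per checkout, kept under builds/.
set -eu

workspace="$(mktemp -d)/numenta"

# Stand-ins for the git clones; a real setup would run `git clone` here.
for checkout in nupic.python nupic.core; do
    mkdir -p "$workspace/$checkout"
    # Mirror each checkout with its own build directory.
    mkdir -p "$workspace/builds/$checkout"
done

# From a build directory, the matching source tree is two levels up.
cd "$workspace/builds/nupic.core"
source_dir="$(cd ../../nupic.core && pwd)"
echo "would run: cmake $source_dir && make"
```

Since nothing is ever generated inside a checkout, `git status` there stays clean no matter how many builds are made.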

rhyolight commented 10 years ago

@david-ragazzi Can you take the suggestions we've received above and re-draft your initial proposal?

david-ragazzi commented 10 years ago

@sjmackenzie I understand your concern, but I don't believe this is an Autotools thing. Although Autotools uses a similar convention, the concept is not restricted to that tool.

@all: Travis itself could update the build folder when it compiles the repo (i.e. we wouldn't delete these folders on each Travis build). This way any newbie could directly download the generated binaries if they aren't familiar with the source or even with C++ code!

sjmackenzie commented 10 years ago

Ah yes, correct, there is the --prefix flag. Forgive me, you are correct.

sjmackenzie commented 10 years ago

@david-ragazzi One could easily put the build folder in the repo for travis, for whatever reason. It makes no difference. Secondly newbies wanting to dip their hands into compiling nupic.core will be reading the build instructions so the directory structure is painfully easy to follow. From a development point of view (not Travis, nor newbies) this structure is fluid and easy to follow.

That's great that Travis has a 'download binaries' feature!

sjmackenzie commented 10 years ago

don't forget the include folder.

david-ragazzi commented 10 years ago

@sjmackenzie

Secondly newbies wanting to dip their hands into nupic.core will be reading the build instructions for so the directory structure is painfully easy to follow.

Isn't information about how to get only the binaries supposed to be in Readme.md?

And although I'm not a GitHub expert, I believe it has some package management, i.e. it packages only the source or the binaries, which users could download separately..

don't forget the include folder.

Do you mean the folder with header files when you say include? Isn't that folder supposed to be a subfolder of src?

sjmackenzie commented 10 years ago

I amended my comment by adding into compiling - ie into compiling nupic.core

sjmackenzie commented 10 years ago

@david-ragazzi

Typically you do not need to worry about binary distribution.

Our main concern is making life easy for developers and achieving a flexible yet standardized development environment that becomes the 'culture' of nupic development communicated via the Readme.

david-ragazzi commented 10 years ago

@sjmackenzie

Maybe I am confusing things, but isn't Nupic dependent on Nupic.Core, and not the inverse? From my understanding, Nupic should only get the output generated by Nupic.Core, not interfere with the Nupic.Core build process. Furthermore, this is a default structure used by many projects, as expressed by the majority of the members. I don't think it is a painful structure to follow (except in the special case that you cited).

sjmackenzie commented 10 years ago

Dependency is not part of my discussion. But I will include it now as I see a discontinuity between what we are saying.

This is what we are working towards that we do not have now:

  • nupic.core is an independent artifact that exists as an installed library somewhere on the system, installed manually or via a package manager
  • nupic.python is dependent on 'nupic.core' being installed manually or via a package manager
  • nupic.python does not build or install nupic.core
  • nupic.core does not build or install nupic.python

What we have now ( a temporary stop-gap situation that will help us to the above goal) is to allow nupic to drive the building process. We first must make sure nupic is completely moved over to CMake. As a byproduct of doing the transition nupic.core should be able to be built independently.

The transition: I suggest that nupic completes its transition to CMake. Once this is done we can start evolving the directory structure of nupic.core.

This way the transition is safer and core builds independently. Changes are made in small steps while nupic stays stable.

If you want to focus only on getting nupic.core to build with cmake and change directory structure at the same time, yet make sure nupic existing build system can talk to nupic.core cmake build system then go for it. But I don't suggest it.

Hence it is better to get nupic as a whole building with cmake (in mainline).

That is why build being included in the repo or not makes absolutely no difference in cmake world. Directory structure is not important at this stage. src include doc etc is the standard layout structure. It seems obvious that we should adopt it.

rhyolight commented 10 years ago

@sjmackenzie :+1:

david-ragazzi commented 10 years ago

@sjmackenzie Now I understand you, and I agree on many points.

However, since CMake can run in parallel with Autotools, we can implement and test this without any headache. Once the CMake files in each repo are working, we just remove the Autotools stuff... and voilà.

Anyway, your suggestion is fine. It's a top-down approach to this job, while mine is bottom-up. But remember that the nupic.core CMake file will already set the $NTA environment variable, so once this variable is configured we can just reference it from the nupic CMake file (on Travis or a local machine). In this case, the $NTA value will be nupic.core/build/release

To tell the truth, I sincerely don't know how to do it without first ensuring that nupic.core is generating its output correctly and then referencing it (static or dynamic libs) through the $NTA variable (assuming that Travis can share this variable between repos, of course). Any ideas are welcome. :-)

That is why build being included in the repo or not makes absolutely no difference in cmake world. Directory structure is not important at this stage. src include doc etc is the standard layout structure. It seems obvious that we should adopt it.

Yes, as I said in another message, the user could choose another location, but CMAKE_INSTALL_PREFIX needs to have an initial value. So while $NTA is still not configured, CMAKE_INSTALL_PREFIX and $NTA are set to the subfolder build/release.

scottpurdy commented 10 years ago

I am not familiar with CMake. In my experience, doing a make will compile everything into build/... and make install will copy the bin, lib, etc files from build into appropriate system locations (or an arbitrary location specified by --prefix). So my expectation was that $NTA would point to the installation location, not the build location.

We also need to remove the need for environment variables to support simple installation mechanisms like pip or other package managers so please try not to rely on $NTA or similar.

rhyolight commented 10 years ago

We also need to remove the need for environment variables to support simple installation mechanisms like pip or other package managers so please try not to rely on $NTA or similar.

This would be a good topic for the nupic-hackers mailing list.

rhyolight commented 10 years ago

To tell the truth, I sincerely don't know how to do it without first ensuring that nupic.core is generating its output correctly and then referencing it (static or dynamic libs) through the $NTA variable (assuming that Travis can share this variable between repos, of course). Any ideas are welcome. :-)

Can we agree that the nupic.core build will output to a build directory, but that it might be controlled by an option to the build script? I'm fine with there being a build/scripts and build/release.

I think we're at this point now for nupic.core, right?

Does anyone disagree strongly with this? If not, let's move to discussing the nupic directory structure.
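
A minimal sketch of the "output to build, but controllable by an option" idea: the release directory defaults to build/release inside the repository unless the caller passes another location. The helper name `resolve_release_dir` and the example paths are hypothetical and do not come from any existing nupic.core script.

```shell
#!/bin/sh
# Sketch: pick the release directory from an optional argument,
# falling back to build/release inside the repository.
set -eu

# resolve_release_dir REPO_ROOT [RELEASE_DIR]
# (hypothetical helper; not part of any nupic.core script)
resolve_release_dir() {
    repo_root="$1"
    release_dir="${2:-$repo_root/build/release}"
    echo "$release_dir"
}

# Default location, then an explicit override (example paths only).
resolve_release_dir /home/user/nupic.core
resolve_release_dir /home/user/nupic.core /opt/nupic
```

A build script could then hand the resolved value to CMake, for example through `-DCMAKE_INSTALL_PREFIX`, so no environment variable is needed.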

david-ragazzi commented 10 years ago

@scottpurdy, @rhyolight Actually, when we call CMake we can specify on the command line where the output should be installed (the value is stored in CMake's CMAKE_INSTALL_PREFIX variable). So we don't need to use an environment variable (only if we want to change the location of the binaries but keep referencing it).

In the Travis files of the nupic and nupic.core repositories, we just put the following:

mkdir -p build/scripts
cd build/scripts
cmake ../../src -DCMAKE_INSTALL_PREFIX:PATH="$PWD/../release"
make

PS: I'm still trying to understand the integration between repositories. So please be patient.. hehe

david-ragazzi commented 10 years ago

@rhyolight

Does anyone disagree strongly with this? If not, let's move to discussing the nupic directory structure.

+1

subutai commented 10 years ago

I like it but have one question: what will be in the include directory, and what do we mean by "external includes"? nupic.core will have an external API via specific header files. Are we including those header files in this directory? Or will this contain the header files for external libraries used by nupic.core? Thanks.

rhyolight commented 10 years ago

Or will this contain the header files for external libraries used by nupic.core?

This was my assumption, but please someone correct me if I'm wrong.

subutai commented 10 years ago

For externals it's a bit more complex. If we are including them in the repository we will need the rest of the library too. We need the lib files. In addition some libraries such as apr generate platform specific includes. To handle that today we have:

external/
   common/
       include/
   darwin64/
       include/
       lib/
       bin/

So today we have all the platform-independent files under common and all the platform-specific files within subdirectories. The build system sets up the appropriate include and lib paths. I'm not suggesting we keep the above, just that we need to think through how we handle externals a bit more.
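
To make the path setup concrete, here is a rough sketch of how a build step might derive the platform folder and the include and lib paths. It assumes the folder name is the lowercased OS name followed by the word size, which is inferred from the single "darwin64" example above; the real build system may use a different scheme. `platform_name` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: derive the platform folder name and the include/lib search paths.
# The "<os><bits>" naming is an assumption inferred from "darwin64".
set -eu

# platform_name OS_NAME WORD_BITS -> e.g. "Darwin" 64 -> "darwin64"
platform_name() {
    echo "$(echo "$1" | tr '[:upper:]' '[:lower:]')$2"
}

external_dir="external"
# Fall back to 64 if getconf cannot report the word size.
bits="$(getconf LONG_BIT 2>/dev/null || echo 64)"
platform="$(platform_name "$(uname -s)" "$bits")"

# Platform-independent headers first, then platform-specific headers and libraries.
include_paths="$external_dir/common/include:$external_dir/$platform/include"
lib_path="$external_dir/$platform/lib"

echo "platform:      $platform"
echo "include paths: $include_paths"
echo "library path:  $lib_path"
```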

rhyolight commented 10 years ago

Maybe we're getting confused about which project we are proposing a directory structure for. I am thinking about nupic.core right now, so the includes directory would include things like boost. I think you are talking about the externals in nupic, right?

david-ragazzi commented 10 years ago

@sjmackenzie

I'm trying to understand the integration between repositories, and now I better understand your concern. If we leave nupic.core as a submodule, the nupic directory structure really will be painful to follow, as nupic.core will be handled as a subfolder :-(

In this case, how about nupic.core being compiled before nupic, with its binaries installed into the external folder of the nupic repository?

rhyolight commented 10 years ago

In this case, how about nupic.core be compiled first and its binaries being installed on external folder at nupic repository?

:+1: Yes I like that idea. As long as nupic.core can be built by giving it a build location, it would be easy enough, right? But will it be confusing when building nupic that people see new untracked files within the externals directory?
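
One possible answer to the untracked-files concern, sketched under assumptions: if the installed binaries land in a dedicated subfolder, an ignore rule keeps them out of `git status`. The install location `external/nupic.core` is hypothetical, and the scratch directory stands in for a real nupic checkout.

```shell
#!/bin/sh
# Sketch: keep installed nupic.core binaries out of `git status` in nupic.
set -eu

repo="$(mktemp -d)"                  # stand-in for a nupic checkout
install_dir="external/nupic.core"    # hypothetical install location

mkdir -p "$repo/$install_dir/lib"
touch "$repo/.gitignore"

# Add the ignore rule only once, so re-running the build stays idempotent.
if ! grep -qxF "/$install_dir/" "$repo/.gitignore"; then
    echo "/$install_dir/" >> "$repo/.gitignore"
fi

cat "$repo/.gitignore"
```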

david-ragazzi commented 10 years ago

@rhyolight

Yes I like that idea. As long as nupic.core can be built by giving it a build location, it would be easy enough, right?

Yes! Thanks @sjmackenzie !

subutai commented 10 years ago

@rhyolight I was referring to how we would handle externals in nupic.core. It can't just have includes, also needs libs and platform specific includes.

The nupic externals is an example directory structure that handles all that. I'm open to other schemes but just pointing out that include by itself won't work.

sjmackenzie commented 10 years ago

include = header files for the nupic.core library artifact.

I'm really hoping there isn't a need for externals folder. Let us please seriously assess this situation.

Yes, it's so convenient having these libraries, I understand that, but that isn't a good enough excuse.

I suggest we 1) list out exactly what the external dependencies are, 2) see how serious the usage is, 3) see if we can find a way not to use them. We need to get as slim as possible; let's set the goal at zero dependencies. This will keep us strict, and develop a culture of minimalism.

I anticipate this to be quite a difficult task, with much time spent on removing/altering/slashing dependencies.

If we are going to take a good first step, of all things, let this be it. Bloat is one of our worst enemies. We need to make it deeply ingrained into our culture that core is dependency zero.

subutai commented 10 years ago

@sjmackenzie That makes more sense to me. include = header files that comprise the nupic.core API.

I would also like to remove as many dependencies as possible. Here are the externals we use currently:

  • apr
  • boost
  • yaml-cpp
  • zlib
  • cycle_counter.hpp (easy to bring this into NuPIC)

david-ragazzi commented 10 years ago

@all

How about my idea related to removing nupic.core as submodule and only put its output (binaries) on some folder on nupic repository?

rhyolight commented 10 years ago

Let's NOT talk about removing externals at this point. Let's just put them somewhere we can agree upon and create a new issue for removing dependencies. Does that make sense? I just want to get the directory structure decided so we can start putting a build in place for nupic.core.

rhyolight commented 10 years ago

Sorry, it just makes me crazy that no builds or tests are running at all for pushes to nupic.core.

rhyolight commented 10 years ago

@all

How about my idea related to removing nupic.core as submodule and only put its output (binaries) on some folder on nupic repository?

I think it's more important to focus on the nupic.core directory structure and getting it building itself. Then we can talk about options for removing the submodule. As long as it is a submodule, nupic can just call into nupic.core to build and place the binaries wherever (which are not checked into the nupic repo).

Perhaps it was wrong to try to design both the directory structures of nupic and nupic.core simultaneously. If we can just decide on the nupic.core structure, we can move ahead on other fronts and defer some of the nupic decisions about how it uses nupic.core until later. Like I mentioned above, my primary concern is getting nupic.core building and running tests independently in Travis-CI.

sjmackenzie commented 10 years ago

Fair enough, though having externals or not pertains to directory structure. So for the meantime, as we have externals, they obviously go in the externals folder. But I'm hoping that it becomes our goal to absolutely crucify that folder, obliterating it completely.

I understand there is no test harness, luckily this is in discussion stage, we're at liberty to decide on a decent plan for the overall structure of core. A directory structure discussion touches all parts. Fear not, we're taking small baby steps! I would also knee jerk if I was in your position, though it might be best to allow this conversation to mature so that folks can gain deeper insight into the problems faced.

On Fri, Feb 7, 2014 at 6:13 PM, Matthew Taylor notifications@github.com wrote:

Sorry, it just makes me crazy that no builds or tests are running at all for pushes to nupic.core.


rhyolight commented 10 years ago

But I'm hoping that it becomes our goal to absolutely crucify that folder, obliterating it completely.

https://github.com/numenta/nupic.core/issues/16 Just to keep this in scope, let's continue the conversation about removing externals here at a later date.

though it might be best to allow this conversation to mature so that folks can gain deeper insight into the problems faced.

Point taken. I just want to keep a focus on the goal of a simple tree for nupic.core and nupic. Once we can agree on nupic.core, we can continue work there.

david-ragazzi commented 10 years ago

@rhyolight

I think it's more important to focus on the nupic.core directory structure and getting it building itself. .

Agreed.

Well, it seems that we don't have more objections related to nupic.core dir structure..

"If any of you has reasons why nupic.core should not be updated with the proposed directory, speak now or forever hold your peace."

scottpurdy commented 10 years ago

@david-ragazzi - great work getting initial consensus from so many of us opinionated people. It sounds like we are only settled on nupic.core so far, is that right? Do you mind writing out what we settled on so everyone doesn't have to dig through the conversation to piece it together?

subutai commented 10 years ago

@scottpurdy Yes, this was just about nupic.core. Here is @david-ragazzi's proposal so far:

nupic.core
|-- LICENSE.TXT
|-- README.md
|-- build
|   |-- ALL_GENERATED_FILES_GO_HERE
|   |-- release
|   |   |-- bin
|   |   |-- include
|   |   `-- lib
|   `-- scripts
|-- doc
|-- external
|   `-- MIMIC_NUPIC_FOR_NOW_STOP_GAP_MEASURE
|-- include
|   `-- NUPIC_INCLUDE_FILES_REPRESENTING_EXTERNAL_API
`-- src

We have not discussed location of tests or example applications.

sjmackenzie commented 10 years ago

comments inline:

nupic.core
|-- LICENSE.TXT
|-- README.md
|-- build
|   |-- ALL_GENERATED_FILES_GO_HERE
|   |-- release
|   |   |-- bin
|   |   |-- include
|   |   `-- lib
|   `-- scripts
|-- doc
|    `-- BOTH_GENERATED_PLUS_MANUALLY_WRITTEN_DOCS
|-- external
|   `-- MIMIC_NUPIC_FOR_NOW_STOP_GAP_MEASURE
|-- include
|   `-- NUPIC_INCLUDE_FILES_REPRESENTING_EXTERNAL_API
|-- src
|    `-- FLAT_HIERARCHY 
`-- tests
     `-- FLAT_HIERARCHY 

1st, a question: what is in /nupic.core/build/scripts?

2nd: whilst editing the /nupic.core/doc/ folder above, it dawned on me that we could expose a man page, which entails a command line interface program. What is the feasibility of such a program? My gut tells me that if we are able to create a pipes-structured program that neatly fits into the GNU toolset, we have a winner.

Why? Script writers could plug nupic into their scripts to monitor their programs. nupic.core could, for example, be told to monitor top's output for anomalies. nupic.core could watch security scripts for intrusion detection, like a sentient tripwire, so to say. Gosh, gdb could integrate nupic and use it to directly highlight anomalies or errors in output. Unix-structured programs are one of the most successful program structures. If we are able to adopt that structure (by keeping a thin executable wrapper around a library), we might crack the API / use style / structure needed for all the different language bindings.
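
To illustrate the pipe-friendly shape being proposed, here is a small stand-in filter. It is not HTM and not a nupic.core program: `flag_anomalies` is a hypothetical placeholder that simply flags numbers above a fixed threshold, shown only to make the stdin-to-stdout structure concrete.

```shell
#!/bin/sh
# Hypothetical stand-in for a thin nupic.core command-line wrapper.
# It is NOT HTM: it only flags numbers above a fixed threshold, to show the
# stdin -> stdout filter shape that lets a tool sit inside a pipeline.
set -eu

# flag_anomalies THRESHOLD: reads one number per line, labels each line.
flag_anomalies() {
    threshold="$1"
    awk -v t="$threshold" '{ if ($1 + 0 > t) print $1, "ANOMALY"; else print $1, "ok" }'
}

# Any producer of numbers can be plugged in on the left-hand side of the pipe.
printf '%s\n' 12 15 97 14 | flag_anomalies 50
```

A real wrapper would replace the threshold check with calls into the library, while keeping the same one-record-in, one-record-out contract so it composes with other command line tools.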

May we put that in our pipe and smoke it?