Feature consideration - predefined loadable subgraphs

google / module-server

Feature consideration - predefined loadable subgraphs #2

Open genericallyloud opened 11 years ago

genericallyloud commented 11 years ago

Hi, I just watched the JSConf EU presentation. I was eager to see your approach so I could compare it to my own. I also have an optimizing module server of sorts. My team and I went down the same sort of road. First we loaded files as needed. Then we concatenated everything upfront. Then we hit the 1MB download and knew we had to break it apart. I had actually already done the work of building a dependency graph for the single-script concatenation, so that I could correctly order the files in the concatenated script.

This is where our approaches diverge. I had considered doing something similar to your approach. What I disliked about it was the combinatorial explosion. I had gotten used to deploying with a single static script file, which has obvious advantages: deployment is simpler, the scripts themselves can go anywhere, and you can even use a CDN. Browser caching is potentially much better because the order of interaction doesn't change which scripts are loaded.

In our approach, instead of allowing arbitrary dependencies to be loaded, certain dependencies are marked as roots of a new dependency subgraph. To give the big picture: in a very large app like ours, the dependency graph becomes a tree of subgraphs. The tree structure assumes that parents are always loaded before children. A dependency can only belong to a single subgraph, so if it is needed by multiple subgraphs, it is moved to their closest common parent subgraph. As opposed to a module-server, we have what we call a "module compiler". It does many other things besides the dependency graph work, but the dependency graph is the heart of it. At the very end of module compilation, the result is a static JS file for each subgraph, named after the path to the subgraph's root module, with the path segments separated by underscores.

Navigating around the app results in dynamic script loads that might look something like:

main.js
main_child3.js
main_child2.js
main_child2_gchild1.js
main_child2_gchild3.js
main_child1.js
main_child1_gchild4.js

Here main, childX, and gchildX are just example names for the root dependency of each subgraph. Notice that parents are always loaded before children, but siblings can be loaded in any order.
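As a rough illustration of the naming convention described above, the sketch below derives a bundle file name from a subgraph's path in the tree and lists the files a client would need, parents first. The function names `bundle_name` and `bundles_for` are hypothetical; the thread does not show the module compiler's actual code.

```python
def bundle_name(path):
    """Join the subgraph roots, from the top of the tree down, with underscores."""
    return "_".join(path) + ".js"


def bundles_for(path):
    """Return every bundle needed to reach `path`, ancestors first."""
    return [bundle_name(path[:depth]) for depth in range(1, len(path) + 1)]


# Navigating to gchild1 under child2 requires its two ancestors first.
print(bundles_for(["main", "child2", "gchild1"]))
# ['main.js', 'main_child2.js', 'main_child2_gchild1.js']
```

Because each subgraph has exactly one path from the root, each load point maps to exactly one file name, regardless of which siblings were loaded earlier.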

In our app, main is loaded initially in the HTML, childX corresponds to first-level nav, and gchildX to second-level nav. That works best for us, but obviously an app with more functionality per screen and fewer screens would use it differently.

I hope this was helpful. I know it's a different direction than you are currently going, but seeing as I'm not able to contribute my own code to open source ATM, I was hoping to spread a little of my own experience/knowledge this way.

cramforce commented 11 years ago

Hey, thanks for the email. If I understand you correctly, this is essentially a full subset of the functionality of Module Server. Module Server does not force you to make such decisions early, but it does, of course, allow you to make them. Is your request that Module Server should have a feature to enforce a certain shape of the graph?

A future version of Module Server will likely require predefining which modules can be endpoints, which will enable things like Closure Compiler's cross-module code motion and synthetic modules.

(Sent by email in reply to genericallyloud's comment of Wed, Nov 21, 2012 at 7:26 AM.)

http://twitter.com/cramforce http://nonblocking.io

genericallyloud commented 11 years ago

> Is your request that Module Server should have a feature to enforce a certain shape of the graph?

Yes, I suppose that is my request. However, in that case, the module-server would be less a server than a build step. During development, it would probably still need to be a server so that I could do a page refresh without rebuilding (that's what I do now). I already have this working in my own code base, so I'm not really sure if it's a request as much as a suggestion, but someday, if there were feature parity, I might be happy to replace my custom code with module-server.

> A future version of Module Server will likely require predefining which modules can be endpoints, which will enable things like Closure Compiler's cross-module code motion and synthetic modules.

Ok, I think that answers one of my remaining questions. I was going to say that the only reason module-server would not be a subset is because of the need to move up shared dependencies, but if you're saying Closure Compiler already does this, I guess you're in a pretty good spot to implement this quickly.

The difference as I see it is something like this.

My example:

main.js
main_child3.js
main_child2.js

Translated to module-server:

/js/main.js
/js/+child3-main.js
/js/+child2-main.js

Let's say for a moment that child3 and child2 each have a dependency on "foo", but main does not. The load order of child3 and child2 would affect where foo was included in module-server, but not my module-compiler. My module-compiler would put foo in main, even though nothing in main directly depends on it. Your module-server would currently put foo in child3 if that was loaded first, or in child2 if you changed the order and loaded it before child3. I think you were saying closure compiler could handle this, but I just wanted to clarify the issue.
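To make the "foo moves to main" behavior concrete, here is a minimal sketch of hoisting a shared dependency to the nearest common ancestor subgraph. It assumes each subgraph is identified by its path from the root; the function names and the extra dependencies `core` and `grid` are hypothetical and only there for illustration.

```python
def common_prefix(paths):
    """Longest shared leading path, i.e. the nearest common ancestor subgraph."""
    prefix = list(paths[0])
    for path in paths[1:]:
        shared = 0
        while shared < min(len(prefix), len(path)) and prefix[shared] == path[shared]:
            shared += 1
        prefix = prefix[:shared]
    return tuple(prefix)


def assign_dependencies(needs):
    """Map each dependency to the single subgraph whose bundle will contain it.

    `needs` maps a subgraph path (tuple of root names) to the dependencies
    that subgraph uses directly.
    """
    users = {}
    for path, deps in needs.items():
        for dep in deps:
            users.setdefault(dep, []).append(path)
    return {dep: common_prefix(paths) for dep, paths in users.items()}


needs = {
    ("main",): ["core"],
    ("main", "child2"): ["foo", "grid"],
    ("main", "child3"): ["foo"],
}
placement = assign_dependencies(needs)
print(placement["foo"])   # ('main',) because both child2 and child3 need it
print(placement["grid"])  # ('main', 'child2') because only child2 needs it
```

The placement depends only on the shape of the graph, not on the order in which child2 and child3 happen to be loaded at runtime.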

cramforce commented 11 years ago

Module Server does not require any of this moving. Whatever gets loaded first gets the dep, and the next one can use it. Why would you want to move it up the chain? Seems wasteful.

(Sent by email in reply to genericallyloud's comment of Wed, Nov 21, 2012 at 8:36 AM.)

genericallyloud commented 11 years ago

Ok, so I looked over the example I gave, and I translated it to module-server urls wrong. It should be:

/js/main.js
/js/+child3-main.js
/js/+child2-main,child3.js
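The URL shapes above can be reproduced with a few lines of Python. This is only a sketch inferred from the three example URLs in this thread (requested module after `+`, already-loaded modules after `-`); it is not module-server's actual implementation, and `module_server_urls` is a hypothetical name.

```python
def module_server_urls(load_order, prefix="/js/"):
    """Build one URL per load, following the URL shapes quoted in this thread."""
    urls = []
    loaded = []
    for module in load_order:
        if not loaded:
            # The first request has nothing to exclude.
            urls.append(f"{prefix}{module}.js")
        else:
            # Later requests name what the client already has.
            urls.append(f"{prefix}+{module}-{','.join(loaded)}.js")
        loaded.append(module)
    return urls


print(module_server_urls(["main", "child3", "child2"]))
# ['/js/main.js', '/js/+child3-main.js', '/js/+child2-main,child3.js']
```

Swapping the order of child3 and child2 produces a different pair of URLs for the same two modules, which is the source of the combinations discussed next.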

I think that this is an extremely sensible and correct approach for dynamic loading, but the combinatorial explosion of + and - modules would lead to an unreasonable number of static files. Hypothetically, if I were to use module-server to generate static files, assuming only the three dependency load points main,child2,child3, I would have to generate:

main.js
main-child2.js
main-child3.js
main-child2,child3.js

child2.js
child2-main.js
child2-child3.js
child2-main,child3.js

child3.js
child3-child2.js
child3-main.js
child3-child2,main.js

I'm not a mathematician, but I believe I calculated correctly, and determined that 20 modules would result in over 10 million static files if there were no assumed load order.
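The arithmetic can be checked directly. Each of the n modules can be requested with any subset of the other n - 1 modules already loaded, which gives n × 2^(n-1) files. The sketch below enumerates the three-module case and evaluates the formula for 20 modules; the function names are hypothetical.

```python
from itertools import combinations


def static_file_names(modules):
    """Enumerate one file per (requested module, set of already-loaded modules)."""
    names = []
    for module in modules:
        others = [m for m in modules if m != module]
        for size in range(len(others) + 1):
            for loaded in combinations(others, size):
                suffix = "-" + ",".join(loaded) if loaded else ""
                names.append(f"{module}{suffix}.js")
    return names


def static_file_count(n):
    """Closed form: n requested modules times 2**(n - 1) subsets of the rest."""
    return n * 2 ** (n - 1)


print(len(static_file_names(["main", "child2", "child3"])))  # 12
print(static_file_count(20))  # 10485760
```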

Obviously, module-server does not try to make static files upfront. However, I think there are a lot of benefits, and it's what we chose to do. I suppose all I'm trying to do is point out the benefits, and say how we were able to accomplish it.

It's a balance between load-order flexibility and upfront module creation. Sorry if I didn't make that clear. Module-server maintains the dependency graph at runtime and concatenates on the fly based on a URL. What I was saying is that there is value in creating a set of static files upfront, as a build step. So in production, my module-compiler is just a build step. It does all of the dependency graph work and then generates a set of static files, one for each subgraph. When we deploy to production, there is no dependency graph, on the server or on the client. The advantages are mostly three things:

ease of deployment (no additional server, just static files)
js files can go to a CDN
browsers can cache the js modules (including appcache)

With module-server, none of those are possible, because if it is possible to go to child2 before child3 or vice-versa, the url is different, and therefore the resulting JS code is different (unless you are willing to accept the huge combinatorial overhead).

The balance that we came to is as I described previously. Every load point has a single canonical module, so when you want to load child2, you will always load main_child2 no matter what other modules have been loaded (but assuming main has been loaded first). In order to do this, the two concessions are:

There is a required load order around the ancestor axis (but not the sibling axis)
Any shared dependencies move to the nearest common parent.
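The first concession can be expressed as a simple check on a sequence of script loads. This sketch assumes the underscore naming convention from earlier in the thread and that root names themselves contain no underscores; `check_load_order` is a hypothetical helper, not part of either tool.

```python
def check_load_order(loads):
    """Raise ValueError if any bundle is loaded before its parent bundle."""
    seen = set()
    for name in loads:
        stem = name[:-len(".js")]
        # Everything before the last underscore is the parent's path.
        parent, sep, _ = stem.rpartition("_")
        if sep and parent + ".js" not in seen:
            raise ValueError(f"{name} loaded before {parent}.js")
        seen.add(name)
    return True


# Siblings may appear in any order as long as each parent comes first.
check_load_order([
    "main.js",
    "main_child3.js",
    "main_child2.js",
    "main_child2_gchild1.js",
    "main_child1.js",
])
```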

Perhaps you have another approach?

cramforce commented 11 years ago

We used the static system before. It is a sensible approach and certainly easier to maintain in production. Module Server's approach, of course, works with CDNs (proxying CDNs only), and browsers can cache the JS.

See https://docs.google.com/spreadsheet/ccc?key=0AoIOxKkr6fGqdGJIT2FFYjJ1Q0JLbEFXUnpqSFQya1E#gid=0 for a table of trade-offs between different loading strategies. Your approach is in row 6.

(Sent by email in reply to genericallyloud's comment of Wed, Nov 21, 2012 at 11:07 AM.)

genericallyloud commented 11 years ago

> We used the static system before. It is a sensible approach and certainly easier to maintain in production. Module Server's approach, of course, works with CDNs (proxying CDNs only)

You're right, I hadn't been thinking about proxying CDNs because that's not what we use, but you're Google, so I should have known better ;)

> and browsers can cache the JS.

They can, but only when the load order is the same. Depending on the app, I suppose this might be more or less likely. A typical case might be something like an app with a landing screen and, let's say, 4 other tabs. Assuming I want to lazy load additional code when I navigate to another tab, I would want tab1-4 to be my additional lazy load points. Depending on the order in which you visit the tabs, you would need to serve different files with different URLs, so some of the caching would be lost.
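The caching point can be illustrated by comparing the URLs requested in two sessions that visit the same tabs in a different order. This is a sketch under assumptions: the dynamic URLs follow the shapes quoted earlier in the thread, the static names follow the underscore convention, and both helper functions are hypothetical.

```python
def dynamic_urls(visits):
    """URLs requested when each load names the modules already on the client."""
    loaded = ["main"]
    urls = ["/js/main.js"]
    for tab in visits:
        urls.append(f"/js/+{tab}-{','.join(loaded)}.js")
        loaded.append(tab)
    return urls


def static_urls(visits):
    """URLs requested when every load point has one canonical file."""
    return ["main.js"] + [f"main_{tab}.js" for tab in visits]


first_visit = ["tab1", "tab2"]
second_visit = ["tab2", "tab1"]

# URLs that appear in both sessions are the ones a cache could serve again.
dynamic_hits = set(dynamic_urls(first_visit)) & set(dynamic_urls(second_visit))
static_hits = set(static_urls(first_visit)) & set(static_urls(second_visit))
print(sorted(dynamic_hits))  # ['/js/main.js']
print(sorted(static_hits))   # ['main.js', 'main_tab1.js', 'main_tab2.js']
```

In this two-tab example only the initial script is reusable under the dynamic scheme, while all three files are reusable under the static one.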

Anyway, you clearly have a handle on it. It obviously depends on circumstances. I was merely trying to lend some experience with one approach in case it was something you eventually wanted to be able to support. If you have any interest in discussing more let me know, but it sounds like you've got it covered.

PS. I'm curious why approach 6 didn't get an X for awesome cacheability.

cramforce commented 11 years ago

With the dynamic approach you can simulate the static approach with discipline, but that can fail, of course.

The X for cacheability was an oversight! Thanks!

(Sent by email in reply to genericallyloud's comment of Wed, Nov 21, 2012 at 12:36 PM.)