thechiselgroup / biomixer

BioMixer
http://bio-mixer.appspot.com/

Refactor API Fetch and Registry System #349

Closed (everbeek closed 10 years ago)

everbeek commented 10 years ago

Need to refactor the API fetch and registry system, but I see that the best way is simply to get a good OO framework in place. I ended up going with TypeScript, and other than the scoping issues with "this", I love it.

This ticket is going to embrace refactoring of the entire ontology overview into TypeScript, then refactoring the concept graph too. No real reason to start a new ticket or branch given it's all the same type of work.

I fixed the problem below in another case, but it would be best to refactor the node-to-API-call registry system seen in the ontology overview, as well as the fetch system. It should all be much more encapsulated and class-oriented. Some of the fetch calls do not need a registry entry per node, and therefore we have at least two types of fetch calls.

Following from work in #335 that was intended for #311.

everbeek commented 10 years ago

Assessing TypeScript as a framework for OO classes, in order to encapsulate. I might like it for general use in this project too, if it would have minimal impact on maintainability later.

So far I am liking what I see for OO support: http://visualstudiomagazine.com/Articles/2013/02/01/TypeScript.aspx?Page=2

Contrast that code with this: http://code.google.com/p/digg/wiki/Class

And Palantir has released a TypeScript Eclipse plugin, so I don't need to change IDEs to get developer tools: https://marketplace.eclipse.org/content/typescript

And there are D3.js headers (providing type info for D3 from the outside): https://github.com/borisyankov/DefinitelyTyped

Sandbox: http://www.typescriptlang.org/Playground/#src=var%20x%3Anumber%3B%0Avar%20y%3ANumber%3B%0Ay%20%3D%20x%3B%20%2F%2F%20okay%20%0Ax%20%3D%20y%3B%20%2F%2F%20not%20okay
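
For reference, here is the snippet behind that playground link, decoded; it shows that the primitive number is assignable to the Number wrapper interface, but not the other way around:

```typescript
var x: number; // primitive type
var y: Number; // wrapper object interface
y = x; // okay
x = y; // not okay
```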

everbeek commented 10 years ago

Delightful! This is a two-way-door decision:

"...at the point you might decide to abandon TypeScript, you can begin using the .js files that the TypeScript compiler generated."

The same source says that JSLint applied to TypeScript output files indicates extremely clean code style. And just to reiterate, TypeScript is a strict superset of JavaScript, so we can write JS directly in the TypeScript (.ts) files. This seems like a no-brainer for an OO, strong-typing guy like me.

everbeek commented 10 years ago

Did a bunch of conversion. That took about 2 days, but configuring and figuring out how to use require.js for dependencies was harder. I also figured out how to make abstract classes. I posted two Stack Overflow answers based on this work:

http://stackoverflow.com/questions/13333489/declaring-abstact-method-in-typescript/22493343#22493343
http://stackoverflow.com/questions/20079464/whats-the-correct-way-to-use-requirejs-with-typescript

As mentioned in the second one, I couldn't find a complete answer to using classes and require.js. The approach used by codeBelt might still work, if I need to get rid of doubled-up external-module and class names when using them. I can look into that later. I will push my two commits, for this issue and #350, shortly.
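
For the record, a minimal sketch of both patterns, with illustrative names rather than BioMixer's actual classes. The compiler gives us no abstract keyword, so the base implementation just throws:

```typescript
// Throw-based "abstract" method: subclasses must override, and forgetting
// to do so fails loudly at runtime rather than silently.
export class BaseGraphView {
    initGraph(): void {
        throw new Error("initGraph() is abstract; subclasses must override it");
    }
}

export class OntologyGraphView extends BaseGraphView {
    initGraph(): void {
        // concrete implementation goes here
    }
}
```

And the require.js side, compiled with --module amd; note the doubled-up external-module and class names I mentioned:

```typescript
// In a consuming file: the import names the module, so instantiating the
// class means writing both the module name and the class name.
import GraphView = require("./GraphView");
var view = new GraphView.OntologyGraphView();
```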

everbeek commented 10 years ago

Fixing up bugs introduced by changes in 'this' scoping. More might exist. I am testing and inspecting. Currently, nodes are not getting sized, and the central node is missing, but I can drag nodes, the layout works, and tipsy popups display (with correct and complete information).
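
For context, the recurring breakage looks like this (a minimal sketch with invented names, not the actual BioMixer code):

```typescript
class NodeSizer {
    baseRadius = 10;

    // BUG when handed around as a bare callback (e.g. to d3 or jQuery):
    // `this` is rebound away from the instance, so baseRadius is undefined.
    sizeOf(weight: number): number {
        return this.baseRadius * weight;
    }

    // FIX: an arrow function property captures the lexical `this`.
    sizeOfFixed = (weight: number): number => {
        return this.baseRadius * weight;
    }
}

var sizer = new NodeSizer();
var broken = sizer.sizeOf;
// broken(2); // NaN (or a TypeError): `this` is not the sizer
var fixed = sizer.sizeOfFixed;
fixed(2); // 20, as intended
```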

I have a problem with debugging though. I have to have the .ts files (the edited source) in the war directory. Ok, I know I was doing this before with the .js, so why worry now if I didn't then? Because I don't want the .ts files being deployed. It's not sanitary to deploy twice as many files, and I only need them for development anyway.

I googled this issue for the Eclipse environment (Visual Studio is fine) and found no help. I am turning off source mapping and debugging the .js output directly instead. This is unfortunate: without mapping I don't get .ts line numbers, but with mapping on I cannot debug the .js directly in Chrome.

everbeek commented 10 years ago

Menu not working. Likely a 'this' issue. Fixing.

everbeek commented 10 years ago

Moving on to do concept graph refactoring into TypeScript. I will merge branches with Elena's work first, then continue with that.

everbeek commented 10 years ago

I needed to change the API key to match our new whitelisted one (to allow us to make more requests than 15 per second, since the visualizations can make as many as 50).

I also made some critical bug fixes to REST calls (preventing redundant calls), and commented out CORS code that will be used later so we can receive status codes from the browser. I also fixed some more "this" scoping bugs. I was going to switch to using a library called jQuery-JSONP, but decided against it for now. I have committed the header and lib file in case I change my mind.

Status codes are not returned from the browser to JS when cross-domain calls are made; I did not know this. We had working error recovery before only because the old REST service injected error codes into the JSON responses. When the new REST service came out, it no longer injected those errors, and the browser offers none of its own. The best path is to get CORS working, so we can receive those errors from the browser.
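
To illustrate what we're missing (a hedged sketch; fetchWithCors, handleData, and handleError are placeholders, not our actual fetch code):

```typescript
// With CORS enabled on the REST server, the browser exposes the real HTTP
// status to script. A cross-domain JSONP call never gets this far; the
// browser swallows the error and we see nothing.
function fetchWithCors(restUrl: string,
                       handleData: (data: any) => void,
                       handleError: (status: number) => void): void {
    var xhr = new XMLHttpRequest();
    xhr.open("GET", restUrl, true);
    xhr.onreadystatechange = () => {
        if (xhr.readyState !== 4) {
            return;
        }
        if (xhr.status === 200) {
            handleData(JSON.parse(xhr.responseText));
        } else {
            handleError(xhr.status); // e.g. a rate-limit or auth failure
        }
    };
    xhr.send();
}
```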

Continue with getting Concept graph working again, then refactor redundancies across ontologies and concepts, then get onto CORS in a new ticket.

everbeek commented 10 years ago

Still getting Concept graph working again, and I have incrementally refactored redundancies across ontologies and concepts. Have to get concept mapping neighbourhood and term neighbourhood working.

everbeek commented 10 years ago

Loading each view directly with the appropriate URL parameter set (to term_neighborhood, for example) works properly. Using the dropdown doesn't. I suspect this is because the whitelists for REST calls and the edge registry are currently designed to represent the current visualization's state and to prevent redundancy relative to that state. When we change visualizations and intend to load from scratch, we should either re-initialize the whitelist or have it return cached values. If the edge registry is involved, it should definitely be re-initialized.

Oh, my end-of-day note above is wrong. It can't be the registry. Confirming in the debugger, but the code looks fine.

It is the Registry class after all, but not the edge registry. This is ok... if I implement a manual cache in it, I can keep it working as static, the way it is, while improving performance (at the memory cost of holding cached data). I could institute a limited cache if browser memory consumption is a concern. When cache items are cleared, their registry entries would need to be reset as well... thinking...

everbeek commented 10 years ago

I implemented caching in the Fetch Registry, and continued trying to get the concept graphs working. It turns out I was misusing the expansion registry: I had mixed semantics, because I was using the registry (properly) for the central nodes, but I was also using it to allow for other nodes to be fetched (and subsequently expanded sometimes).

So I changed usage so that the expansion registry is only used for the nodes that need other nodes added. This fixed some things (the term neighbourhood showed correct inclusion after this change). It left other things broken (changing concept graph types led to no results). This wasn't surprising, because I still hadn't added the Fetch Registry usage, so the fetcher was blocking the unregistered calls. I started adding that, and to my dismay, I found I had a bigger problem...

I feel like the URL pre-registration system is silly... for the ontology view it works, and seemed to make sense, but for concepts it seems all wrong. The problem I solved for the ontology view was that multiple calls were made per URL, because there is indeterminacy and redundancy in the revisitation of nodes when using the filter sliders. I needed to prevent these in order to enhance performance. But with caching, I don't need to prevent the calls for the sake of actual REST call dispatch. Since I now use caching, I can push the pre-registration stuff out, and put the redundancy checking into the fetch call rather than before it.

But... I still need to prevent dispatch of callbacks on the cached REST data, since re-calling a callback that has already been run is silly at best, and a source of bugs and a performance hit at worst. The callback and fetcher are pretty close together, so I could check the registry/cache when creating callbacks, and block at that level. The risk I can think of right now is that if I remove nodes and then need to re-add them, the caching would block the re-call of the callback, and prevent the node from being re-added. Should I worry about that later?

Simplify via facts:

1. Caching for URLs is good. It allows us to re-query for data that will be used in either a redundant or a totally new way (e.g. data fetched for relations can later be reused when expanding to add a node).
2. The exact same callback being called again is bad, in that it could perform any number of operations that cause glitches or outright bugs.

I feel stuck here...I will adjust things so that callbacks themselves are blocked by..OH!

Ok, I have it. If I always tag registry entries with a parameter that represents the callback's class, then the registry actually knows which callbacks have been called on which URLs, and it can block any second call to the same callback, while still using cached data for new calls to the same URL. I had this idea a while ago, but it didn't feel right. Now I think it is necessary.

So, I have to:

1. Rename the Fetch Registry to the Fetch Cache.
2. Remove registration of URLs; they were a code smell the way I had them.
3. Move the checks into the fetch call, and maybe into callback creation (doubtful). Use the cache when available, but don't continue with the fetch at all if the URL/callback pair is already in the cache.
4. Leave the concept expansion system as I currently have it in my working copy.

A sketch of where this lands is below.
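
Approximately this, with names and shapes of my own rather than the final code:

```typescript
// Sketch of the planned FetchCache: raw REST results are cached per URL,
// and a given callback class is dispatched at most once per URL.
class FetchCache {
    private dataByUrl: { [url: string]: any } = {};
    private served: { [urlAndCallback: string]: boolean } = {};

    fetch(url: string, callback: { className: string; run: (data: any) => void }): void {
        var key = url + "::" + callback.className;
        if (this.served[key]) {
            return; // this exact URL/callback pair has already been handled
        }
        this.served[key] = true;
        if (url in this.dataByUrl) {
            callback.run(this.dataByUrl[url]); // fulfill from cache
            return;
        }
        this.restCall(url, (data: any) => {
            this.dataByUrl[url] = data;
            callback.run(data);
        });
    }

    private restCall(url: string, onData: (data: any) => void): void {
        // actual REST/JSONP dispatch would go here
    }
}
```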

everbeek commented 10 years ago

I have the ontology view working again with the caching and anti-redundancy features pushed into the Fetcher class. I am going to commit for safety, and get back to working on the concept graph again.

everbeek commented 10 years ago

Ok, that was stupid of me. When we change views, I wipe out tonnes of state, and yet I then set about trying to make sure we didn't make "redundant" calls, even though all the manifest state had been destroyed. Furthermore, valid "redundant" calls whose status is anything other than ALLOWED, COMPLETED, or ERROR never get dispatched, so their callbacks are never called when the data becomes available. Meaning that AWAITING shuts down late-coming callbacks, never to be fulfilled.

Clearly, I should have a mechanism for chaining callbacks that have requested the same data. On the other hand, I didn't think of this before because it isn't really an issue for these visualizations.

I either have to clear the cache completely when I change state (maybe), or allow plenty of redundant calls to hit the cache (at the risk of the problems I was trying to avoid when developing the filter). Thinking...

I'll allow the redundant hits, and if that doesn't work out, I will look into it further. I will also implement a callback chaining mechanism.

everbeek commented 10 years ago

Revamped the cache to allow redundant calls, but to fulfill them with the cached data. I am also using a queue of callbacks whenever fetch calls are made on the same URL while the request is pending. It looks cleaner, I think... and it's more functional this way.
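
The queueing piece looks roughly like this (a sketch with illustrative names, not the committed code):

```typescript
// While a request for a URL is in flight, later fetches for the same URL
// are queued instead of refused; everyone is fulfilled when data arrives.
class PendingFetches {
    private pending: { [url: string]: Array<(data: any) => void> } = {};

    fetch(url: string,
          onData: (data: any) => void,
          dispatch: (url: string, cb: (data: any) => void) => void): void {
        if (url in this.pending) {
            this.pending[url].push(onData); // already in flight: just queue
            return;
        }
        this.pending[url] = [onData];
        dispatch(url, (data: any) => {
            var queued = this.pending[url];
            delete this.pending[url];
            queued.forEach((cb) => cb(data)); // fulfill all queued callers
        });
    }
}
```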

It is slower though. Looking into it, it appears that my concern above, that too many re-used callbacks would run and be fulfilled with cached data, is exactly what is happening. Confirming and figuring out a solution.

Confirmed. One solution is to require callbacks to have some sort of "related entity" identifier. This can be a node id, an edge id, or an arbitrary string (for the first load of the path to root, for example). Checking for the combination of that id and the callback name tells us whether it has been served before, and we prevent redispatch to the callback if it has. When we change visualization types, we can clear this tracked data, which means all subsequent requests would be served from the cache. If we need to re-allow things, we would have to manually clear their entries as we delete/reset/transmogrify whatever it is that will need that exact callback to run again. Is this reasonable? Sketched below.
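
(Names here are illustrative, not the committed code.)

```typescript
// A callback is redispatched only if its (related entity, callback name)
// pair has not been served yet; switching visualizations clears the records
// so every callback may run once more, while the data cache stays warm.
class CallbackServiceRegistry {
    private servedRecords: { [key: string]: boolean } = {};

    shouldDispatch(entityId: string, callbackName: string): boolean {
        var key = entityId + "::" + callbackName;
        if (this.servedRecords[key]) {
            return false; // already served once; block the redispatch
        }
        this.servedRecords[key] = true;
        return true;
    }

    // Called on visualization switch; does not touch the REST data cache.
    clearServiceRecords(): void {
        this.servedRecords = {};
    }
}
```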

Just to be clear, for the ontology case, the filter as controlled by the sliders is responsible for this, since it re-calls fetch a bunch. This was by design, so that unfetched nodes get their data fetched only when they are brought into view by the slider. I suppose I could invert the process, and have the sliders inspect which nodes haven't been fetched and only fetch those... but then they wouldn't know whether there are pending fetches, etc.

everbeek commented 10 years ago

I feel great about the current state of the REST call cache and callback service registry. The performance in the concept map views is great! The ontology view is also behaving very well, even when the sliders are manipulated. On top of that, the views aren't really tainted by the system at all; it is all generic and at the fetch controller level. The view needs to provide some deterministic identifiers via the callback class, and to call a service record clear method when the context is switched and the view would like all callbacks to be serviced as though history had not occurred. The service registry cleanup does not clean up the cache, so we still benefit from that. The registry system could certainly be cleaned up a bit, but it isn't bad, given TypeScript ;)

The concept mapping view is still not rendering mapped nodes correctly. I think it's time to fix that.

everbeek commented 10 years ago

Have to get layouts behaving for switches between concept views. Otherwise this is wrapped up I think! I will merge this into other relevant branches if so, and close this branch's issue.

everbeek commented 10 years ago

Merged to Elena's branch and master, done with branch, but won't delete it yet. I should have just merged to master, then merged that to her branch; the commit graph looks stupid :/