Open gabrielkg opened 6 years ago
We should also decide how to format the returned paths: do we just want the feature ids that are joined, or the ranges? It would be good to have a format abstract enough that we can return synteny blocks in the same way.
Yes - good point about the synteny blocks.
If the back-end calculates synteny blocks, it just needs to return the aliases together with a "synteny block/group" value, i.e. another field which tells you which set each alias is in. Or it could return a collection of sets of aliases. Either way would work. The first seems more general because you could put anything in the extra field.
So do we return pairs of features, the alias that connects them, plus an extra field of key-value pairs where things like coverage, identity, synteny_block, etc can be passed?
yes, that last option for the synteny blocks sounds good.
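To make the option agreed above concrete, here is a minimal sketch of one returned record: a pair of features, the alias that joins them, and a free-form extra field. The class name `Path`, the attribute names and the `extra` key are assumptions for illustration, not an agreed format.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Path:
    """One connection between a feature in each of two blocks (hypothetical shape)."""
    feature_a: str
    feature_b: str
    alias: Optional[str] = None  # None is assumed to mean a direct connection
    # Open-ended key-value pairs: coverage, identity, synteny_block, ...
    extra: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> Dict[str, Any]:
        """Serialise to the kind of hash discussed in this thread."""
        return {
            "featureA": self.feature_a,
            "featureB": self.feature_b,
            "alias": self.alias,
            "extra": self.extra,
        }


paths = [
    Path("featureA", "featureB", "aliasName", {"synteny_block": "SB1"}),
    Path("featureC", "featureD"),  # no alias, no extra information
]
```

Because `extra` is open-ended, new attributes can be added later without changing the record shape.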
re. stacked axes - an axis may now display multiple blocks, so I am thinking the API should be for adjacency between a pair of blocks (not axes or stacks).
There is a function which calculates the cross-product (triangle) of 2 stacks, i.e. all the pairs of adjacent axes in the 2 stacks (pretty simple). That could be used on the BE; in that case the api should identify which adjacencies have already been requested - don't want to repeat. I think the FE can work that out, and request any adjacencies which are exposed by a stack movement and it does not yet have data for.
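A rough sketch of the idea above: enumerate the block pairs exposed between two adjacent stacks and request only those not already requested. The names `stack_adjacencies` and `AdjacencyTracker` are hypothetical, and treating A-B and B-A as the same adjacency is an assumption.

```python
from itertools import product


def stack_adjacencies(left_stack, right_stack):
    """All (left, right) block pairs between two adjacent stacks."""
    return list(product(left_stack, right_stack))


class AdjacencyTracker:
    """Remembers which block pairs have already been requested."""

    def __init__(self):
        self._requested = set()

    def new_requests(self, left_stack, right_stack):
        """Return only the pairs that have not been requested before."""
        fresh = []
        for a, b in stack_adjacencies(left_stack, right_stack):
            key = frozenset((a, b))  # assumption: A-B and B-A are one adjacency
            if key not in self._requested:
                self._requested.add(key)
                fresh.append((a, b))
        return fresh


tracker = AdjacencyTracker()
first = tracker.new_requests(["b1", "b2"], ["b3"])
# After a stack movement, b3 is now adjacent to b1 and b4.
second = tracker.new_requests(["b3"], ["b1", "b4"])
```

Here `second` contains only the b3-b4 pair, since b1-b3 was already requested before the stack moved.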
If an axis can combine blocks with different namespaces, e.g. myGenome 90k and myGenome:myAnnotation, then for each axis in the api args, there could be multiple namespaces.
Good point about namespaces. So do we make a separate request for each pair of namespaces in the adjacencies? Could blow out quite quickly...
Or: is the input two lists which define exactly what is in each adjacent axis (blocks and namespaces)? This is probably optimal because the BE can deal with getting the data and working out from there.
But note that a block is really defining a namespace so a list of "left" and "right" blocks should be enough?
yes, either blocks[2][], or block[2]. The latter will have a more progressive result; it seems of equal complexity or slightly simpler.
Quick discussion with Gabriel - he's happy with the latter, so OK for you to start, Alex. So the api args are just 2 block ids (namespace, scope etc. are implied by the block ids).
The requests and responses should be asynchronous (not required in the first version), so we have a nice steady flow of data being displayed.
So what format do we want the paths to be returned in?
{'featureA': '1231232313', 'featureB': 'abcabcabacacbab', 'alias': null} ?
The problem with this format is that synteny blocks can't be expressed using 2 features. We could use 4 positions instead of 2 features but then we'd be losing the context of the path (alias and features)
The blocks in the request imply various fields, which then don't need to be included in the result: parent, namespace, scope, featureType, name.
The result can be an array of entries (each either a hash or an array) containing:
featureNames[2]
alias (optional field)
a hash of extra values; this hash and each of its fields are optional: { coverage : ... , identity : ... , synteny_block : ... }
I'm not clear on the qualifiers on the feature names in the example: 'featureA': '1231232313', 'featureB': 'abcabcabacacbab'. I guess those are the namespace; I'd be inclined to factor that out, or omit it because it is implied.
So a synteny block could be:
[ ... ,
  [ 'featureA', 'featureB', "aliasName", { synteny_block : "SB1" } ],
  [ 'featureC', 'featureD', "", { synteny_block : "SB1" } ],
  ... ]
I've omitted the alias for 'featureC', 'featureD' - that is a question for Gabriel: can a synteny block contain both direct and alias connections? Also, I guess the 2 edges of a synteny block can have alias fields with different values.
Possibly the feature name qualifiers in the example, '1231232313' and 'abcabcabacacbab', are the locations of those features. I can see the value of including feature location or range, because of the workflow: the FE requests and displays synteny blocks, then requests the markers which make up each synteny block, so the FE doesn't have the feature locations when it receives the synteny block result. Does this address your main question re. features or positions to define the edges of synteny blocks?
So the format could be:
[ ... ,
  { 'featureA' : [100, 120], 'featureB' : [150, 170], alias : "aliasName", extra : { synteny_block : "SB1" } },
  { 'featureC' : [240, 250], 'featureD' : [290, 300], extra : { synteny_block : "SB1" } },
  ... ]
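As an illustration of how the FE might consume records in this shape, here is a small sketch that groups them by their synteny_block tag. It assumes the reserved keys are exactly `alias` and `extra`, and that every other key is a feature name mapped to its range; `group_by_synteny_block` is a hypothetical helper.

```python
from collections import defaultdict

# Assumption: any key other than these is a feature name mapped to [start, end].
RESERVED_KEYS = {"alias", "extra"}


def group_by_synteny_block(records):
    """Group the feature ranges of each record under its synteny_block tag."""
    groups = defaultdict(list)
    for rec in records:
        block = rec.get("extra", {}).get("synteny_block")  # None when untagged
        features = {k: v for k, v in rec.items() if k not in RESERVED_KEYS}
        groups[block].append(features)
    return dict(groups)


records = [
    {"featureA": [100, 120], "featureB": [150, 170],
     "alias": "aliasName", "extra": {"synteny_block": "SB1"}},
    {"featureC": [240, 250], "featureD": [290, 300],
     "extra": {"synteny_block": "SB1"}},
]
groups = group_by_synteny_block(records)
```

Using feature names as keys means they can collide with the reserved keys; a fixed `featureNames` array would avoid that.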
those were meant to be fake ids
ok, so my examples should have the object ids also I guess ... is that because feature name is not unique ? I understand it is unique within its namespace .. ?
The problem with this format is that synteny blocks can't be expressed using 2 features.
This is true, if representing synteny blocks as the four extreme points of the block.
The alternate way is to return the synteny group of the features (as in Don's example) which then get coloured or processed accordingly. This way is more general and allows tagging of the returned features/aliases with a class that can represent more than just synteny block. Hence I think this way is preferable. Would something like:
{'featureA': '1231232313', 'featureB': 'abcabcabacacbab', 'alias': 'absdsa5asdabvsd', 'class': null}
work?
That works, but if the point of the synteny blocks was to reduce the amount of data sent to the FE, then this doesn't solve that, since we would send each path of each block. However, it will help reduce the visual clutter if we can group the paths together. So can we do this to begin with, until we find the best way to do this?
Just the edge paths of the synteny blocks would be in the response, i.e. just the features defining those 2 paths. The locations/ranges of those features would be included so that the FE would have the information required to draw the synteny blocks.
I think that if we're going to use an extra field to tell the FE that the two edge paths are a synteny block, then we may as well just have a type field and then instead of 2 features, we return 4 positions for synteny blocks.
Also how do we then request the paths inside the blocks? Do we need to specify some sort of "depth" in the request?
Yes, I was talking about requesting the synteny blocks between 2 blocks (axes) (based on direct and/or aliased paths), not the api for alias connections, which would return all paths and can include the synteny block (parent) for each path. (So I'm not dissenting from Gabriel's comment above, just talking about a different API.)
have a type field and then instead of 2 features, we return 4 positions for synteny blocks.
By 4 positions do you mean the ranges of the 2 features, each range being 2 positions ? Otherwise I'm not clear yet why we would need positions instead of feature names.
Also how do we then request the paths inside the blocks? Do we need to specify some sort of "depth" in the request?
Yes, I think it could be a single API with a depth parameter (top level == synteny blocks | next level == paths)
I was under the impression that we want a single API request for paths / synteny blocks which includes aliased paths. Are they different requests?
The 4 positions I'm talking about are the 4 corners of the synteny block. We can also define the synteny block as the 4 corner features, but I think that those 4 features aren't exactly any more important (apart from their position) than all the other potentially 1000s of features and paths that are encompassed by the synteny block
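For comparison, the four corner positions can be derived from the paths that make up a block, so either representation can be produced from the other as long as ranges are available. This is only a sketch: `synteny_block_corners` is a hypothetical helper, and it assumes the corners are simply the extreme positions on each axis.

```python
def synteny_block_corners(paths):
    """Return the extent of a synteny block on each of the two axes.

    `paths` is a list of ((a_start, a_end), (b_start, b_end)) range pairs,
    one per path in the block.
    """
    a_positions = [pos for a_range, _ in paths for pos in a_range]
    b_positions = [pos for _, b_range in paths for pos in b_range]
    return {
        "a": (min(a_positions), max(a_positions)),
        "b": (min(b_positions), max(b_positions)),
    }


# The two edge paths from the example format earlier in this thread.
corners = synteny_block_corners([
    ((100, 120), (150, 170)),
    ((240, 250), (290, 300)),
])
```

The reverse is not possible: given only the four positions, the features and alias behind each edge are lost.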
Also in relation to the depth parameter, we have different levels of features, so maybe depth should be an integer, 0 being synteny blocks of top level features, 1 being the paths between top level features, 2 being paths between 2nd level features, etc... and -1 being paths between all levels of features. Or maybe synteny blocks should be a different parameter to the level of features that we want to draw paths between?
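One way to read the integer depth proposal, written out as a sketch. The function name `interpret_depth` and the (kind, feature_level) return shape are assumptions; feature levels are counted from 1 for top-level features, and None stands for all levels.

```python
def interpret_depth(depth):
    """Map the proposed depth integer to (result kind, feature level)."""
    if depth == -1:
        return ("paths", None)            # paths between all levels of features
    if depth == 0:
        return ("synteny_blocks", 1)      # synteny blocks of top-level features
    if depth >= 1:
        return ("paths", depth)           # paths between level-`depth` features
    raise ValueError(f"unsupported depth: {depth}")
```

Splitting this into two parameters, as suggested, would amount to passing the two halves of the returned tuple separately.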
Either a single request with a level parameter, or 2 requests; if there is a significant difference between the result types then that might be a case for the latter. The result types would all be the same: they are simply paths, which may have an SB (name) attribute. That is why I tend to think that 2 is preferable to 4.
the synteny block display is currently defined in terms of the corner features; can change to locations or support either. as I understand the corners of the SB are features, which have locations (ranges).
yes, can do multiple levels, not essential in the first version, but good to have later on.
OK to see synteny blocks as level 0; I take your point about separating that param from the feature-level. That would imply a synteny block calculation which would look only at the specified levels - may be interesting, but not something to put extra effort into in the short term.
Synteny blocks are a higher-level feature than aliases, and I feel should be treated separately. Synteny blocks are a function of some set of aliases.
The main reason is that synteny blocks are a nice feature but not core, while aliases are core functionality. Alias via reference (I'll create another issue for that) is a more important feature than synteny blocks for now.
We can put synteny blocks on hold because we can still highlight synteny blocks by calculating outside and colouring the paths accordingly.
I've added a route for finding the paths between two blocks either directly or via alias.
Example usage: curl -X GET --header 'Accept: application/json' 'http://localhost:5000/api/Blocks/paths?blockA={blockA_Id}&blockB={blockB_Id}&access_token={accessToken}'
returns: [{ "featureA": "{featureA_Id}", "featureB": "{featureB_Id}", "alias": {...} }]
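For anyone calling the route from a script, a minimal sketch of building the request URL and reading the response follows. The helper names `paths_url` and `parse_paths` are hypothetical; the query parameters and response keys are the ones shown in the example above, and the contents of `alias` are passed through untouched since they are not specified here.

```python
import json
from urllib.parse import urlencode


def paths_url(base, block_a, block_b, access_token):
    """Build the GET URL for the paths route; urlencode escapes the values."""
    query = urlencode({
        "blockA": block_a,
        "blockB": block_b,
        "access_token": access_token,
    })
    return f"{base}/api/Blocks/paths?{query}"


def parse_paths(body):
    """Turn the JSON response body into (featureA, featureB, alias) tuples."""
    return [(p["featureA"], p["featureB"], p.get("alias")) for p in json.loads(body)]


url = paths_url("http://localhost:5000", "idA", "idB", "tok")
sample = parse_paths('[{"featureA": "f1", "featureB": "f2", "alias": null}]')
```

The URL can then be fetched with any HTTP client using the `Accept: application/json` header.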
With the new design, aliases are stored in the back end distinct from the feature definitions, and back end will eventually serve the required information to the front-end for drawing lines between axes.
What functions are needed on the back-end?
An initial sketch:
INPUT: a pair of blocks (dataset + scope)
OUTPUT: list of pairs of features to draw paths between, together with the alias that joins them
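The input/output sketch above could look roughly like this as an in-memory function. Everything here is an assumption for illustration: the name `find_paths`, the alias field names `string1` and `string2`, and the reading of a direct path as the same feature name occurring in both blocks. A real back end would query its data store instead.

```python
def find_paths(features_a, features_b, aliases):
    """Find direct and aliased paths between two blocks.

    features_a, features_b: dict mapping feature name -> feature id, per block.
    aliases: list of dicts with hypothetical keys "string1" and "string2".
    """
    paths = []
    # Direct paths: assumed to mean the same feature name is in both blocks.
    for name in sorted(set(features_a) & set(features_b)):
        paths.append({"featureA": features_a[name],
                      "featureB": features_b[name],
                      "alias": None})
    # Aliased paths: an alias may join the blocks in either direction.
    for alias in aliases:
        s1, s2 = alias["string1"], alias["string2"]
        candidates = [(s1, s2)] if s1 == s2 else [(s1, s2), (s2, s1)]
        for left, right in candidates:
            if left in features_a and right in features_b:
                paths.append({"featureA": features_a[left],
                              "featureB": features_b[right],
                              "alias": alias})
    return paths


result = find_paths(
    {"m1": "idA1", "m2": "idA2"},
    {"m1": "idB1", "m9": "idB9"},
    [{"string1": "m9", "string2": "m2"}],
)
```

In this example `result` holds one direct path (m1) and one aliased path (m2 to m9).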
Some questions: