plantinformatics / pretzel

Javascript full-stack framework for Big Data visualisation and analysis
GNU General Public License v3.0

Defining API for alias requests to back end #40

Open gabrielkg opened 6 years ago

gabrielkg commented 6 years ago

With the new design, aliases are stored in the back end distinct from the feature definitions, and back end will eventually serve the required information to the front-end for drawing lines between axes.

What functions are needed on the back-end?

An initial sketch:

INPUT: a pair of blocks (dataset + scope)
OUTPUT: list of pairs of features to draw paths between, together with the alias that joins them

Some questions:

Gruek commented 6 years ago

We should also decide how to format the returned paths: do we just want the ids of the features that are joined, or their ranges? It would be good to have a format abstract enough that we can return synteny blocks in the same way.

gabrielkg commented 6 years ago

Yes - good point about the synteny blocks.

If the back-end calculates synteny blocks, it just needs to return the aliases together with a "synteny block/group" value, i.e. another field which tells you which set each alias is in... OR, return a collection of sets of aliases. Either way would work. The first seems more general because you could put anything in the extra field.

So do we return pairs of features, the alias that connects them, plus an extra field of key-value pairs where things like coverage, identity, synteny_block, etc can be passed?
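A minimal sketch of the record shape being proposed here, assuming plain JavaScript objects. The helper name `makePath` and the ids are hypothetical; only the idea of two features, the joining alias and an open-ended `extra` field comes from the discussion.

```javascript
// Hypothetical helper: builds one path record in the shape proposed above.
// `extra` is a free-form bag of key-value pairs (coverage, identity,
// synteny_block, ...); none of its keys are required.
function makePath(featureA, featureB, alias = null, extra = {}) {
  return { featureA, featureB, alias, extra };
}

const examplePaths = [
  // Connection via an alias, tagged with a synteny block and an identity score.
  makePath("f1", "f2", "alias1", { synteny_block: "SB1", identity: 0.98 }),
  // Direct connection: no alias and no extra information.
  makePath("f3", "f4"),
];
```

Keeping everything optional inside `extra` means new attributes can be added later without changing the shape of the record itself.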

Don-Isdale commented 6 years ago

yes, that last option for the synteny blocks sounds good.

re. stacked axes - an axis may now display multiple blocks, so I am thinking the API should be for adjacency between a pair of blocks (not axes or stacks).

There is a function which calculates the cross-product (triangle) of 2 stacks, i.e. all the pairs of adjacent axes in the 2 stacks (pretty simple). That could be used on the BE; in that case the api should identify which adjacencies have already been requested - don't want to repeat. I think the FE can work that out, and request any adjacencies which are exposed by a stack movement and it does not yet have data for.
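The cross-product and the "don't repeat" bookkeeping described above might look roughly like this on the FE side. This is a sketch, not the existing Pretzel function: the names `adjacentPairs`, `adjacencyKey` and `newAdjacencies` are made up, stacks are assumed to be arrays of block ids, and ids are assumed not to contain the `|` character.

```javascript
// All pairs (a, b) with a taken from the left stack and b from the right stack.
function adjacentPairs(leftStack, rightStack) {
  const pairs = [];
  for (const a of leftStack) {
    for (const b of rightStack) {
      pairs.push([a, b]);
    }
  }
  return pairs;
}

// Order-independent key, so (a, b) and (b, a) count as the same adjacency.
function adjacencyKey(a, b) {
  return [a, b].sort().join("|");
}

// Returns only the pairs that have not been requested yet,
// and records them in `requested` (a Set of adjacency keys).
function newAdjacencies(leftStack, rightStack, requested) {
  const fresh = [];
  for (const [a, b] of adjacentPairs(leftStack, rightStack)) {
    const key = adjacencyKey(a, b);
    if (!requested.has(key)) {
      requested.add(key);
      fresh.push([a, b]);
    }
  }
  return fresh;
}

const requested = new Set();
const first = newAdjacencies(["b1", "b2"], ["b3"], requested);
// A stack movement then adds b4 to the right-hand stack:
const second = newAdjacencies(["b1", "b2"], ["b3", "b4"], requested);
// second is [["b1", "b4"], ["b2", "b4"]]; the b3 pairs are not repeated.
```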

If an axis can combine blocks with different namespaces, e.g. myGenome 90k and myGenome:myAnnotation, then for each axis in the api args, there could be multiple namespaces.

gabrielkg commented 6 years ago

Good point about namespaces. So do we make a separate request for each pair of namespaces in the adjacencies? Could blow out quite quickly...

Or: is the input two lists which define exactly what is in each adjacent axis (blocks and namespaces)? This is probably optimal because the BE can deal with getting the data and working out from there.

But note that a block is really defining a namespace so a list of "left" and "right" blocks should be enough?

Don-Isdale commented 6 years ago

yes, either blocks[2][], or block[2]. The latter will give more progressive results; it seems of equal complexity or slightly simpler.
Quick discussion with Gabriel - he's happy with the latter, so OK for you to start, Alex.
So the api args are just: 2 block ids (namespace, scope etc. are implied by the block ids).
The requests and responses should be asynchronous (not required in the first version), so we have a nice steady flow of data being displayed.

Gruek commented 6 years ago

So in what format do we want the paths to be returned?

{'featureA': '1231232313', 'featureB': 'abcabcabacacbab', 'alias': null} ?

The problem with this format is that synteny blocks can't be expressed using 2 features. We could use 4 positions instead of 2 features but then we'd be losing the context of the path (alias and features)

Don-Isdale commented 6 years ago

The blocks in the request imply various fields, which then don't need to be included in the result: parent, namespace, scope, featureType, name.

The result can be an array of (either a hash or array):
  featureNames[2]
  alias (optional field)
  a hash (this hash, and each of its fields, are optional): { coverage : ... , identity : ... , synteny_block : ... }

I'm not clear on the qualifiers on the feature names in the example:
  'featureA': '1231232313', 'featureB': 'abcabcabacacbab',
I guess those are the namespace; I'd be inclined to factor that out, or omit it because it is implied.

So a synteny block could be:
[ ... ,
  [ 'featureA', 'featureB', "aliasName", { synteny_block : "SB1" } ],
  [ 'featureC', 'featureD', "", { synteny_block : "SB1" } ],
  ... ]
I've omitted the alias for 'featureC', 'featureD' - that is a question for Gabriel - can a synteny block contain both direct and alias connections? Also, I guess the 2 edges of a synteny block can have alias fields with different values.

Possibly the feature name qualifiers in the example, '1231232313' and 'abcabcabacacbab', are the locations of those features. I can see the value of including feature location or range, because of the workflow: the FE requests and displays synteny blocks, then the FE requests the markers which make up each synteny block; so the FE doesn't have the feature locations when it receives the synteny block result. Does this address your main question re. features or positions to define the edges of synteny blocks?

So the format could be:
[ ... ,
  { 'featureA' : [100, 120], 'featureB' : [150, 170], alias : "aliasName", extra : { synteny_block : "SB1" } },
  { 'featureC' : [240, 250], 'featureD' : [290, 300], extra : { synteny_block : "SB1" } },
  ... ]

Gruek commented 6 years ago

those were meant to be fake ids

Don-Isdale commented 6 years ago

ok, so my examples should have the object ids also I guess ... is that because feature name is not unique ? I understand it is unique within its namespace .. ?

gabrielkg commented 6 years ago

"The problem with this format is that synteny blocks can't be expressed using 2 features."

This is true, if representing synteny blocks as the four extreme points of the block.

The alternate way is to return the synteny group of the features (as in Don's example) which then get coloured or processed accordingly. This way is more general and allows tagging of the returned features/aliases with a class that can represent more than just synteny block. Hence I think this way is preferable. Would something like:

{'featureA': '1231232313', 'featureB': 'abcabcabacacbab', 'alias': 'absdsa5asdabvsd', 'class': null}

work?
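To illustrate how the FE could use such a `class` field for colouring or other processing, here is a rough sketch that groups returned paths by class. The function name `groupByClass` and the sample ids are hypothetical; paths with no class are collected under the key `null`.

```javascript
// Groups path records by their `class` value.
// Returns a Map from class (or null when unset) to an array of paths.
function groupByClass(paths) {
  const groups = new Map();
  for (const path of paths) {
    const key = path.class === undefined ? null : path.class;
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(path);
  }
  return groups;
}

const samplePaths = [
  { featureA: "f1", featureB: "f2", alias: "a1", class: "SB1" },
  { featureA: "f3", featureB: "f4", alias: null, class: "SB1" },
  { featureA: "f5", featureB: "f6", alias: "a2", class: null },
];
const byClass = groupByClass(samplePaths);
// byClass.get("SB1") holds the first two paths; byClass.get(null) holds the third.
```

Because the class is just a label on each path, the same grouping works whether it names a synteny block or some other kind of set.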

Gruek commented 6 years ago

That works, but if the point of the synteny blocks was to reduce the amount of data sent to the FE, then this doesn't achieve that, since we would send every path of every block. However, it will help reduce the visual clutter if we can group the paths together. So can we do this to begin with, until we find the best way to do it?

Don-Isdale commented 6 years ago

Just the edge paths of the synteny blocks would be in the response, i.e. just the features defining those 2 paths. The locations/ranges of those features would be included so that the FE would have the information required to draw the synteny blocks.
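As a sketch of what the FE would do with those two edge paths, the extent of the block on each axis can be derived from the feature ranges. The field names `rangeA` and `rangeB` and the function `blockExtent` are assumptions for illustration; the numbers are the ones from the earlier example.

```javascript
// Given the two edge paths of a synteny block, each carrying the range of its
// feature on axis A and on axis B, return the extent of the block on each axis.
// Taking min/max over all four values also copes with inverted blocks.
function blockExtent(edge1, edge2) {
  const onA = [...edge1.rangeA, ...edge2.rangeA];
  const onB = [...edge1.rangeB, ...edge2.rangeB];
  return {
    a: [Math.min(...onA), Math.max(...onA)],
    b: [Math.min(...onB), Math.max(...onB)],
  };
}

const extent = blockExtent(
  { rangeA: [100, 120], rangeB: [150, 170] },
  { rangeA: [240, 250], rangeB: [290, 300] }
);
// extent is { a: [100, 250], b: [150, 300] }
```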

Gruek commented 6 years ago

I think that if we're going to use an extra field to tell the FE that the two edge paths are a synteny block, then we may as well just have a type field and then instead of 2 features, we return 4 positions for synteny blocks.

Also how do we then request the paths inside the blocks? Do we need to specify some sort of "depth" in the request?

Don-Isdale commented 6 years ago

Yes, I was talking about requesting the synteny blocks between 2 blocks (axes) (based on direct and/or aliased paths), not the api for alias connections, which would return all paths, and can include the synteny block (parent) for each path. (So I'm not dissenting from Gabriel's comment above, just talking about a different API.)

"have a type field and then instead of 2 features, we return 4 positions for synteny blocks."

By 4 positions do you mean the ranges of the 2 features, each range being 2 positions ? Otherwise I'm not clear yet why we would need positions instead of feature names.

"Also how do we then request the paths inside the blocks? Do we need to specify some sort of "depth" in the request?"

Yes, I think it could be a single API with a depth parameter (top level == synteny blocks | next level == paths).

Gruek commented 6 years ago

I was under the impression that we want a single API request for paths / synteny blocks which includes aliased paths. Are they different requests?

The 4 positions I'm talking about are the 4 corners of the synteny block. We can also define the synteny block as the 4 corner features, but I think that those 4 features aren't exactly any more important (apart from their position) than all the other potentially 1000s of features and paths that are encompassed by the synteny block

Gruek commented 6 years ago

Also in relation to the depth parameter, we have different levels of features, so maybe depth should be an integer, 0 being synteny blocks of top level features, 1 being the paths between top level features, 2 being paths between 2nd level features, etc... and -1 being paths between all levels of features. Or maybe synteny blocks should be a different parameter to the level of features that we want to draw paths between?
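One way to read the integer scheme suggested above, as a sketch only: the function name `interpretDepth` and the shape of its return value are hypothetical, and nothing here reflects an agreed API.

```javascript
// Maps the proposed integer `depth` to what the back end would return:
//   -1 -> paths between all levels of features
//    0 -> synteny blocks of top-level features
//    n -> paths between n-th level features (1 = top level)
function interpretDepth(depth) {
  if (!Number.isInteger(depth) || depth < -1) {
    throw new RangeError("depth must be an integer >= -1");
  }
  if (depth === -1) {
    return { kind: "paths", featureLevel: "all" };
  }
  if (depth === 0) {
    return { kind: "syntenyBlocks", featureLevel: 1 };
  }
  return { kind: "paths", featureLevel: depth };
}
```

The alternative raised in the same comment, a separate parameter for synteny blocks, would split `kind` and `featureLevel` into two independent request arguments instead of packing both into one integer.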

Don-Isdale commented 6 years ago

Either a single request with a level parameter, or 2 requests; if there is a significant difference between the result types then that might be a case for the latter. The result types would all be the same: they are simply paths, which may have an SB (name) attribute. That is why I tend to think that 2 (features) is preferable to 4 (positions).

the synteny block display is currently defined in terms of the corner features; can change to locations or support either. as I understand the corners of the SB are features, which have locations (ranges).

yes, can do multiple levels, not essential in the first version, but good to have later on.

OK to see synteny blocks as level 0; I take your point about separating that param from the feature-level. That would imply a synteny block calculation would look only at the specified levels - may be interesting, but not something to put extra effort into in the short term.

gabrielkg commented 6 years ago

Synteny blocks are a higher-level feature than aliases, and I feel should be treated separately. Synteny blocks are a function of some set of aliases.

The main reason is that synteny blocks are a nice feature but not core, while aliases are core functionality. Alias via reference (I'll create another issue for that) is a more important feature than synteny blocks for now.

We can put synteny blocks on hold because we can still highlight synteny blocks by calculating outside and colouring the paths accordingly.

Gruek commented 6 years ago

I've added a route for finding the paths between two blocks either directly or via alias.

Example usage:
curl -X GET --header 'Accept: application/json' 'http://localhost:5000/api/Blocks/paths?blockA={blockA_Id}&blockB={blockB_Id}&access_token={accessToken}'

returns: [{ "featureA": "{featureA_Id}", "featureB": "{featureB_Id}", "alias": {...} }]
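For calling the route from JavaScript, a small sketch that builds the same request URL and pulls the feature id pairs out of a response. The helper names `pathsUrl` and `featurePairs`, and the sample ids, are assumptions; only the route, its three query parameters and the response fields come from the example above.

```javascript
// Builds the request URL for the paths route, URL-encoding each parameter.
function pathsUrl(baseUrl, blockA, blockB, accessToken) {
  const url = new URL("/api/Blocks/paths", baseUrl);
  url.searchParams.set("blockA", blockA);
  url.searchParams.set("blockB", blockB);
  url.searchParams.set("access_token", accessToken);
  return url.toString();
}

// Extracts [featureA, featureB] id pairs from a parsed response array.
function featurePairs(response) {
  return response.map((path) => [path.featureA, path.featureB]);
}

const requestUrl = pathsUrl("http://localhost:5000", "idA", "idB", "tok");
// requestUrl is
// "http://localhost:5000/api/Blocks/paths?blockA=idA&blockB=idB&access_token=tok"

const pairs = featurePairs([{ featureA: "f1", featureB: "f2", alias: {} }]);
// pairs is [["f1", "f2"]]
```

The resulting URL can be passed to whatever HTTP client the front end already uses, with the `Accept: application/json` header as in the curl example.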