Harmonise approach to dataset specification

natverse / neuprintr

R client utilities for interacting with the neuPrint connectome analysis service

http://natverse.org/neuprintr

3 stars 3 forks source link

Harmonise approach to dataset specification #25

Closed jefferis closed 4 years ago

jefferis commented 4 years ago

@alexanderbates just made the following change

https://github.com/natverse/neuprintr/commit/2b912ceb1608d93772c5414ba061b33d0e2d7edb#r36926516

which adds a dataset parameter to neuprint_fetch_custom. This sounds like it may be a good idea, but I'm slightly concerned that it could cause issues for a large number of functions that now use check_dataset authored by me or @romainFr. In particular there are almost no calls to neuprint_fetch_custom() that currently pass on the dataset parameter.

jefferis commented 4 years ago

Two questions @alexanderbates @romainFr

Is dataset = "hemibrain:1.0" the right default for neuprint.janelia.org and the test server?
Is it enough to add the dataset like so:
```
Payload <- sprintf('{"cypher":"%s","dataset":"%s"}', cypher, dataset)
```
(which would be much simpler) or are there times where we still need the dataset prefix e.g. as used here:

https://github.com/natverse/neuprintr/blob/cffb1d0babdf860bc74d620ce56b618c306da7bb/R/name.R#L35-L49

romainFr commented 4 years ago

hemibrain:v1.0 is the current release on neuprint.janelia.org. It is not the version we're using internally for now. Generally I think specifying it in .Renviron is really the way to go as more versions get released and people want to switch back and forth easily
dropping the dataset prefix would be great. It works on neuprint-explorer, not sure it would work for us but I'll give it a try and let you know.

romainFr commented 4 years ago

Dropping the prefix seems to be working, but somehow it makes some queries slower (especially when probing :Segments and not :Neurons). I just pushed it to the drop_prefix branch in case you think that's worth a PR.

Should we also change the default for all_segments to FALSE in most cases? It slows down queries and in most cases people really only care about full neurons (except when counting total number of synapses)

alexanderbates commented 4 years ago

Hello Romain. Sorry I have not been very communicative before! And sorry if I broke a few things, I'm remembering how this all works and I wanted to have the package working for https://neuprint-test.janelia.org/, so people can start using it!

Yes, I think we should drop most cases of all_segments to FALSE.

Your branch looks good, but when I try to test I just get HTTP 400 errors with https://neuprint-test.janelia.org/. Before, I had used neuprint_find_neurons to query api/npexplorer/findneurons, and neuprint_bodies_in_ROI for cypher queries. Do we just want to have one function there then.

I think we should move to just specifying the dataset for neuprint_fetch_custom (payload sent to api/custom/custom), rather than in the cyphers themselves. Will that work for you?

romainFr commented 4 years ago

Sorry I hadn't merged master at first in the branch, the changes in neuprint_find_neurons were something else I was exploring (in practice we're mostly using find_neurons). If you pull now it should be a more reasonable branch (with just the prefixed queries changed).

With what functions do you get 400 errors?

Finally yes, I'm all for removing the dataset from the cyphers. At least we can start a PR and investigate the query speed.

jefferis commented 4 years ago

@romainFr how confident are you about these speed differences? It's clear that a heavier query could results in more 400 type errors and I suspect that they might be quite common with serious analysis.

romainFr commented 4 years ago

It seems that I cannot reproduce it now -- I probably got mixed up with the updates to master...

jefferis commented 4 years ago

@romainFr OK so it sounds like we can probably push ahead with your drop_prefix branch. Do you have much more to do on that? It should actually simplify a bunch of code.

romainFr commented 4 years ago

I think I removed all prefixes yes, so we're good to go on and merge

jefferis commented 4 years ago

Just a note before we close this issue that although #26 is merged, there are still spurious instances of check_dataset lurking. It should now be possible for pretty much all of these to be eliminated leaving neuprint_fetch_custom to do the work.

jefferis commented 4 years ago

This should be completed by #29