Open nitrosx opened 2 years ago
I don't see a problem with this. The box says "Something went wrong" because the server replied with 404; the expected behaviour for "No results found" would be 200 and an empty array, right?
I can't seem to find anything wrong with the "provider filter" implementation either. You may inspect it yourself here.
I think the issue lies with the endpoint.
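To make the expected behaviour concrete, here is a minimal sketch of how a client could map a response to a user-facing message. The helper name `result_message` and the exact message strings are assumptions for illustration, not the portal's actual code.

```python
from typing import Optional


def result_message(status: int, body: Optional[list]) -> str:
    """Map an HTTP status and decoded JSON body to a UI message (hypothetical helper)."""
    if status != 200 or body is None:
        # Any non-200 reply (e.g. 404) is treated as a genuine failure.
        return "Something went wrong"
    if not body:
        # 200 with an empty array: the search ran but matched nothing.
        return "No results available"
    return f"{len(body)} result(s)"
```

With this mapping, a 404 from the endpoint still surfaces as an error, while an empty 200 response is reported as an empty result set.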
@nitrosx can you confirm @noobadmin's analysis that the problem is with the endpoint, i.e. the federated search? Thanks for taking a look.
@noobadmin
When the user selects the facility, it seems that the portal sends the query directly to the facility, bypassing the federated search. Is that correct?
For the ESRF issue, the URL used to update the portal is the following:
https://icatplus.esrf.fr/api/documents?filter={"query":"diffraction","limit":50}
Unfortunately, the ESRF API is case-sensitive; for the request to work, you should use the following:
https://icatplus.esrf.fr/api/Documents?filter={"query":"diffraction","limit":50}
If you can fix the portal, to take care of this, it should work, although I'm skeptical about the approach to skip the federated search and go directly to the facility backend from the portal (I can be convinced otherwise)
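As a rough illustration of the quick fix, the sketch below builds the facility URL with the capitalised `Documents` path and a URL-encoded `filter` parameter. The function name `build_documents_url` is hypothetical; only the base URL, the path casing and the filter fields come from the URLs quoted above.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_documents_url(base_url: str, query: str, limit: int = 50) -> str:
    """Build a documents query URL for a case-sensitive backend (hypothetical helper)."""
    # "Documents" is capitalised because the ESRF API is case-sensitive.
    filter_json = json.dumps({"query": query, "limit": limit})
    return f"{base_url.rstrip('/')}/Documents?{urlencode({'filter': filter_json})}"


url = build_documents_url("https://icatplus.esrf.fr/api", "diffraction")
parts = urlsplit(url)
print(parts.path)  # /api/Documents
# Decode the filter again to check that it round-trips.
print(json.loads(parse_qs(parts.query)["filter"][0]))
```

Encoding the JSON filter avoids relying on the browser or HTTP client to escape the braces and quotes.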
@andygotz I think we need a brainstorm first to define the scope of this functionality. The way I had envisioned it, the facility selector would act on the result set already returned to the data portal, rather than asking the data portal to contact the specific facility. Hopefully my explanation makes sense.
Max, thanks for the feedback. A quick fix is already a start. Yes, a brainstorm sounds like a good idea. I thought the portal was filtering on the results it had already retrieved, but Jiri's approach, i.e. asking the facility endpoint, seems sound.
Jiri can you try the quick fix? Thanks!
Hi @nitrosx
Apart from bypassing the federated search API, on which I have no strong opinion yet, if we want this to work smoothly I would suggest harmonizing the endpoints. If I am not mistaken, the 404 error sounds right to me because we do not implement such an endpoint.
Currently, the portal sends the requests to the federated search API with lower case, example:
/documents
Later, for some reason I do not know, the federated search API forwards the request to the local implementations with a capital letter:
/Documents
In the documentation there is a mix of upper and lower case that makes it confusing: https://github.com/panosc-eu/search-api/blob/master/doc/api-calls.md#call-5
In general, the recommendation is that URLs should be case-sensitive (https://www.w3.org/TR/WD-html40-970708/htmlweb.html), so I would harmonize this. I would not expect every implementation to be case-insensitive; that is probably unnecessary and error-prone.
Just my opinion!
@antolinos I agree with you we should harmonize the URLs. Do you think we should go all lower case or first letter uppercase?
I think that the federated search API should forward the request to the local implementations without altering it. So it is the responsibility of the portal to use the right case.
So, for the federated search, it means: if /documents in, then /documents out.
Lower or upper case? I usually develop endpoints in lower case, but as long as the same case is used everywhere in the software chain I will be happy.
I prefer lowercase too. Uppercase begs the question then of camel case - should we capitalise words or not? Underscores would be preferred to camel case then. My 2 cents.
@antolinos's answer is nicely echoed in this Stack Overflow entry:
@nitrosx Is there another way? I've actually naively tried to implement it by filtering on the provider key first, but without success. I figured this was the implementation you preferred, since the queries didn't work and I had no other pointers.
A nice feature might be to send an optional list of providers as a parameter. The providers would be taken into account by the federated search. I think it is fairly easy to implement.
Example:
/api/documents?filter=...&provider=ESS,ESRF,ILL
This would fix the issue and would allow us in the future to filter not only by single facility, as today, but also by type of facility, e.g. photons, neutrons, XFEL, etc.
The user interface might offer a selection by neutrons:
/api/documents?filter=...&provider=ESS,ILL
or photons:
/api/documents?filter=...&provider=ESRF,SOLEIL,DLS,HZB,PSI
This might add value to the search and Max will be happier :D
Just an idea...
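A minimal sketch of how the federated search could honour such a `provider` parameter is given below. It is only an illustration of the idea, not the existing federated search code: the `PROVIDERS` registry, the function names and the ILL base URL are placeholders, and the path is forwarded unchanged so that /documents in means /documents out.

```python
from typing import Dict, List, Optional

# Hypothetical registry: short provider name -> base URL of its search API.
# The ILL entry is a placeholder, not a real endpoint.
PROVIDERS: Dict[str, str] = {
    "ESS": "https://search.panosc.ess.eu/api",
    "ESRF": "https://icatplus.esrf.fr/api",
    "ILL": "https://ill.example/api",
}


def select_providers(provider_param: Optional[str], registry: Dict[str, str]) -> List[str]:
    """Return the base URLs to query; all of them when no provider list is given."""
    if not provider_param:
        return list(registry.values())
    wanted = [name.strip().upper() for name in provider_param.split(",") if name.strip()]
    unknown = [name for name in wanted if name not in registry]
    if unknown:
        raise ValueError(f"unknown provider(s): {', '.join(unknown)}")
    return [registry[name] for name in wanted]


def forward_urls(path: str, provider_param: Optional[str],
                 registry: Dict[str, str] = PROVIDERS) -> List[str]:
    """Build one URL per selected provider, forwarding the path unchanged."""
    return [f"{base}{path}" for base in select_providers(provider_param, registry)]
```

For instance, `forward_urls("/documents", "ESS,ILL")` would target only the two neutron facilities, while omitting the parameter keeps today's behaviour of querying every provider.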
I have this implementation ready to go already, but I don't think there's support for this at the federated endpoint. I've tried the following queries:
{
  "where": {
    "provider": "https://search.panosc.ess.eu/api"
  },
  "query": "lung",
  "limit": 50
}
{
  "provider": "https://search.panosc.ess.eu/api",
  "query": "lung",
  "limit": 50
}
@andygotz I've fixed the casing, should work now but there are still issues with some endpoints...
@noobadmin Which ones?
And thanks for the work!
PSI and ESRF fail to respond to queries with (nested) filters. I understand they aren't implemented at those sites...
@noobadmin Could you please provide a couple of examples? Thanks
For the nested filters: has any facility with real data been able to make them work (by returning some useful results)?
I am asking because mapping the techniques (with the given ontology) and fields like formula or temperature (whatever temperature means) is far from straightforward.
If nobody is returning anything coherent, I would suggest removing the nested filters.
@noobadmin thanks a lot for fixing the case, filtering on ESRF now works nicely!
@nitrosx there are still a number of facilities with errors and/or returning inconsistent results e.g. searching on 'covid-19' still has MAX-IV, ESS, XFEL returning results which have no relation with covid-19?
@andygotz long and complicated story short: in the eyes of the scoring service, searching for covid-19 is almost the same as searching for covid nineteen. That's why you see results which are apparently not relevant to covid-19. They are relevant to the word nineteen, which should be in the scoring info. This is one of the biases introduced by the technique that I used for the scoring. I've been thinking about how to correct the bias, but I have not found a robust solution yet. Interestingly enough, if you search just for covid, you will exclude all the unwanted results.
There might be enough material for a paper there
@nitrosx I see. Do you mean the number '19' is searched for or the text 'nineteen' - I presume the number because I cannot see any text nineteen in the entries. A simple solution is to not allow searching for numbers but only for literal text i.e. covid-19 is treated as a single text. If this is not possible (why?) then remove the numbers from the text. Not being able to search for numbers e.g. "1", seems better to me rather than have results which are meaningless. Users could always enter numbers in the parameters if they need to.
In the PaNOSC scoring, all numbers are translated to their word representation so that they can be searched for. It is possible to have a sequence of words treated as a single word and passed unchanged through the ingestion that the scoring performs, but it needs to be set up correctly. It is also possible to remove numbers from the information ingested by the scoring, but that would require a change at all the facilities that have adopted the PaNOSC scoring.
I still don't understand where 'nineteen' is in the text of the results. If the search text changes numbers to words, what about the text that is being searched? Would it match the nineteen in 12345678919? If we drop treating numbers separately, could we still keep the option of searching for text with numbers in it, e.g. ID19 should still be searchable?
I understand this is a change for all sites deploying the scoring. What is the work for each site? Is it limited to updating the docker image (are all sites using this)? Or does it involve more work?
How much work is it for you to update the scoring algorithm?
Most likely the number is contained in the portion of information that is not shared with PaNOSC. The query goes through the same process (at least for now), so the numbers in the query are converted into their word counterparts and then the relevancy score is computed. In addition, all punctuation (including - and _) is converted to spaces. That is, at least, what was suggested in the few tutorials that I used to build the scoring. If a word contains numbers, that word is not changed and the numbers are kept as they are. For example:
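The following is only an illustrative sketch of the preprocessing described above, not the actual PaNOSC scoring code. The function names are hypothetical, and for brevity the number-to-word conversion only covers 0 to 99; larger standalone numbers are left unchanged here, which may differ from the real service.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def normalize(text: str) -> str:
    """Apply the described steps: punctuation to spaces, standalone numbers to words."""
    # Replace punctuation, including "-" and "_", with spaces.
    spaced = re.sub(r"[^\w\s]|_", " ", text)
    out = []
    for token in spaced.split():
        if re.fullmatch(r"[0-9]+", token) and int(token) < 100:
            # A token made only of digits is replaced by its word form.
            out.append(number_to_words(int(token)))
        else:
            # Words that merely contain digits (e.g. ID19) stay as they are.
            out.append(token)
    return " ".join(out)


print(normalize("covid-19"))  # covid nineteen
print(normalize("ID19"))      # ID19
```

Under these assumptions, a query for covid-19 and a query for covid nineteen normalise to the same text, which is the bias discussed above.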
My best guess is that it will take me 2/3 days to update the scoring, test it and create the new docker image. I'm not sure how long it will take to discuss and agree on the needed changes.
Regarding the facilities, once the docker image has been updated, they would need to deploy the new image, update the scoring information and trigger the weight computation.
@nitrosx Any updates on this? I'd be happy to implement the provider filter through the federated endpoint if you show me the required query.
As a user, when I refine a search by selecting "European Synchrotron Radiation Facility" in the facility dropdown, I see a "Something went wrong" message in the results panel. See screenshot below.
I would expect to see either the facility results or, if there are none, a "No results available" message.
The browser console logs the following error: