panosc-eu / search-api

PaN search api for WP3 and WP4
BSD 2-Clause "Simplified" License

Change relation from dataset to instrument to many-to-many. #32

Closed RKrahl closed 4 years ago

RKrahl commented 4 years ago

As pointed out in last week's meeting, each measurement at BESSY II involves at least two instruments, a beamline and an experimental station. We are not able to adequately represent our data if a dataset is restricted to have only one instrument.

garethcmurphy commented 4 years ago

I have checked the minutes and see "Decision: One Instrument per dataset".

RKrahl commented 4 years ago

Yes, I know. But this does not change the fact that restricting the number of instruments per dataset to one will simply not work for HZB.

I am somewhat alienated by the bold refusal to accommodate the requirements of individual facilities, in particular because I have not heard a single argument so far as to why this would be a problem. I have to remind you of your comment in #20:

Dataset to Instrument - this could be made one-to-many

garethcmurphy commented 4 years ago

Hi Rolf,

Thanks for your comment. I appreciate your help and regret your alienation. My comment in #20 has been rendered moot by the joint meeting. I am a bit unclear on the use case: does the user need to distinguish between a dataset recorded at beamline X and experimental station Y and one recorded at beamline X and experimental station Z? Could this be better addressed as a Dataset.parameter?

RKrahl commented 4 years ago

Let me illustrate the situation with practical examples: we have the ALICE experimental station that may be used at six different beamlines, UE112_PGM-1, UE56-2_PGM-2, UE56-1_PGM, UE52_SGM, U49-2_PGM-1, and PM3. Then we have the LiXEdrom station that may be attached to the UE56-2_PGM-2, UE52_SGM, and U49-2_PGM-1 beamlines. Furthermore, there is the RGBL-PEEM station that can be used at UE112_PGM-1, UE56-2_PGM-2, UE52_SGM, U49-2_PGM-1, and PM3. Just to name three examples. That means we will have datasets created with every one of these station and beamline combinations.

For these three stations alone, we have 14 different combinations.
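
The count of 14 can be checked with a short sketch. The station and beamline names are taken from the comment above; the variable names are illustrative only and are not part of the search API.

```python
# Station -> beamlines it can be attached to, as listed in the comment above.
# STATION_BEAMLINES is a hypothetical name used only for this illustration.
STATION_BEAMLINES = {
    "ALICE": ["UE112_PGM-1", "UE56-2_PGM-2", "UE56-1_PGM",
              "UE52_SGM", "U49-2_PGM-1", "PM3"],
    "LiXEdrom": ["UE56-2_PGM-2", "UE52_SGM", "U49-2_PGM-1"],
    "RGBL-PEEM": ["UE112_PGM-1", "UE56-2_PGM-2", "UE52_SGM",
                  "U49-2_PGM-1", "PM3"],
}

# Every (station, beamline) pair that can produce a dataset.
combinations = [
    (station, beamline)
    for station, beamlines in STATION_BEAMLINES.items()
    for beamline in beamlines
]

for station, beamline in combinations:
    print(f"{station} at {beamline}")

print(len(combinations))  # 6 + 3 + 5 = 14
```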

It is very common for users searching for a particular dataset to start from the instrument that created it. So some users searching for a dataset created with ALICE at PM3 will search for datasets linked to the ALICE station. Other users searching for the very same dataset will search for datasets linked to the PM3 beamline. Both have the perfectly legitimate expectation of finding their dataset this way. But if I can link the dataset to only one instrument, half of these users will get zero results on a perfectly valid query, although the desired dataset exists and does match the search criteria. This is BAD.
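
A minimal sketch of the effect, assuming a simplified in-memory model. The `Dataset` class, the `search_by_instrument` helper, and the dataset title are hypothetical and do not reflect the actual search API data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    # Hypothetical, simplified dataset record for illustration only.
    title: str
    instruments: List[str] = field(default_factory=list)


def search_by_instrument(datasets: List[Dataset], name: str) -> List[Dataset]:
    """Return the datasets linked to the instrument with the given name."""
    return [d for d in datasets if name in d.instruments]


# One instrument per dataset: the facility has to pick either station or beamline.
single_link = [Dataset("alice-at-pm3", ["ALICE"])]

# Many-to-many: the same dataset is linked to both station and beamline.
multi_link = [Dataset("alice-at-pm3", ["ALICE", "PM3"])]

# A user searching by beamline finds nothing in the single-link model ...
pm3_single = search_by_instrument(single_link, "PM3")
# ... but finds the dataset once it is linked to both instruments.
pm3_multi = search_by_instrument(multi_link, "PM3")
alice_multi = search_by_instrument(multi_link, "ALICE")

print(len(pm3_single), len(pm3_multi), len(alice_multi))
```

With a single link, whichever instrument is not chosen becomes invisible to instrument-based search; with both links, either query returns the dataset.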

And no, a dataset parameter will not help at all in this case. No user will ever get the weird idea that they should search for the instrument name in some parameter value, and not by instrument, if they want to search for a dataset by instrument.

zjttoefs commented 4 years ago

I understand your use case, Rolf. But opening this up to a many-to-many relationship will lead to different interpretations of what granularity constitutes an instrument. When we had the discussion in Lund, people immediately suggested recording all detectors and sample environment equipment separately. If we have no consistent way of presenting the search results, that will hurt the user experience. No one is stopping people from going to the origin of the data for more detail than the common API carries, though.