ispyb / ispyb-database-modeling

4 stars 3 forks source link

New "Notifications" table #3

Closed olofsvensson closed 5 years ago

olofsvensson commented 7 years ago

Today we use at the ESRF the DataCollection.comments and DataCollectionGroup.comments fields for adding notifications from the automatic pipelines (workflows and auto-processing). There are many disadvantages with this approach:

I therefore suggest that we add a new table called "Notifications" with the following columns:

The idea is that the corresponding web service will take the following input:

The web service will then be responsible to fill all other "Id" columns, for example in the case of autoProcIntegrationId the sessionId, sampleId, dataCollectionGroupId and dataCollectionId will be automatically filled.

This new table will make it very easy to get a list of all notifications at different levels: session, sample, data collection group and individual tasks like auto-processing, and also for different levels of severeness.

KarlLevik commented 7 years ago

There are a lot of foreign keys (implicitly) proposed here. My first thought is that it might not be the optimal schema design.

Also, we have a DataCollectionComment table which could be extended to capture this information, although it was intended for user comments.

antolinos commented 7 years ago

Hi @KarlLevik ,

The idea of this table is to centralize all notifications. For example, it allows to check all the errors produced by data analysis software on a session, sample or data collection. It is similar to what Stu proposes here but by using the same table for all notifications. https://github.com/ispyb/ispyb-database-modeling/issues/10

Let's say it is a shortcut to get an overview about what is happening, the idea is not to retrieve information quickly but exhaustively.

stufisher commented 7 years ago

Would be interested to know exactly what type of notifications you add at the moment? Do you have some examples?

I agree with Karl, there are many foreign keys here that are implicit of other foreign keys. sessionid, sampleid, datacollectiongroupid are all implicit of datacollectionid, energyscanId, ... and should therefore be removed. You do not need workflowid and wokflowstepid, the step must implicate a workflow.

Robotaction already reports errors (that is one of its primary uses), maybe doesnt belong in here AutoProcStatus already tells us about integration steps and their outcomes (if we populated them, DLS do not atm, maybe ESRF do?) MXMRRun has a status column and an associated comment (dimple). PhasingProgramRun has status and comments columns ScreeningOutput has status flags and associated comments.

We should avoid duplicating existing reporting. These are just statuses and messages, maybe you are making more detailed reporting than that of course (hence asking for examples).

We have lots of error reporting distributed about and writing the right query could bring these together.

antolinos commented 7 years ago

Hi @stufisher,

This scenario you described and your issue https://github.com/ ispyb/ispyb-database-modeling/issues/10 might explain why we would like to propose a table like that.

Just to clarify, a notification does not imply a change in the status and we might have many notification for each "entity". If beam is lost during a data collection we might want to store a notification linked to a sessionId and dataCollectionId. If a pipeline has solved an structure we might want to store a notification linked to a phaginProgramRunId, dataCollectionId and sessionId. @olofsvensson can tell us more use cases.

One of the advantages is, as you said for AutoProcProgramAnalys, is to be able to populate many rows for each analysis result from any analysis program (but not only from them). I can imagine this to happen also to PhasingProgramRun, ScreeningOutput, workflows, data collection, etc... Today, tables like AutoProcStatus, PhasingProgramRun, ScreeningOutput, (or session, data collection, workflow, etc...), etc... don't not allow to do that as there is only room for 1 status and 1 comments.

If we follow what you proposed for AutoProcProgramAnalysis, should we also create a similar table for: data collection, session, workflow, workflowstep, etc.. ? It'd mean a new table for each entity that can potentially send more than 1 notification.

Second, as you said, there are many of errors reporting distributed for many tables making difficult (use of complex queries) to have a nice picture about what happen in a data collection, session or proposal. However, having all this centralized in the same table could help to make things easier.

Once said all this, I also think that having many foreign keys is not be a good idea in general however the potential benefit of this may be worth giving a try....

2017-07-02 14:48 GMT+02:00 Stu Fisher notifications@github.com:

Would be interested to know exactly what type of notifications you add at the moment? Do you have some examples?

I agree with Karl, there are many foreign keys here that are implicit of other foreign keys. sessionid, sampleid, datacollectiongroupid are all implicit of datacollectionid, energyscanId, ... and should therefore be removed. You do not need workflowid and wokflowstepid, the step must implicate a workflow.

Robotaction already reports errors (that is one of its primary uses), maybe doesnt belong in here AutoProcStatus already tells us about integration steps and their outcomes (if we populated them, DLS do not atm, maybe ESRF do?) MXMRRun has a status column and an associated comment (dimple). PhasingProgramRun has status and comments columns ScreeningOutput has status flags and associated comments.

We should avoid duplicating existing reporting. These are just statuses and messages, maybe you are making more detailed reporting than that of course (hence asking for examples).

We have lots of error reporting distributed about and writing the right query could bring these together.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/ispyb/ispyb-database-modeling/issues/3#issuecomment-312489927, or mute the thread https://github.com/notifications/unsubscribe-auth/ACxEsTVpBOay6f4JiT2VJoi9gDu6juzXks5sJ5GogaJpZM4MZNRd .

stufisher commented 7 years ago

A central table sounds sensible, though it is still a complex query to retrieve what this table has to say.

You should probably remove sessionid, blsampleid, datacollectiongroupid, and workflowid. These are implicit and by having them in this table you are duplicating data (there should be only one instance of this information in the db). Instead you should join to BlSession, BLSample, ... . Im not actually sure dataCollectionId should be in here. What are you going to report regards the datacollection that is not mopped up by the autoproc tables? I think maybe we need to have a chat about this?

Ahh and the obvious missing column in my opinion is a timestamp, id like to know when these happened.

So to summarise, if we are to merge into #10 and be complete:

notificationId int pk (auto-incremented)
level enum ('info', 'warning', 'error')
origin varchar(100), (for example "Dimple") <-- not sure this is needed, it should be implied by the foreign key used)
summary varchar(255)
message text
bltimestamp timestamp
energyScanId fk
xfeFluorescenceSpectrumId fk
autoProcIntegrationId or autoProcProgamId fk (not sure which is most sensible)
phasingStepId fk
screeningId fk
mxmrrunid fk

robotActionId fk <-- stil not sure about this
dataCollectionId fk (maybe remove)
workflowStepId fk (is this not covered by the auto proc tables too? I can only see a workflow link from datacollectiongroup)

I'm not sure recording things like beamdumps in here is sensible, because it means for every on going session / sample you will duplicate the same information (i.e. you have 5 beamlines in operation, you have 5 entries in this table with the same information, 5 beamlines may be 25 in due course).

KarlLevik commented 7 years ago

Let's say it is a shortcut to get an overview about what is happening, the idea is not to retrieve information quickly but exhaustively.

Maybe I still don't get it, but this reminds me of something we've talked about previously for Synchweb: The idea of a "wall" or "activity stream" with everything of relevance to the user, a bit like Facebook and other social media platforms. This allows users to see what is happening with their data collections and processing in "real-time". We wanted to have a special table for this, an "events" table, which sounds a lot like your notifications table. The application would then poll this table at short intervals.

We could never quite agree on this table at Diamond, so at the moment we solve this in Synchweb through complex queries and joins. The data collection list on a session is our "activity stream" page.

The idea of duplicating information for performance reasons was just too alien to some in my group, though personally I don't see any problem with it as long as we're clear on what is the authoritative data and what is just duplicated for performance reasons.

However, consider this:

For these reasons I've been thinking more recently that maybe a better solution could be found outside of the relational database realm. I haven't put a lot of thought into this yet, but perhaps something like a messaging system could be a better solution? Maybe WAMP and websockets?

antolinos commented 7 years ago

Hi @KarlLevik

Back to your considerations:

The notifications / events data can potentially grow quite large (e.g. if every new row in ImageQualityIndicators is considered an event ...).

It really depends on the facility and how you fill that table. We don't expect this to be very large. We don't want to store every event but events that we want to notify to the user

High write frequency on a table could lead to contention due to locking, especially if there are a lot of foreign keys with implicit indexes that will be updated with every insert.

As we are using this inside of a more complex transaction that will most likely block many tables. I don't see this as a showstopper. But if we have got problems in the future it is true that should be reconsidered.

The notifications / events data really isn't something we need to store forever, i.e. it's temporary / ephemeral data.

Again, it is like point one in my opinion, we are discussing here the data model, a facility might want to keep the data forever or to remove every few days.

I am not against about the use of a messaging system or WAMP however I think we can start with something simpler for the moment.

So, what do you think?

stufisher commented 7 years ago

I have been thinking about this, and i think we are looking at this totally the wrong way round. The output we are talking about here always comes from a process. Therefore we should be using a table linked to AutoProcProgram. I had not dug in depth into the Phasing tables, but PhasingProgram is a 1:1 copy of AutoProcProgram, why doesnt phasing using this instead? We should add a foreign key references to all tables that relate to a process back to AutoProcProgram, and then create an AutoProcProgramMessage table something like:

AutoProcProgramMessage

autoprocprogrammessageid fk pk ai autoprocprogramid int fk timestamp timestamp severity int (0-2) message varchar(200) description text

antolinos commented 7 years ago

Original idea was that it does not necessarily comes from a process. It might come from the beamline control module, user portal, users, local contact, etc...

stufisher commented 7 years ago

Ahh that is beyond @olofsvensson's original scope.

What sort of things do you want to record, and against what? That last list looks like things that would be reported against a session?

There is datacollectioncomments, where local contacts / users can write extended comments against a datacollection.

I'm not sure what you'd expect from your user portal? I think UAS is unlikely to put any notifications into ispyb.

Im not trying to be obstructive, just trying to understand exactly what you are trying to do. Still to discuss im afraid. I think the answer to @olofsvensson's original question is as per above, as for other notifications i think i need some examples to understand?

delageniere commented 5 years ago

As the notifications are stored now in another log book for the beamlines, this issue can be closed.