ispyb / ispyb-database-modeling

4 stars 3 forks source link

Schema fixing: We need a BLSession.rootPath column #25

Open KarlLevik opened 6 years ago

KarlLevik commented 6 years ago

I've recently been looking into "archiving" sessions: After moving the sessions' directory to a different location (different mount point), all the file paths stored in the ISPyB database need to be updated.

E.g. we would have a session in a directory /dls/i04/data/2018/mx12345-123/ and we move it to /arc/i04/data/2018/mx12345-123/.

I've written a stored procedure that executes these updates in the database, and it seems to do the job correctly for the two sessions we've tested so far. However, whenever we add more path columns to the database schema, I would have to update the stored procedure.

To me it would seem sensible if we had a root path for each BLSession, and all other paths were relative to this. (This should be backwards compatible since old BLSessions don't have a root path to which the other paths should be relative ... Does that make sense?)

At the moment we're duplicating a lot of data by storing the full path for everything inside a BLSession. It's wasteful and also isn't adhering to normalisation principles.

antolinos commented 6 years ago

Hi @KarlLevik

Nothing against but as we are not working in the same way his use will be different. Some comments on this:

I've written a stored procedure that executes these updates in the database, and it seems to do the job correctly for the two sessions we've tested so far. However, whenever we add more path columns to the database schema, I would have to update the stored procedure. I think this is quiet specific to DLS. I don't know what you mean with "archiving" sessions, do you mean data that is moved to tape? In any case, we store in ISPyB the definitive path as files are copied into our "archiving" file system in real time. So no need to run any store procedure later on.

To me it would seem sensible if we had a root path for each BLSession, and all other paths were relative to this. (This should be backwards compatible since old BLSessions don't have a root path to which the other paths should be relative ... Does that make sense?)

Our main problem about this if that we can not warranty today that all attachments file paths are relative to the same root (as far as root is something more specific than "/")

By other hand: will it not impact the performance when accessing to files as you will always need to query session and target table (or a bigger join) to resolve each file path? Or you could argue that session's root will be in memory but then it will complicated things?

KarlLevik commented 6 years ago

Our main problem about this if that we can not warranty today that all attachments file paths are relative to the same root (as far as root is something more specific than "/")

Yes, I can see this. We're talking about having sessions with multiple beamlines here at DLS, which could mean a different root directory for each. Although, we're now moving towards a new directory structure which would avoid that problem - /dls/mx/data/$proposal/$visit .

By other hand: will it not impact the performance when accessing to files as you will always need to query session and target table (or a bigger join) to resolve each file path? Or you could argue that session's root will be in memory but then it will complicated things?

I'm thinking that to retrieve data from the DC table we need to do a table join with the session table anyway, so in that sense it shouldn't hurt performance.

delageniere commented 6 years ago

If you really need it it's OK for us even if we don't use it.