lhcb / first-analysis-steps

LHCb data analysis lessons
https://lhcb.github.io/first-analysis-steps/

Why edit .gangarc for MassStorageFile? #235

Open rob-c opened 7 years ago

rob-c commented 7 years ago

While reading through the lessons here for something unrelated, I spotted some instructions on editing the MassStorageFile settings within .gangarc (https://lhcb.github.io/first-analysis-steps/eos-storage.html).

Is the default-constructed .gangarc not working, or broken in some way for users, or should something be turned on or off here?

I'm just wondering whether Ganga needs to change some out-of-the-box behaviour, because altering default settings is generally discouraged if they work correctly. If there is a problem out of the box, I think it should be trivial for Ganga to fix.

alexpearce commented 7 years ago

Thanks for checking! It's amazing how quickly things become redundant, or just plain wrong.

I suspect this configuration block is the former, in that Ganga already does these things by default. Plus, we already have a deprecation notice saying that MassStorageFile isn't really recommended. Maybe we can just remove the references to MassStorageFile altogether?

rob-c commented 7 years ago

@alexpearce I would like to see this remain, as it's a useful resource, especially since it lets jobs running locally or on the batch system access 'gridified' data through EOS.

FYI: There is also SharedFile, which may be useful to Ganga users who run Ganga at their local institute and want to move files to some POSIX-accessible fixed storage, e.g. for private MC production and such.

I suppose the question is how expert using a MassStorageFile is regarded to be. I would imagine EOS is an essential tool to get to know and use for day-to-day things (I'd encourage the use of this over ~/public any day of the week).

alexpearce commented 7 years ago

If I remember right, the history is that we thought storing everything on EOS was the future, and that's what MassStorageFile does, so let's use that. But then we saw on a couple of email threads that this is very inefficient when running on the Grid, because the file is proxied via the machine Ganga is running on. This is why I'd consider the advice to use MassStorageFile at all 'deprecated'.

But, I'm not familiar with the other reasons for using it.

rob-c commented 7 years ago

@alexpearce The logic was that EOS is not remotely writeable, so a job's data has to come back to CERN before it can be migrated, hence the bottlenecks. (I don't know what's technically required to write to EOS from off site, or whether it's possible.)

It's up to you, but you may want to drop MassStorageFile from the EOS introduction. Certainly the stuff about the .gangarc looks like it could just go; I'll make a small PR if that would be helpful.

It might be worth having an advanced data management bit somewhere to describe how best to manage data within Ganga... but a tutorial on this has been needed for a while now.