chiahaoliu opened this issue
maybe related to #108
attn: @dooryhee @sghose
One potential solution is to not archive the tiff_base data. This would reduce the storage footprint and increase the transfer speed. We should be able to reproduce the data by re-running the analysis pipeline.
I see two solutions.

1) I like CJ's suggestion. Since we have a save_last_tiff pipeline now, it would be relatively easy to rerun the user experiment and save out all the tiffs when asked. It would be much harder to keep track of the tiffs that the users actually did save during the experiment and save out only those on a replay request. We could also build a little front-end that allows some filtering before running the, let's call it, replay_last_tiff, so that not all tiffs are saved. For pipelines to work again post facto we may have to save all the darks as well. This could be a large file. In this scenario, users need to be very clear that their local tiffs will not be archived, so they need to save them during the experiment if they want to walk away with them.
2) We have two places where we site .../xpd_users. The UC looks like:

- start_beamtime creates a clean .../xpd_users in location 1, so .../1/xpd_users
- _end_beamtime archives data as normal
- start_beamtime either detects that the previous experiment was done in .../1/xpd_users, or it detects that .../1/xpd_users is not empty and looks to see if .../2/xpd_users is empty; it then builds a clean env in .../2/xpd_users, even as end_beamtime is still archiving user 1's data
- end_beamtime obliterates .../1/xpd_users as usual, leaving it ready for the next start

There may also be a hybrid solution. I like (2) because it is not UI-breaking: from the user/BLS perspective the workflow is the same. I like (1) because it takes us in the direction we want to go, in that we actually start to use the databases as they should be used.
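The two-slot scheme in option (2) could be sketched roughly as below. This is a hypothetical illustration, not actual xpdAcq code; the root layout and the helper name are assumptions:

```python
import os

def pick_clean_slot(root):
    """Return the path of the first free numbered slot (.../1/xpd_users,
    .../2/xpd_users), creating it if needed. A slot is free when it does
    not exist yet or is empty; a non-empty slot is still being archived.
    Hypothetical sketch of the UC in option (2), not real xpdAcq code."""
    for n in (1, 2):
        slot = os.path.join(root, str(n), "xpd_users")
        if not os.path.isdir(slot):
            os.makedirs(slot)
            return slot
        if not os.listdir(slot):  # slot exists but is empty -> reuse it
            return slot
    raise RuntimeError("both slots are busy; wait for end_beamtime to finish")
```

With this shape, start_beamtime can hand out slot 2 while end_beamtime is still archiving slot 1, which is the whole point of the scheme.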
My suggestion (discussion please?): 1) we try to implement (2) asap and roll it out (I am sure there are unforeseen problems); 2) we put (1) on a future release milestone, but initially have it as an option, which users may like, of a tiff-free archive or a tiff-selected archive that will fit on their external hard drive.
xpdAcq and xpdAn look at directories which are configured by a yaml file at each beamline, so it may be better to keep the xpdUser directory at the same location so that we don't need to poke the configuration too often. I would therefore like to propose a variant of UC (2) as follows:

- User runs _end_beamtime
- Program archives the contents of the current xpdUser directory
- Program creates a tarball inside the current xpdUser directory
- Program renames the current xpdUser directory
- Program moves the current xpdUser directory under xpdConfig/
- Program creates a fresh xpdUser directory
- Program transfers the archive to the remote location; BLs can start a new beamtime anytime while transferring
- Program removes the backup directory inside xpdConfig once the archive has been transferred to the remote location

Does it make sense?
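The proposed flow could be sketched roughly as below. The paths, the timestamped names, and the use of a background thread for the transfer are all assumptions for illustration, not the real xpdAcq implementation:

```python
import os
import shutil
import tarfile
import threading
from datetime import datetime

def end_beamtime(base, remote):
    """Sketch of the proposed variant: archive xpdUser locally, park the
    old directory under xpdConfig as a backup, recreate a fresh xpdUser,
    and transfer the tarball in the background so a new beamtime can
    start immediately. Not the real xpdAcq code."""
    user_dir = os.path.join(base, "xpdUser")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")

    # 1. tarball of the current xpdUser contents (kept outside xpdUser
    #    here so the archive does not try to include itself)
    tar_path = os.path.join(base, "archive-%s.tar.gz" % stamp)
    with tarfile.open(tar_path, "w:gz") as tf:
        tf.add(user_dir, arcname="xpdUser")

    # 2. park the old directory under xpdConfig as a local backup
    backup = os.path.join(base, "xpdConfig", "xpdUser-%s" % stamp)
    os.makedirs(os.path.dirname(backup), exist_ok=True)
    shutil.move(user_dir, backup)

    # 3. fresh, empty xpdUser so start_beamtime can run right away
    os.makedirs(user_dir)

    # 4. transfer in the background; remove the backup once it lands
    def transfer():
        shutil.copy(tar_path, remote)
        shutil.rmtree(backup)

    t = threading.Thread(target=transfer)
    t.start()
    return t
```

The key property is that steps 1–3 are fast local filesystem operations, so the beamline is never blocked on the slow network copy in step 4.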
@xpdAcq/bls thoughts?
Correct me if I understand wrong: the archiving happens in the xpdUser directory? After archiving, the program renames xpdUser and moves it under xpdConfig, then transfers the archived data to the remote location, and while it is transferring we can run start_beamtime.
What if the procedures are:

Do you think this way would be faster, or would it not make a big difference?
Is this the final plan (sounds ok to me)? Is it part of the post-school new release?
Not to my knowledge but I'll check.
@adelezh yes, your understanding is correct and thanks for the suggestion, I really like it. Even though the archiving process is reasonably fast, we should avoid downtime as much as we can. Based on the discussion here, _end_beamtime refactoring will be implemented with the logic described by @adelezh.
Note on the current implementation of end_beamtime:
The new functionality is great, essentially instant. Two big flaws:
1.) Please, please, PLEASE take out the question at the end that offers to delete the archive it just made. Users (and I) have lost notes and analysis stored in that folder because someone hits 'y' at the end of the archive procedure without realizing this deletes the archive. Unless it is also kept somewhere else that isn't deleted? It would be great to find out that it is, but I don't think it is.
2.) It appears that bsui needs to be restarted after running end_beamtime, or it does not correctly 'forget' the previous sample names associated with certain sample numbers. This is to say if you had samples in bt.list such as:
and then run _end_beamtime, start a new beamtime, and import a new sample list (without exiting bsui), you can type bt.list and see:
and yet, running xrun(3,0) would produce a file associated with sample_A.
Restarting bsui seems to remedy the issue, but I figured this counts as a bug.
I think that makes sense.
@DanOlds is the terminal giving any errors during the end beamtime process?
@DanOlds can you please open a separate issue for 2?
@DanOlds thanks for reporting this issue. we are working on the first issue.
Regarding the second issue, I am wondering if you checked whether the metadata actually goes into the db?
Does xrun(3, 0) give a header with sample_name = sample_C?
One possible scenario I can think of is that xrun is still linked with the bt from the previous beamtime, and therefore with the old sample indices. We would need to do xrun.beamtime = bt again to update the reference (bt is not a singleton).
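The stale-reference failure mode can be shown with a minimal sketch. The classes below are made up for illustration; they are not the real xpdAcq objects, but they reproduce the same symptom:

```python
class Beamtime:
    """Stand-in for bt: maps sample indices to sample names."""
    def __init__(self, samples):
        self.samples = samples

class XRun:
    """Stand-in for xrun: holds a plain reference to a Beamtime.
    It does NOT follow reassignments of the module-level 'bt' name."""
    def __init__(self, beamtime):
        self.beamtime = beamtime

    def sample_name(self, idx):
        return self.beamtime.samples[idx]

bt = Beamtime({3: "sample_A"})
xrun = XRun(bt)

bt = Beamtime({3: "sample_C"})   # new beamtime, new sample list
stale = xrun.sample_name(3)      # still "sample_A": xrun kept the old bt
xrun.beamtime = bt               # the fix discussed above
fresh = xrun.sample_name(3)      # now "sample_C"
```

Rebinding the name bt does not touch the reference stored inside xrun, which is why the explicit xrun.beamtime = bt (or having _end_beamtime drop the link) is needed.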
@chiahaoliu that could very well be the case. I'm happy to test that next time there is a break in the schedule at the beamline, because I can't remember the exact sequence of events that led to this behavior on Monday. If that is the case, it might be advisable for bsui to 'forget' xrun at the end of _end_beamtime, to force the next user to re-associate xrun with the bt.
Yes, we need to do xrun.beamtime = bt after we start a new beamtime; otherwise xrun is still linked to the old bt. It happened before at the D hutch.
Hui
@DanOlds Just examined the code again: I found that answering y at the end of the archiving process should lead to no action in the code (ref). Could you elaborate a bit more on this issue? Does the remote archive go to an unexpected place, different from the path shown by the _end_beamtime function?
@chiahaoliu the current behavior, to my understanding, is that the user_data directory is moved/renamed to one called 'user_data_PIname_SAF_datetime'. A new, empty 'user_data' directory is then created. The 'user_data_PIname_SAF_datetime' directory is then effectively an archive of the user's entire working directory. There is no reason to delete it automatically via the y/n option at the end of _end_beamtime. It seems reasonable to me that we could simply delete that archive manually at a later time (thus allowing time for the contained notes/analysis to be retrieved or transferred).
If you look in the /nsls2/xf28id1/xpdacq_data folder currently, you'll see a number of these directories, as well as the active 'user_data' directory.
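For reference, the 'user_data_PIname_SAF_datetime' naming convention described above might be built roughly like this. The field order and timestamp format are assumptions, and the PI name and SAF number below are placeholders:

```python
from datetime import datetime

def archive_dir_name(pi_name, saf_num, when=None):
    """Build a 'user_data_PIname_SAF_datetime' style directory name.
    Sketch only: the exact format used by _end_beamtime may differ."""
    when = when or datetime.now()
    return "user_data_%s_%s_%s" % (pi_name, saf_num,
                                   when.strftime("%Y%m%d-%H%M"))
```

Keeping the timestamp in the name is what lets several of these archives coexist in /nsls2/xf28id1/xpdacq_data alongside the active 'user_data' directory.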
Reported by @jmbai2000 at the beamline; converging the discussion from email threads into an issue.

The _end_beamtime process takes a long time when a user's beamtime produces a large number of tiffs. The origin of the problem is that we need to copy the local archive to the remote drive (GPFS-mounted), and the current bottleneck is the network speed.

Note: tested in early May, it took ~10 mins to finish for a beamtime with ~100 tiffs. If everything scales linearly (best case), a beamtime with thousands of tiffs could take hours.

Expected Behavior

_end_beamtime finishes within a reasonable time regardless of the amount of data collected.

Current Behavior

_end_beamtime may take hours when the user collects sizeable data.

Context

xpdAcq expects an empty xpdUser directory to start a new beamtime, i.e., the archive must be completely moved to the remote location first.
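The "empty xpdUser" precondition from the Context section could be enforced with a check along these lines. This is a sketch under the stated assumption; the real xpdAcq check may look different:

```python
import os

def assert_clean_start(xpd_user_dir):
    """Refuse to start a new beamtime until the previous archive has
    been fully moved off the local disk. Sketch only, not the real
    xpdAcq check."""
    leftovers = os.listdir(xpd_user_dir)
    if leftovers:
        raise RuntimeError(
            "xpdUser is not empty (%d entries); finish archiving first"
            % len(leftovers))
```

A guard like this is what forces the archive transfer to finish before start_beamtime can proceed, which is exactly why the slow network copy blocks the beamline today.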