leginon-org / leginon-redmine-archive


Implement Iterative Stable Alignment and Clustering (ISAC) on a 2-D image stack #2839

Open leginonbot opened 6 months ago

leginonbot commented 6 months ago

Author Name: Neil Voss (@vosslab) Original Redmine Issue: 2839, https://emg.nysbc.org/redmine/issues/2839 Original Date: 2014-07-15 Original Assignee: Dmitry Lyumkis


None

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Neil Voss (@vosslab) Original Date: 2014-07-15T13:18:31Z


adding neil files

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Dmitry Lyumkis (@LyumkisLab) Original Date: 2014-07-15T13:43:02Z


Neil, my initial code is here:

/ami/data16/appion/13feb21b/align/sxisac_test/old/align.py

It has been tested, but not thoroughly. You will need to modify lines 47-51 with the original particle numbers. You will also need to build in the combined class average mapping, because at the moment I am aligning to each generation consecutively.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Neil Voss (@vosslab) Original Date: 2014-07-15T20:38:21Z


http://longboard.scripps.edu/betamyamiweb/processing/alignlist.php?expId=12679 First alignment uploaded

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Neil Voss (@vosslab) Original Date: 2014-07-15T21:55:29Z


Another one http://longboard.scripps.edu/betamyamiweb/processing/alignlist.php?expId=12398

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Dmitry Lyumkis (@LyumkisLab) Original Date: 2014-07-16T16:56:49Z


here's the command that I'm using to test:

rm /ami/data00/appion/12dec05a/align/isactest/* ;

/home/dlyumkis/myami/appion/bin/runSparxISAC.py \
--stack=355 --generations=2 --projectid=354 --num-part=100 \
--remoterundir=/ami/data00/appion/12dec05a/align/isactest \
--rundir=/ami/data00/appion/12dec05a/align/isactest \
--nproc=8 --runname=isactest --localhost=guppy.scripps.edu \
--jobtype=sparxisac

we still need to pack the results, move them over to remote rundir, and qsub the resulting .commands file on the cluster.

Dmitry

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Sargis Dallakyan (@dallakyan) Original Date: 2014-07-16T17:23:03Z


I have a test page for the runSparxISAC.py launcher at http://cronus3/~sargis/myamiweb/processing/runISAC.php?expId=8556:

Just Show Command currently displays this:

/ami/data00/dev/sargis/appion runSparxISAC.py --description="test" --stack=129 --num-part=107 --lowpass=10 --highpass=2000 --bin=2 --nproc=8 --commit --nodes=2 --ppn=4 --walltime=240 --cput=2400 --rundir=/ami/data15/appion/zz07jul25b/align/ISAC41 --runname=ISAC41 --projectid=303 --expid=8556 --jobtype=sparxisac
leginonbot commented 6 months ago

Original Redmine Comment Author Name: Neil Voss (@vosslab) Original Date: 2014-07-16T19:47:33Z


you're in charge

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Dmitry Lyumkis (@LyumkisLab) Original Date: 2014-07-16T21:16:54Z


committing my revisions so far for running. To test:

runJob.py --stack=355 --generations=2 --projectid=354 --num-part=1000 --remoterundir=/ami/data00/appion/12dec05a/align/isactest2 --rundir=/ami/data00/appion/12dec05a/align/isactest2 --nodes=8 --ppn=4 --mem=48 --runname=isactest2 --localhost=guppy.scripps.edu --jobtype=sparxisac --ou=25 --expid=10755 --lp=20 --hp=400 --bin=2 --thld_err=5:10

The above command will create a jobfile in the rundir with all the relevant commands to pre-process the stack and run ISAC. It will then launch the jobfile. This needs to be synced with the webserver launching and with the upload, i.e. the webserver needs to pass the appropriate parameters, and the uploader needs to read in the pickle file that is generated by this command.
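As a rough illustration of the hand-off described above, the run step could dump its parameters to a pickle file that the uploader later reads back. This is only a sketch: the file name `isac-params.pickle`, the helper names, and the dictionary keys are hypothetical and are not taken from the actual run or upload scripts.

```python
import os
import pickle
import tempfile


def write_run_params(rundir, params, filename="isac-params.pickle"):
    """Run step: save the run parameters next to the results (hypothetical file name)."""
    path = os.path.join(rundir, filename)
    with open(path, "wb") as handle:
        pickle.dump(params, handle)
    return path


def read_run_params(rundir, filename="isac-params.pickle"):
    """Upload step: load the parameters that the run step wrote."""
    with open(os.path.join(rundir, filename), "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    rundir = tempfile.mkdtemp()
    # Example values only; the real parameter set lives in the launcher.
    write_run_params(rundir, {"stack": 355, "generations": 2, "bin": 2})
    print(read_run_params(rundir))
```

Keeping both helpers keyed on the same file name means the uploader only needs the run directory to recover everything the launcher was given.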

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Dmitry Lyumkis (@LyumkisLab) Original Date: 2014-07-16T21:32:13Z


post-Appion workshop todo, 2014.07.16:

  1. make sure that the stack is transferred to remoterundir by the webserver when the job is launched
  2. interconnect (1) webpage, (2) python run wrapper, and (3) python uploader
  3. take into account rotation angles in the database from aligned stack and append those to rotation angles from ISAC (useful for RCT)
  4. upload webpage, make sure that the webpage knows when upload is ready (is this built in???)
  5. test with real RCT data and check whether rotation angles are uploaded correctly!
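
For item 3, one way to think about appending the ISAC rotation to the rotation already stored for an aligned stack is to compose the two in-plane transforms. The sketch below assumes each alignment means "rotate about the origin by the angle, then shift" and ignores mirroring; the convention actually used by SPARX and by the database may differ, so the function name and the convention are assumptions to be checked against real RCT data.

```python
import math


def compose_alignments(first, second):
    """Combine two in-plane alignments applied one after the other.

    Each alignment is (angle_degrees, shift_x, shift_y) and is ASSUMED to mean
    "rotate about the origin, then shift". Mirroring is not handled.
    """
    angle1, tx1, ty1 = first
    angle2, tx2, ty2 = second
    theta = math.radians(angle2)
    # The first shift is rotated by the second rotation, then the second shift is added.
    shift_x = math.cos(theta) * tx1 - math.sin(theta) * ty1 + tx2
    shift_y = math.sin(theta) * tx1 + math.cos(theta) * ty1 + ty2
    # Rotations about the same origin simply add; wrap into [0, 360).
    return ((angle1 + angle2) % 360.0, shift_x, shift_y)


if __name__ == "__main__":
    print(compose_alignments((350.0, 0.0, 0.0), (20.0, 0.0, 0.0)))
```

Under this convention the angles add, while the earlier shift has to be rotated by the later rotation before the later shift is added.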
leginonbot commented 6 months ago

Original Redmine Comment Author Name: Dmitry Lyumkis (@LyumkisLab) Original Date: 2014-07-19T19:27:56Z


Hi Neil, Sargis,

I am hoping that in the next few weeks we can finalize this issue and have a working version of ISAC that we can test. Let me know if you want me to put in any specific code, test things out, or help out in any way.

Dmitry

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Neil Voss (@vosslab) Original Date: 2014-07-21T13:55:29Z


Hi Dmitry, Last I checked I was still waiting for the launcher to be finished so I could check my uploader.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Gabriel Lander (@gclander) Original Date: 2014-07-24T14:35:51Z


while we're waiting for the web gui, is there a way to run this on the garibaldi cluster? When I execute:

runJob.py --stack=90 --generations=2 --projectid=329 --num-part=7052 --remoterundir=/gpfs/group/em/appion/glander/14jul23a/align/isac1 --rundir=/gpfs/group/em/appion/glander/14jul23a/align/isac1 --nodes=8 --ppn=8 --mem=376 --runname=isac1 --localhost=garibaldi.scripps.edu --jobtype=sparxisac --ou=30 --bin=2 --expid=13753 --lp=20 --hp=300 --thld_err=10:20

I get the error:

...
Looking up session, 13753
Traceback (most recent call last):
  File "/gpfs/home/glander/myami/appion/bin/runJob.py", line 16, in
    agent.Main(sys.argv[1:])
  File "/gpfs/home/glander/myami/appion/appionlib/apAgent.py", line 78, in Main
    self.updateJobStatus(self.currentJob, hostJobId)
  File "/gpfs/home/glander/myami/appion/appionlib/apAgent.py", line 134, in updateJobStatus
    projDB = self.initDB(jobObject, hostJobId)
  File "/gpfs/home/glander/myami/appion/appionlib/apAgent.py", line 246, in initDB
    clustq['user'] = os.getlogin()
OSError: [Errno 2] No such file or directory
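
The OSError at the end of this traceback is the kind of failure os.getlogin() gives when the process has no controlling terminal, for example inside a batch job. A possible workaround, offered only as a sketch and not as what apAgent.py does or should do, is to fall back to getpass.getuser(), which consults the environment and the password database instead. The helper name `current_username` and its injectable arguments are hypothetical.

```python
import getpass
import os


def current_username(login=os.getlogin, fallback=getpass.getuser):
    """Return the user name, tolerating the lack of a controlling terminal.

    `login` and `fallback` are injectable so the behaviour can be tested
    without depending on how the process was started.
    """
    try:
        return login()
    except OSError:
        # os.getlogin() needs a controlling terminal; batch jobs often lack one.
        return fallback()


if __name__ == "__main__":
    print(current_username())
```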

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Gabriel Lander (@gclander) Original Date: 2014-07-24T14:40:41Z


ignore that last question, I've never used the "runJob.py" script before, it seems it must be run on the head node. We need to add an option to specify a specific queue on the cluster, but that's a different matter.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Dmitry Lyumkis (@LyumkisLab) Original Date: 2014-07-24T14:51:01Z


Hi Neil, I will test the launcher on one of my stacks later tonight, and then you should be able to just sync it with your upload based on the pickle file. Gabe, I will post the command here. Note, however, that the transfer between clusters is still not working, as I believe that it is performed on the webserver side. Dmitry

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-07-28T15:47:07Z


Gabe, I was working on adding the option to specify a queue prior to the workshop, I'll try to finish that up today.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-08-13T20:21:35Z


Haya all, Is this in a good place for me to write the GUI for the uploader? Looks like it will be a bit of a project because there is currently an upload page in place but it is very maxlikealign specific. I will need to separate out general align upload stuff and the creation of the specific upload command. (I'm assuming all align jobs that need to be uploaded will appear on a single page that is shown when the user clicks on the "# ready to upload" link. Correct me if I am wrong.)

I only have 9 more working days with the AMI lab, so I'll need to get started on this soon if I am going to do it.

Neil, can you point me to the command including parameters and validations required?

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Neil Voss (@vosslab) Original Date: 2014-08-14T14:09:33Z


Hi Amber,

I was testing in four folders of Dmitry's (some have moved since workshop):

  1. little run:

    * /ami/data16/appion/13nov14a/align/sxisac1

  2. run with skipped iterations:

    * /ami/data16/appion/13sep27a/align/sxisac1

  3. test run:

    * /ami/archive2/md0/appion/zz09apr14b/stacks/stack1/isac

  4. big run:

    * /ami/data16/appion/13feb21b/align/sxisac1

A typical command was like (from /ami/data16/appion/13sep27a/align/sxisac1)

uploadSparxISAC.py \
  --projectid 224 --runname sxisac1 -d 'testing upload' \
  --timestamp 08nov27e54 --alignstackid 288 \
  --commit

I have not run ISAC from start to finish, but our plan was to use the same structure as the maxlike, but create a new table ApISACJobData instead of ApMaxLikeJobData to check for active jobs. No progress was made on this front, but I think it should be very similar to maxlike in terms of validations.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Dmitry Lyumkis (@LyumkisLab) Original Date: 2014-08-18T19:41:34Z


Hi Amber,

For the web gui, it should basically do the same thing as the standard aligners. There are several differences. First, one should be able to choose from either the regular stack or an aligned stack. The second difference is that, once the stack is chosen, the data needs to be transferred over to a remote host, so that it can be processed in a similar manner as the reconstructions are. ISAC will always be run on garibaldi, so it should be compatible with that cluster. The command that I was using to test:

runJob.py --stack=355 --generations=2 --projectid=354 --num-part=1000 --remoterundir=/ami/data00/appion/12dec05a/align/isactest2 --rundir=/ami/data00/appion/12dec05a/align/isactest2 --nodes=8 --ppn=4 --mem=48 --runname=isactest2 --localhost=guppy.scripps.edu --jobtype=sparxisac --ou=25 --expid=10755 --lp=20 --hp=400 --bin=2 --thld_err=5:10

Those are the options that the web gui should be generating. Please take a look at the python side of the launcher to see what other options need to be there.

Dmitry

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-08-18T20:46:09Z


Thanks D. Is it possible to test on Guppy? I have not had any luck getting it to run today. There are a few parameters I am not sure about:

  1. generations
  2. ou
  3. thld_err

Would be nice to have for each of these:

  1. label
  2. default value
  3. help text
  4. validations
leginonbot commented 6 months ago

Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-08-18T20:54:25Z


on garibaldi I get: Cound not find sxisac.py in your PATH

Do I need to include another module?

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Dmitry Lyumkis (@LyumkisLab) Original Date: 2014-08-18T21:04:38Z


I have not been getting the same results on guppy and garibaldi. The above command on guppy creates and launches the jobfile appropriately. It looks like Garibaldi is using old runjob.py code, and the same command creates a different job file. Amber, perhaps you can coordinate with JC to update this.

I would strongly suggest testing on garibaldi. This job should never be run on guppy, as it is too computationally intensive. In doing that, we might be able to get through some of the other bugs (as per above) as well.

What I just did is launch /ami/data17/appion/12dec05a/align/isactest3 on guppy, then killed the job, changed up some paths for calling ISAC, and relaunched it on garibaldi. You can check the differences between the job file created by runjob.py (isactest3.appionsub.guppy.job) and the one that I submitted to garibaldi (isactest3.appionsub.garibaldi.job). The latter should work, which I'll find out once it has stopped running, hopefully tomorrow.

all the labels, defaults, help texts, etc. are already in apSparxISAC.py.

I have this line to load sparx/eman2: module load eman/2.04

All we really need to do now is sync the weblauncher to successfully transfer the stack to a remote host and execute an above-like command, and then get the uploader to sync with the results.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-08-20T14:27:32Z


Dmitry, There are about 20 parameters in the python script in setIterationParamList. Are ou and thld_err the only ones that should be added to the launch gui? Or is the launch gui complete as is and should not add those params at all? It looks like you have set defaults for all of them on the python side.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-08-20T23:09:48Z


r18535 Adds a launch GUI that should now work with Dmitry's apSparxISAC.py.

Still to do:

  1. Dmitry, should the ou param (outer radius) be set in the GUI to box/2 -2, or leave it blank so that the param is not passed to the python and you can set the default value there? I left the other advanced params blank so that the resulting command is more manageable.
  2. I've started on copying the stack to the remote cluster path, but have not completed that so it is not functional yet.
  3. I think I also need to add another stack selector for aligned stacks. Please confirm.
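
For item 1, the box/2 - 2 default could live in a small helper on whichever side ends up owning it. The sketch below is only an illustration: the function name is hypothetical, and it assumes the box size passed in is the one ISAC actually sees (that is, after any binning).

```python
def default_outer_radius(boxsize):
    """Return box/2 - 2 in pixels, the default discussed for the ou parameter.

    `boxsize` is ASSUMED to be the box size after binning.
    """
    if boxsize < 6:
        # Anything smaller would give a radius of zero or less.
        raise ValueError("box size too small for a positive outer radius")
    return boxsize // 2 - 2


if __name__ == "__main__":
    print(default_outer_radius(64))
```

Leaving the field blank in the GUI and computing the value in one place avoids the two sides drifting apart.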

Other things I did this week that are not checked in yet b/c it needs more work/testing:

  1. database changes to the PHP side, plus new SQL
  2. added isac jobs to the pipeline menu tally
  3. created a new page to show all the alignments ready for upload, including isac and maxlikealign.

Things left TODO on the upload side:

  1. complete ISAC upload page
  2. test the whole thing from start to end
leginonbot commented 6 months ago

Original Redmine Comment Author Name: Sargis Dallakyan (@dallakyan) Original Date: 2014-08-22T18:44:29Z


Thank you Amber. "isacForm.inc":https://emg.nysbc.org/redmine/projects/appion/repository/revisions/18535/entry/trunk/myamiweb/processing/inc/forms/isacForm.inc nicely shows how to use generateAdditionalFormRight and generateAdditionalFormLeft using "new base form for appionloop web GUI forms":http://emg.nysbc.org/redmine/issues/2634.

Regarding Still to do: 3. Yes indeed, it needs another stack selector. Neil showed me "myamiweb/processing/runMakeStack2.php":http://emg.nysbc.org/redmine/projects/appion/repository/entry/trunk/myamiweb/processing/runMakeStack2.php page that has an example on how to do it. Would be happy to continue working with you on this.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Clint Potter (Clint Potter) Original Date: 2014-08-27T15:19:56Z


Dmitry says this needs the web launcher to transfer the stack. There are 2 ways to do this: #1 have the cluster read the input stack from the file server, #2 transfer in the stack during refinements. Needs a web GUI to set up a command and transfer the stack. ISAC runs on at least guppy but doesn't run on garibaldi (Dmitry thinks Amber is looking into this issue). Sargis will update myami trunk on Garibaldi.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-08-29T02:51:07Z


Clint, I have already started on the code to transfer the stack. I have NOT had a chance to look into why ISAC is not working on Garibaldi. It would be great if Sargis could give that a try while I'm out next week. Then I could try a complete run with upload perhaps Wednesday of the following week.

Sargis, Thanks for the info regarding the stack selector. I can add it when I return, or if you have a chance before then to work on it, feel free.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-09-17T20:00:28Z


r18573 adds many GUI changes to support ISAC. The Aligned Stack selector is now available. Also includes menu integration and page to show isac jobs ready for upload.

TODO:

  1. Complete the upload GUI. Sargis you said you might like to work on this. I realized I had already started on it, so I'll check in what I've got at the end of today and you can continue with it if you like.
  2. I did not complete the copy of the stack to a remote cluster. A function is in place for it but needs some guts. I'm not sure on the details of this. Looks like the python file is looking for the stack based on its location in the DB...Do we just need to copy the stack in the case where the cluster does not have access to the stack?
  3. Add localhost to the command - looks like this is needed for results rsync
  4. Every time the launch page is reloaded, the base file is added to the remoterundir resulting in path/align/align. I think this is also happening to refinements...path/recon/recon. Should be an easy enough bug fix, but keep an eye on your remoterundir before submitting.
  5. Probably need to add a DB update script to add isac run params field to ApAlignRunData in existing DBs.
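
The path/align/align problem in item 4 is the usual symptom of a non-idempotent append. The launcher code in question is on the web side, so the Python below is only a sketch of the idea with a hypothetical helper name: append the sub-directory only when the path does not already end with it.

```python
import posixpath


def append_once(basedir, subdir):
    """Append subdir to basedir unless basedir already ends with it."""
    # Drop trailing slashes so "/path/align/" and "/path/align" compare equal.
    basedir = basedir.rstrip("/") or "/"
    if posixpath.basename(basedir) == subdir:
        return basedir
    return posixpath.join(basedir, subdir)


if __name__ == "__main__":
    path = "/path"
    for _ in range(3):  # simulate reloading the launch page three times
        path = append_once(path, "align")
    print(path)
```

Calling the helper any number of times gives the same result as calling it once, which is the property a page reload needs.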

There are certainly bugs in r18573. Some are minor and noted in the code with TODO's which I hope to address. The files edited are used by many other things, so keep an eye out for new bugs in old features.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-09-17T22:26:23Z


r18576 adds a start on the Upload GUI.

Sargis, this page still needs quite a bit of work. Since ISAC is not really a refinement run and not really an Alignment run but straddles both worlds, you'll need to take a close look at when info is added to the database and whether what you need for the upload command is available. The run step may need to add some info on the python side.

Also, we need to identify a directory with a completed run to see what the output files look like in terms of file names so the upload gui can list and display those.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Clint Potter (Clint Potter) Original Date: 2014-09-24T15:52:13Z


Discussed during Appion conference call. Dmitry needs to create a pickle file to pass parameters from ISAC. Amber will note details in issue. Amber will change Neil's code to make this work for now.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-09-24T17:54:42Z


r18584 implements what was discussed in the dev meeting today to work around the fact that ApSparxISACJobData is not populated with job data during the run step. This assumes that

  1. The head node of the cluster has access to the appion database
  2. The results have been copied to the local run directory during the run step using rsync and the local helper host and these results include a pickle file with all the run parameters.

Still to do:

  1. need to add localhost to the run launch page to use for the rsync. Dmitry, can you confirm that this is needed? This is just what I inferred from reading your code.
  2. I need a complete ISAC run that includes the results and the pickle file to test the uploader. So far I get stuck in the upload because the test run I have does not have the pickle file.
  3. Once the upload is run, I need to confirm that the status of the run is set correctly, namely that it is marked as uploaded and does not appear under the ready to upload menu item.

In the future, there should be an effort to turn this type of remote job that does not require a prep step into something a bit more defined. One issue is that there are 3 tables with the same fields:

  1. ApSparxISACJobData
  2. ApMaxLikeJobData
  3. ApTopolRepJobData

Perhaps these could be merged or the fields added to ApAppionJobData. It is used to track if a job is finished, hidden, and what the timestamp is that is appended to filenames. There should be a set procedure for how these remote jobs are set up and run so that common tasks can be consolidated and we make sure things that need to be done are not forgotten. For example, the ISAC jobs cannot be hidden with the current implementation, which is not ideal.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Dmitry Lyumkis (@LyumkisLab) Original Date: 2014-09-24T20:15:24Z


Amber, as far as I can tell, your assumptions are fine.

from what I recall, if the cluster does not have access to the filesystem, then it requires --localhost and --remoterundir
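
The rule above could be checked before a job is launched. This is a minimal sketch with a hypothetical function name; it only encodes the statement that a cluster without access to the filesystem needs both --localhost and --remoterundir, and says nothing about how the real launcher validates its options.

```python
def missing_remote_options(cluster_sees_filesystem, localhost=None, remoterundir=None):
    """Return the option names that still have to be supplied (empty list if none)."""
    if cluster_sees_filesystem:
        # Results are already reachable, so neither option is required.
        return []
    missing = []
    if not localhost:
        missing.append("--localhost")
    if not remoterundir:
        missing.append("--remoterundir")
    return missing


if __name__ == "__main__":
    print(missing_remote_options(False, localhost="guppy.scripps.edu"))
```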

in the web-launcher, I couldn't find a box for "threshold error" (that's the pixel error). That parameter, as well as the inner and outer radii, should not be advanced parameters.

I also can't launch an ISAC job anymore, neither using my old command, nor from the webserver, so not sure what is going on. switching to longboard just gives me a blank page in the launch page.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-09-24T21:26:01Z


r18586 adds localhost to the run command.

I'm unable to reproduce launch errors here with cronus3, longboard or my sandbox. Can troubleshoot Monday if needed.

Moving inner and outer radius to be non-Advanced and adding threshold error now. I see I missed that param entirely!

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-09-24T22:05:41Z


I ran a very small isac job on guppy to test the upload with. The python upload script is not finding the timestamp that is prepended to the filenames correctly so could not find the pickle file.

I'll look into that Monday, unless you want to take a look before then Neil. I know you enjoy regular expressions!!!
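
For reference, a timestamp lookup along these lines might be written as below. It is a sketch under assumptions: the pattern is inferred from the single example timestamp in this thread (08nov27e54), the file names in the usage example are made up, and this is not the expression used by the upload script.

```python
import re

# Two digits, three-letter month, two digits, one letter, two digits,
# e.g. "08nov27e54". Inferred from one example, so treat it as a guess.
TIMESTAMP_RE = re.compile(r"(\d{2}[a-z]{3}\d{2}[a-z]\d{2})")


def find_timestamp(filename):
    """Return the first timestamp-like token in filename, or None if absent."""
    match = TIMESTAMP_RE.search(filename)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for name in ("08nov27e54-params.pickle", "params.pickle"):
        print(name, find_timestamp(name))
```

Using search() rather than match() means the token is found whether it sits at the start of the file name or after a prefix.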

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-09-29T15:10:18Z


OK, the timestamp was repeatedly incorrect, then I added a few debug lines, it started working and when I removed them it is still working. Not sure what's up with that. glob seemed to have trouble finding the files.

Now I have run into an issue: the small test run I did does not have any class_averages_generation_*.hdf files. David said it looks like the run did not complete properly. So I need a good run to test the upload with. I've been working with this run: /ami/data00/appion/zz07jul25b/align/ISAC61

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Neil Voss (@vosslab) Original Date: 2014-09-29T15:26:07Z


Hi Amber, I am following your posts, but have not been able to contribute. Yes, I like regular expressions. What is the name of the pickle file it is creating? I could adjust the regex to make sure it finds it.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Amber Herold (Amber Herold) Original Date: 2014-09-29T18:01:50Z


Neil, I think the expression is fine, there was something else going on, just not sure what.

I'm assigning this back to Dmitry since he is the lead on this feature. All the GUI components should be in place.

TODO:

  1. Run this from start to end.
  2. Confirm that the run and upload scripts work together.
  3. Confirm that uploaded runs are shown as complete.

I've been stalled because I don't have a properly completed run with all required output files to upload. However, I have confirmed that the upload gui launches the upload script correctly and reads the pickle file.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Clint Potter (Clint Potter) Original Date: 2014-11-19T16:09:08Z


Discussed during Appion conference call. Still in progress. Waiting for EMAN2 installation at Salk. Dmitry committed to doing this.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Dmitry Lyumkis (@LyumkisLab) Original Date: 2015-03-11T20:15:43Z


This issue is on hold until we can get EMAN2 installed at Salk. We are running into python dependency issues and MPI issues.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Clint Potter (Clint Potter) Original Date: 2015-04-20T19:08:47Z


Still on hold waiting for eman2. Scott has some hints for eman2.