leginon-org / leginon-redmine-archive


Add RELION 2D alignment and classification #3971

Open leginonbot opened 6 months ago

leginonbot commented 6 months ago

Author Name: Neil Voss (@vosslab) Original Redmine Issue: 3971, https://emg.nysbc.org/redmine/issues/3971 Original Date: 2016-02-19 Original Assignee: Carl Negro


None

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Neil Voss (@vosslab) Original Date: 2016-03-07T21:19:51Z


Reference for program: http://www2.mrc-lmb.cam.ac.uk/relion/index.php/Calculate_2D_class_averages

Installed RELION inside docker.
Needed to use FLTK v1.3.0 (CentOS only had v1.1.10).
Native FFTW was 3.2.1 (RELION came with 3.2.2).

Annoyed that I had to open the GUI to figure out what program to run. Appears to use relion_refine as its main program.

+++ RELION: command line arguments (with defaults for optional ones between parantheses) +++
====== General options ===== 
                                --i : Input images (in a star-file or a stack)
                                --o : Output rootname
                           --angpix : Pixel size (in Angstroms)
                        --iter (50) : Maximum number of iterations to perform
                   --tau2_fudge (1) : Regularisation parameter (values higher than 1 give more weight to the data)
                            --K (1) : Number of references to be refined
           --particle_diameter (-1) : Diameter of the circular mask that will be applied to the experimental images (in Angstroms)
                --zero_mask (false) : Mask surrounding background in particles to zero (by default the solvent area is filled with random noise)
          --flatten_solvent (false) : Perform masking on the references as well?
              --solvent_mask (None) : User-provided mask for the references (default is to use spherical mask with particle_diameter)
             --solvent_mask2 (None) : User-provided secondary mask (with its own average density)
                       --tau (None) : STAR file with input tau2-spectrum (to be kept constant)
      --split_random_halves (false) : Refine two random halves of the data completely separately
       --low_resol_join_halves (-1) : Resolution (in Angstrom) up to which the two random half-reconstructions will not be independent to prevent diverging orientations
====== Initialisation ===== 
                       --ref (None) : Image, stack or star-file with the reference(s). (Compulsory for 3D refinement!)
                       --offset (3) : Initial estimated stddev for the origin offsets
             --firstiter_cc (false) : Perform CC-calculation in the first iteration (use this if references are not on the absolute intensity scale)
                    --ini_high (-1) : Resolution (in Angstroms) to which to limit refinement in the first iteration 
====== Orientations ===== 
                 --oversampling (1) : Adaptive oversampling order to speed-up calculations (0=no oversampling, 1=2x, 2=4x, etc)
                --healpix_order (2) : Healpix order for the angular sampling (before oversampling) on the (3D) sphere: hp2=15deg, hp3=7.5deg, etc
                    --psi_step (-1) : Sampling rate (before oversampling) for the in-plane angle (default=10deg for 2D, hp sampling for 3D)
                 --limit_tilt (-91) : Limited tilt angle: positive for keeping side views, negative for keeping top views
                         --sym (c1) : Symmetry group
                 --offset_range (6) : Search range for origin offsets (in pixels)
                  --offset_step (2) : Sampling rate (before oversampling) for origin offsets (in pixels)
                    --perturb (0.5) : Perturbation factor for the angular sampling (0=no perturb; 0.5=perturb)
              --auto_refine (false) : Perform 3D auto-refine procedure?
     --auto_local_healpix_order (4) : Minimum healpix order (before oversampling) from which autosampling procedure will use local searches
                   --sigma_ang (-1) : Stddev on all three Euler angles for local angular searches (of +/- 3 stddev)
                   --sigma_rot (-1) : Stddev on the first Euler angle for local angular searches (of +/- 3 stddev)
                  --sigma_tilt (-1) : Stddev on the second Euler angle for local angular searches (of +/- 3 stddev)
                   --sigma_psi (-1) : Stddev on the in-plane angle for local angular searches (of +/- 3 stddev)
               --skip_align (false) : Skip orientational assignment (only classify)?
              --skip_rotate (false) : Skip rotational assignment (only translate and classify)?
====== Corrections ===== 
                      --ctf (false) : Perform CTF correction?
    --ctf_intact_first_peak (false) : Ignore CTFs until their first peak?
        --ctf_corrected_ref (false) : Have the input references been CTF-amplitude corrected?
        --ctf_phase_flipped (false) : Have the data been CTF phase-flipped?
         --only_flip_phases (false) : Only perform CTF phase-flipping? (default is full amplitude-correction)
                     --norm (false) : Perform normalisation-error correction?
                    --scale (false) : Perform intensity-scale corrections on image groups?
====== Computation ===== 
                            --j (1) : Number of threads to run in parallel (only useful on multi-core machines)
            --memory_per_thread (2) : Available RAM (in Gb) for each thread
  --dont_combine_weights_via_disc (false) : Send the large arrays of summed weights through the MPI network, instead of writing large files to disc
          --onthefly_shifts (false) : Calculate shifted images on-the-fly, do not store precalculated ones in memory
      --no_parallel_disc_io (false) : Do NOT let parallel (MPI) processes access the disc simultaneously (use this option with NFS)
           --preread_images (false) : Use this to let the master process read all particles into memory. Be careful you have enough RAM for large data sets!
====== Expert options ===== 
                          --pad (2) : Oversampling factor for the Fourier transforms of the references
                       --NN (false) : Perform nearest-neighbour instead of linear Fourier-space interpolation?
                    --r_min_nn (10) : Minimum number of Fourier shells to perform linear Fourier-space interpolation
                         --verb (1) : Verbosity (1=normal, 0=silent)
                 --random_seed (-1) : Number for the random seed generator
                 --coarse_size (-1) : Maximum image size for the first pass of the adaptive sampling approach
        --adaptive_fraction (0.999) : Fraction of the weights to be considered in the first pass of adaptive oversampling 
                     --maskedge (5) : Width of the soft edge of the spherical mask (in pixels)
          --fix_sigma_noise (false) : Fix the experimental noise spectra?
         --fix_sigma_offset (false) : Fix the stddev in the origin offsets?
                   --incr_size (10) : Number of Fourier shells beyond the current resolution to be included in refinement
    --print_metadata_labels (false) : Print a table with definitions of all metadata labels, and exit
       --print_symmetry_ops (false) : Print all symmetry transformation matrices, and exit
          --strict_highres_exp (-1) : Resolution limit (in Angstrom) to restrict probability calculations in the expectation step
          --dont_check_norm (false) : Skip the check whether the images are normalised correctly
                --always_cc (false) : Perform CC-calculation in all iterations (useful for faster denovo model generation?)
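For reference, a 2D classification command assembled from the options above might look like the following (file names and parameter values are illustrative, not taken from this ticket):

relion_refine --i particles.star --o run1/relion2d --angpix 1.5 --iter 25 --K 20 \
  --particle_diameter 200 --tau2_fudge 2 --zero_mask --ctf --psi_step 10 \
  --offset_range 5 --offset_step 2 --norm --scale --j 4 --memory_per_thread 2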
leginonbot commented 6 months ago

Original Redmine Comment Author Name: Neil Voss (@vosslab) Original Date: 2016-05-16T20:57:42Z


Alright I thought I would move the conversation here.

Hi Carl. Why do we need to have SO MANY cluster parameters? Xmipp maximum likelihood has 1 cluster parameter; RELION has 9+ cluster parameters (attached). They are basically the same program. Do we really need this fine control? How can we streamline this?

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Carl Negro (@carl9384) Original Date: 2016-05-16T22:30:21Z


Hi Neil,

We found that with certain jobs relionmaxlike would break or hang forever unless we micromanaged the number of processors and memory. Not sure how we could get around this. We can probably set the number of threads to one for all jobs. The number of MPI nodes is the same as the number of nodes in the cluster parameters, but I couldn't figure out how to pass this value over properly.
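For reference, the two knobs end up on the launch line roughly like this (a sketch only; the exact command the Appion wrapper generates may differ):

mpirun -np <number of MPI processes, from the cluster node/processor settings> relion_refine_mpi [other options] --j 1 --memory_per_thread <Gb per thread>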

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Neil Voss (@vosslab) Original Date: 2016-05-16T23:04:15Z


Maybe we could use the boxsize and the number of classes (or other parameters) to estimate the needed memory. I will check the documentation.
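Something along these lines, as a very rough sketch (the per-pixel size and the fudge factor are made-up assumptions, not RELION's actual bookkeeping):

import math

def estimate_memory_gb(boxsize, num_classes, bytes_per_pixel=4, copies=6):
    # crude guess: each class reference costs a few float copies of a
    # boxsize x boxsize image (padded transforms, working buffers, ...)
    image_bytes = boxsize * boxsize * bytes_per_pixel
    return num_classes * copies * image_bytes / 1e9

# e.g. estimate_memory_gb(128, 20) gives about 0.008 Gb for the references alone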

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Neil Voss (@vosslab) Original Date: 2016-05-20T17:15:22Z


Uploader is working, but there is no way to create the command from the web at the moment. I want to dig into the databases to look at how the information is stored.

Easiest way to test is to go to the alignment directory and give it the project number:

cd /emg/data/appion/06jul12a/align/maxlike1
uploadRelion2DMaxlikeAlign.py --commit --projectid=1
leginonbot commented 6 months ago

Original Redmine Comment Author Name: Neil Voss (@vosslab) Original Date: 2016-05-20T17:23:18Z


Hi Anchi, I was looking at the database tables and I am very confused.

Amber created this file 'checkAlignJobs.php' and it looks like it was for SPARX ISAC, but I dunno.

How do we upload the data from Xmipp CL2D? Can we upload it? Or does it automatically upload after the job is done? Is Xmipp Maximum Likelihood the only alignment where the alignment and uploading are separated? The alignment section seems pretty broken.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Anchi Cheng (@anchi2c) Original Date: 2016-05-20T19:34:36Z


I asked the users. Xmipp CL2D is uploaded automatically without a separate upload step. They said only Xmipp Maximum Likelihood runs need a separate upload, but neither of them uses ISAC.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Neil Voss (@vosslab) Original Date: 2016-05-20T19:39:17Z


Thanks for checking. So, I guess I have a philosophy question. It is easier to just make them upload after finishing, but in the past, if the alignment ran for more than 3 days (or whatever timeout is set in the MySQL config), python lost the database connection and could not upload.

Should we assume a RELION 2D alignment will take more than 3 days and keep the upload separate, or should I just plug them together into one file?
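For context on the dropped connection: MySQL closes idle connections after wait_timeout, so a multi-day run has to make sure its handle is still alive before committing. A minimal sketch with raw MySQLdb (host, database, and user are placeholders; Appion's sinedon layer manages connections its own way, so this is only the general idea):

import MySQLdb

conn = MySQLdb.connect(host="localhost", db="appiondata", user="appionuser")
# ... days of alignment work; the server may have closed the idle connection ...
conn.ping(True)  # reconnect if the server dropped the idle connection
cursor = conn.cursor()
cursor.execute("SELECT 1")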

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Neil Voss (@vosslab) Original Date: 2016-05-20T19:43:35Z


Or, do you see any reason the python program could not launch the upload script at the end of the run using subprocess.Popen? I could change Xmipp maxlike to do this too.
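A minimal sketch of that idea (the script name and flags are the ones from the test command above; the run-directory argument and error handling are illustrative):

import subprocess

def launch_upload(rundir, projectid):
    # hand the upload off to a fresh process at the end of the run,
    # so the uploader opens its own database connection
    cmd = ["uploadRelion2DMaxlikeAlign.py", "--commit", "--projectid=%d" % projectid]
    proc = subprocess.Popen(cmd, cwd=rundir)
    return proc.wait()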

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Bridget Carragher (@bcarr-czi) Original Date: 2016-05-21T15:41:05Z


Can we have it so that it automatically uploads if it can and, if not, offers the user the option to do the upload themselves?
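That could look something like the following sketch (reusing the hypothetical launch_upload() above; the fallback message mirrors the manual test command from earlier in this ticket):

def upload_or_instruct(rundir, projectid):
    try:
        retcode = launch_upload(rundir, projectid)
    except OSError:
        retcode = 1
    if retcode != 0:
        print("Automatic upload failed; run the upload by hand:")
        print("  cd %s" % rundir)
        print("  uploadRelion2DMaxlikeAlign.py --commit --projectid=%d" % projectid)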

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Neil Voss (@vosslab) Original Date: 2016-05-24T19:19:50Z


I am calling this done. Need a tester.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Carl Negro (@carl9384) Original Date: 2016-05-26T19:09:37Z


I get the following error when running uploadRelion2DMaxlikeAlign.py:

=============================
Oversampling= 1 NrHiddenVariableSamplingPoints= 66816 OrientationalSampling= 2.5 NrOrientations= 576 TranslationalSampling= 1 NrTranslations= 116

Estimated memory for expectation step > 0.101925 Gb, available memory = 2 Gb.
Estimated memory for maximization step > 0.000118688 Gb, available memory = 2 Gb.
Expectation iteration 30 of 30
0/ 0 sec ............................................................~~(,,">
Maximization ...
0/ 0 sec ............................................................~~(,,">
... Sorting files into clean folders
... Sorted 155 iteration files
... Sorted 154 reference files
Reading star format file: ref16may26m53_final_data.star
Looking for Data Block named data_images...
Found Data Block: data_images
001 -- -36.8 -- -36.803607
002 -- 23.2 -- 23.196393
003 -- -119.3 -- -119.303607
004 -- -124.3 -- -124.303607
005 -- 30.7 -- 30.696393
006 -- -111.8 -- -111.803607
007 -- -74.3 -- -74.303607
008 -- -126.8 -- -126.803607
... read rotation and shift parameters for 8 references
Reading star format file: part16may26m53_final_data.star
Looking for Data Block named data_images...
Found Data Block: data_images
001 -- -112.7 -- -143.352276
002 -- -85.9 -- 38.409389
003 -- 92.6 -- 61.894234
004 -- 61.9 -- -178.807379
005 -- 119.1 -- 88.409389
006 -- -125.9 -- -156.590611
007 -- -24.9 -- -48.105766
008 -- -74.9 -- -105.605766
009 -- -43.4 -- -74.090611
... read rotation and shift parameters for 3969 particles
... rotating and shifting particles at Thu May 26 12:59:47 2016
........................................
... writing aligned particles to file alignstack3969.hed
... 3969 particles in alignstack3969.img (11.9 MB)
... found 3969 particles
... size match 11.9 MB vs. 11.9 MB
... alignstack3969.hed (3969 kB)
... wrote 3969 particles to file alignstack.hed
... size match 3.9 MB vs. 3.9 MB
... finished stack merge of alignstack.hed in 181.69 msec
... rotated and shifted 3969 particles in 5.18 sec
/bin/sh: iminfo: command not found
Traceback (most recent call last):
  File "/home/cnegro/myami-trunk/appion/bin/uploadRelion2DMaxlikeAlign.py", line 537, in <module>
    maxLike.start()
  File "/home/cnegro/myami-trunk/appion/bin/uploadRelion2DMaxlikeAlign.py", line 521, in start
    apStack.averageStack(alignimagicfile, msg=False)
  File "/home/cnegro/myami-trunk/appion/appionlib/apStack.py", line 332, in averageStack
    avgStack.start(stackfile, partlist)
  File "/home/cnegro/myami-trunk/appion/appionlib/apImagicFile.py", line 1016, in start
    self.processStack(stackarray)
  File "/home/cnegro/myami-trunk/appion/appionlib/apStack.py", line 355, in processStack
    self.average += stackarray.sum(0)
ValueError: invalid return array shape

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Neil Voss (@vosslab) Original Date: 2016-05-26T20:05:02Z


First, I have no idea why it would crash there.

I did find the "/bin/sh: iminfo: command not found" line interesting. Why is iminfo being called? apFile.getBoxSize() uses EMAN1 to get the box size.

So it could be dying on that command.
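One way the missing iminfo could produce that ValueError (purely a guess, sketched in numpy): if the box size is read back wrong, the accumulated average and the stack sum end up with different shapes and the in-place add in processStack fails. The exact error message depends on the numpy version:

import numpy

average = numpy.zeros((64, 64))            # average allocated with one box size
stackarray = numpy.zeros((100, 128, 128))  # particles read back with a different box size
average += stackarray.sum(0)               # raises ValueError: shapes do not match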

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Bridget Carragher (@bcarr-czi) Original Date: 2016-05-31T00:57:53Z


I am happy to test and have done so. It is cool, but there are a few bugs for sure. See #4229, in which I can't upload, so I can't check whether any of the metadata tracking is working.

There are also some issues with the doc pop-up meanings and defaults - the worst one is that the diameter default is wrong, which means that only the very center of the images is focused on. I think the diameter default should either be something that was entered earlier or be based on the box size.
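One way to get a saner default (an illustrative heuristic, not what the web form currently does) would be to derive the diameter from the stack's box size and pixel size:

def default_particle_diameter(boxsize, apix):
    # use ~80% of the box (converted to Angstroms) so the mask does not clip the particle
    return 0.8 * boxsize * apix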

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Carl Negro (@carl9384) Original Date: 2016-06-09T14:28:21Z


I added a couple of parameters (invert and normalization error checking) and cleaned up the web interface and the pop-up help files. The uploader is working as intended.

leginonbot commented 6 months ago

Original Redmine Comment Author Name: Neil Voss (@vosslab) Original Date: 2016-06-09T14:57:48Z


Why do we need invert? I thought we force the white-particles convention.