Open zhanghj59 opened 2 months ago
Hi, sorry to hear you are having trouble! To help figure out what's going on, can you tell us:
1) How you installed DRGN-AI, and if possible, the version of the installed package.
2) The contents of your drgnai_test1/configs.yaml file.
Thanks!
I installed DRGN-AI (version 1.0.1) using the following script:
The configs.yaml file:
[zhanghj@spgpu drgnai_test1]$ more configs.yaml
particles: /share/zhanglab/zhanghj_data/projects/proteasome-test/Extract/job009/particles.star
ctf: /share/zhanglab/zhanghj_data/projects/proteasome-test/Extract/job009/ctf.pkl
pose: null
quick_config:
capture_setup: spa
reconstruction_type: het
pose_estimation: abinit
conf_estimation: autodecoder
I have tried placing the particles.star and ctf.pkl files in Extract/job009/movies and in the RELION working directory, but neither worked.
I use the following script to activate drgnai:
[zhanghj@spgpu drgnai_test1]$ source /tools/miniconda3/etc/profile.d/conda.sh
[zhanghj@spgpu drgnai_test1]$ conda activate drgnai
(drgnai) [zhanghj@spgpu drgnai_test1]$ drgnai test
Installation was successful!
Can you try adding the line relion31: True to your configs.yaml and running again?
We also recommend using v0.3.1 instead of v1.0.1, especially if doing ab-initio reconstruction, as we found a bug in the pose search algorithm (#8)! This can be retrieved from the top of the repository tree using git fetch; git pull from within your checked-out repo.
Hi! @michal-g
I have tried adding the line relion31: True to my configs.yaml and running again using v0.3.1, but it still failed.
configs.yaml file:
particles: /share/zhanglab/zhanghj_data/projects/proteasome-test/Extract/job010/particles.star
ctf: /share/zhanglab/zhanghj_data/projects/proteasome-test/Extract/job010/ctf.pkl
pose: null
relion31: ture
quick_config:
capture_setup: spa
reconstruction_type: het
pose_estimation: abinit
conf_estimation: autodecoder
(WARNING) (reconstruct.py) (11-Sep-24 23:01:20) Output directory `out/` already exists here!.Renaming the old one to `old-out_005_abinit-het4`.
(INFO) (reconstruct.py) (11-Sep-24 23:01:25) Number of available gpus: 4
(INFO) (reconstruct.py) (11-Sep-24 23:01:25) Use cuda True
(INFO) (reconstruct.py) (11-Sep-24 23:01:25) Will write tensorboard summaries in drgnai_test1/out/summaries
(INFO) (reconstruct.py) (11-Sep-24 23:01:25) Creating dataset
(INFO) (dataset.py) (11-Sep-24 23:04:08) Loaded 144702 128x128 images
(INFO) (dataset.py) (11-Sep-24 23:04:08) Windowing images with radius 0.85
(INFO) (dataset.py) (11-Sep-24 23:04:09) Computing FFT
(INFO) (dataset.py) (11-Sep-24 23:04:09) Spawning 16 processes
(INFO) (dataset.py) (11-Sep-24 23:05:07) Symmetrizing image data
(INFO) (dataset.py) (11-Sep-24 23:05:21) Normalized HT by 0 +/- 102.6186752319336
(INFO) (dataset.py) (11-Sep-24 23:05:35) Normalized real space images by 0.011543781496584415 +/- 0.8028228878974915
(INFO) (reconstruct.py) (11-Sep-24 23:05:38) Loading ctf params from /share/zhanglab/zhanghj_data/projects/proteasome-test/Extract/job010/ctf.pkl
(INFO) (ctf.py) (11-Sep-24 23:05:38) Image size (pix) : 128
(INFO) (ctf.py) (11-Sep-24 23:05:38) A/pix : 1.978124976158142
(INFO) (ctf.py) (11-Sep-24 23:05:38) DefocusU (A) : 16942.06640625
(INFO) (ctf.py) (11-Sep-24 23:05:38) DefocusV (A) : 16821.533203125
(INFO) (ctf.py) (11-Sep-24 23:05:38) Dfang (deg) : 32.922245025634766
(INFO) (ctf.py) (11-Sep-24 23:05:38) voltage (kV) : 300.0
(INFO) (ctf.py) (11-Sep-24 23:05:38) cs (mm) : 2.700000047683716
(INFO) (ctf.py) (11-Sep-24 23:05:38) w : 0.10000000149011612
(INFO) (ctf.py) (11-Sep-24 23:05:38) Phase shift (deg) : 0.0
(INFO) (reconstruct.py) (11-Sep-24 23:05:39) Building lattice
(INFO) (reconstruct.py) (11-Sep-24 23:05:39) Heterogeneous reconstruction with z_dim = 4
(INFO) (reconstruct.py) (11-Sep-24 23:05:39) Initializing model...
(INFO) (reconstruct.py) (11-Sep-24 23:05:39) DrgnAI(
(pose_table): PoseTable()
(conf_table): ConfTable()
(hypervolume): HyperVolume(
(mlp): ResidualLinearMLP(
(main): Sequential(
(0): Linear(in_features=388, out_features=256, bias=True)
(1): ReLU()
(2): ResidualLinear(
(linear): Linear(in_features=256, out_features=256, bias=True)
)
(3): ReLU()
(4): ResidualLinear(
(linear): Linear(in_features=256, out_features=256, bias=True)
)
(5): ReLU()
(6): ResidualLinear(
(linear): Linear(in_features=256, out_features=256, bias=True)
)
(7): ReLU()
(8): MyLinear(in_features=256, out_features=1, bias=True)
)
)
)
)
(INFO) (reconstruct.py) (11-Sep-24 23:05:39) 2033641 parameters in model
(INFO) (reconstruct.py) (11-Sep-24 23:05:39) Model initialized. Moving to GPU...
(INFO) (reconstruct.py) (11-Sep-24 23:05:42) --- Training Starts Now ---
(INFO) (reconstruct.py) (11-Sep-24 23:05:42) Will pretrain on 10000 particles
(INFO) (reconstruct.py) (11-Sep-24 23:05:42) Will make a full summary at the end of this epoch
(INFO) (reconstruct.py) (11-Sep-24 23:05:57) # [Train Epoch: -1/103] [10112/144702 particles]
(INFO) (reconstruct.py) (11-Sep-24 23:05:58) # =====> SGD Epoch: -1 finished in 0:00:15.502928; total loss = 144509901.308861
(INFO) (analysis.py) (11-Sep-24 23:05:59) Explained variance ratio:
(INFO) (analysis.py) (11-Sep-24 23:05:59) [0.31302254 0.27805947 0.24219864 0.16671935]
(INFO) (reconstruct.py) (11-Sep-24 23:06:00) Will use pose search on 144702 particles
(INFO) (reconstruct.py) (11-Sep-24 23:06:00) Will make a full summary at the end of this epoch
Thank you very much for your assistance!
Hi, can you try these things:
1) Double-checking the spelling of "True" in your configs.yaml, which has a typo in the message above.
2) Rerunning with CUDA_LAUNCH_BLOCKING=1 as discussed here, e.g. export CUDA_LAUNCH_BLOCKING=1; drgnai train drgnai_test1, which makes CUDA kernel launches synchronous so that the error is reported at the call that actually caused it.
3) Checking the version of the GPU drivers you have installed, e.g. using nvidia-smi, as this will help figure out if this is indeed a problem with the software environment!
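For the first check, a quick way to spot a misspelled boolean is to scan the config for values that are close to "true" or "false" without being valid YAML boolean spellings. A YAML loader reads a value such as `ture` as a plain string rather than a boolean; how DRGN-AI then treats that string is not shown in this thread. The sketch below is only an illustration: `find_suspect_booleans` is a hypothetical helper, not part of DRGN-AI, and it assumes a flat `key: value` layout like the configs.yaml pasted above.

```python
import difflib

# Spellings that YAML 1.1 loaders commonly convert to booleans (compared case-insensitively).
YAML_BOOLEANS = ["true", "false", "yes", "no", "on", "off"]


def find_suspect_booleans(config_text):
    """Return (line_number, key, value, suggestion) for values that look like misspelled booleans.

    Hypothetical helper for illustration only; it does simple line-based
    parsing and is not a full YAML parser.
    """
    suspects = []
    for line_number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in stripped:
            continue
        key, value = (part.strip() for part in stripped.split(":", 1))
        if not value or value.lower() in YAML_BOOLEANS:
            continue  # empty value (nested section) or a valid boolean
        close = difflib.get_close_matches(
            value.lower(), ["true", "false"], n=1, cutoff=0.7
        )
        if close:
            suspects.append((line_number, key, value, close[0]))
    return suspects


# The config from the message above, typo included.
example_config = """\
particles: /share/zhanglab/zhanghj_data/projects/proteasome-test/Extract/job010/particles.star
ctf: /share/zhanglab/zhanghj_data/projects/proteasome-test/Extract/job010/ctf.pkl
pose: null
relion31: ture
quick_config:
  capture_setup: spa
  reconstruction_type: het
  pose_estimation: abinit
  conf_estimation: autodecoder
"""

for line_number, key, value, suggestion in find_suspect_booleans(example_config):
    print(f"line {line_number}: {key}: {value!r} (did you mean {suggestion}?)")
```

To check a real file, pass the result of `open("drgnai_test1/configs.yaml").read()` to the function instead of the example string.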
-Mike
Hi,
I am trying to run drgnai train, but it fails with some errors. I am not sure if there is something wrong with my input file. I used the particles.star from a RELION particle extraction job. The file and the error message are as follows:
Error message:
Thank you very much for your assistance.