FlorianBeckOle closed this issue 2 days ago
Hi Florian,
Don't worry about the path. I'm not sure of the exact details, but the paths you see in errors are often the paths to the source files at build time rather than at runtime.
There should be some logs inside your m folder too; do they say anything useful?
I assume you tried running it a number of times; do the inputs look normal otherwise?
I ran this using the latest conda build yesterday without issue. What GPU are you running on?
Hi,
I tried a Quadro RTX 5000 and an NVIDIA A40.
Best
Florian
My install:
MTools --version
MTools 2.0.0+952054dad7ef651712bb325b0d8e2702aceaf811
Hi,
Sorry, I did not find any logs:
ls -lrta m/species/apoferritin_3c57475c/
total 2
drwxr-xr-x 2 fbeck b_cryo-em_tech 4096 Jul 9 14:17 .
drwxr-xr-x 8 fbeck b_cryo-em_tech 4096 Jul 9 14:26 ..
ls -lrta m/species/
total 8
drwxr-xr-x 3 fbeck b_cryo-em_tech 4096 Jul 9 14:01 ..
drwxr-xr-x 2 fbeck b_cryo-em_tech 4096 Jul 9 14:01 apoferritin_b86952e9
drwxr-xr-x 2 fbeck b_cryo-em_tech 4096 Jul 9 14:04 apoferritin_fa9c7148
drwxr-xr-x 2 fbeck b_cryo-em_tech 4096 Jul 9 14:07 apoferritin_8c9b9e3f
drwxr-xr-x 2 fbeck b_cryo-em_tech 4096 Jul 9 14:11 apoferritin_4b240dba
drwxr-xr-x 2 fbeck b_cryo-em_tech 4096 Jul 9 14:17 apoferritin_3c57475c
drwxr-xr-x 8 fbeck b_cryo-em_tech 4096 Jul 9 14:26 .
drwxr-xr-x 2 fbeck b_cryo-em_tech 4096 Jul 9 14:26 apoferritin_dfd6a877
best
Florian
Those logs are there somewhere; without them I don't have enough info to help you debug.
Hi,
Is there any verbose flag I can set?
ls -R M | grep log
ls -R M
M:
10491.population mask_4apx.mrc species
M/species:
apoferritin_3c57475c apoferritin_4b240dba apoferritin_8c9b9e3f apoferritin_b86952e9 apoferritin_dfd6a877 apoferritin_fa9c7148
M/species/apoferritin_3c57475c:
M/species/apoferritin_4b240dba:
M/species/apoferritin_8c9b9e3f:
M/species/apoferritin_b86952e9:
M/species/apoferritin_dfd6a877:
M/species/apoferritin_fa9c7148:
ls -R warp_tiltseries | grep log
logs
warp_tiltseries/logs:
TS_11.log
TS_17.log
TS_1.log
TS_23.log
TS_32.log
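The `grep log` searches above are case-sensitive and limited to one folder. A minimal sketch of a broader search follows; `find_logs` is a hypothetical helper name, and where MTools actually writes its logs is not confirmed anywhere in this thread.

```shell
# Hypothetical helper: list every regular file whose name contains "log",
# ignoring case, below the given directory (default: current directory).
find_logs() {
    find "${1:-.}" -type f -iname '*log*' | sort
}

# Example: search the whole project directory rather than only M/
# find_logs .
```

Running it from the project root would also pick up names such as `Logs` or `run.LOG`, which a plain `grep log` misses.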
Setting WARP_DEBUG=1 will add some debug output, but I don't know whether there is any debug output for species creation.
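A minimal sketch of setting the variable for a single run while keeping a copy of everything printed. `WARP_DEBUG=1` is taken from the comment above; the helper name `run_with_debug` and the log file name are hypothetical, and it is an assumption that any extra debug output goes to stdout or stderr.

```shell
# Hypothetical helper: run a command with WARP_DEBUG=1 set only for that
# command, and save a copy of its stdout and stderr to the given file.
# Usage: run_with_debug LOGFILE COMMAND [ARGS...]
run_with_debug() {
    logfile="$1"
    shift
    WARP_DEBUG=1 "$@" 2>&1 | tee "$logfile"
}

# Example (reuse the arguments from the failing run):
# run_with_debug create_species_debug.log MTools create_species <same arguments>
```

The resulting file can then be attached to the issue even if no log is written under the m folder.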
I can also add: I am having the same error on our conda and optimized modules. However, sometimes I also get an extra error earlier with the nvfuser library (in this case using the dev19 conda module):
Reading maps... Done
--angpix not specified, using 5.0000 A/px from half-map.
Resampling maps to 2.0000 A/px... Done
Padding or cropping half-maps to 2x molecule diameter... Done
Padding or cropping mask to 2x molecule diameter... Done
Processing half-maps... Done
Parsing particle table... Done
Calculating resolution and training denoiser model...
4/5: Training denoising[W interface.cpp:47] Warning: Loading nvfuser library failed with: Error in dlopen: /g/easybuild/x86_64/Rocky/8/rome/software/PyTorch/2.0.1-foss-2022a-CUDA-11.8.0/lib/python3.10/site-packages/torch/lib/libnvfuser_codegen.so: undefined symbol: _ZN3c106ivalue14ConstantString6createENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE (function LoadingNvfuserLibrary)
4/5: Training denoising: Preparing mask... done.
Preparing data:
4/5: Training denoising: Preparing map 0... Adjusting the number of iterations to 1500 to match batch size and number of maps.
4/5: Training denoising: 0/1500Unhandled exception. System.Exception: The loss function has reached an invalid value because something went wrong during training.
at Warp.NoiseNet3DTorch.TrainOnVolumes(NoiseNet3DTorch network, Image[] halves1, Image[] halves2, Image[] masks, Single angpix, Single lowpass, Single upsample, Boolean dontFlatten, Boolean performTraining, Int32 niterations, Single startFrom, Int32 batchsize, Int32 gpuprocess, Action`1 progressCallback) in /usr/share/miniconda/envs/package-build/conda-bld/warp_1720472789965/work/WarpLib/NNModels/NoiseNet3DTorch.cs:line 819
at Warp.Sociology.Species.CalculateResolutionAndFilter(Single fixedResolution, Action`1 progressCallback, Int32 gpuID) in /usr/share/miniconda/envs/package-build/conda-bld/warp_1720472789965/work/WarpLib/Sociology/Species.cs:line 1650
at MTools.Commands.CreateSpecies.Run(Object options) in /usr/share/miniconda/envs/package-build/conda-bld/warp_1720472789965/work/MTools/Commands/CreateSpecies.cs:line 582
at MTools.MTools.Run(Object options) in /usr/share/miniconda/envs/package-build/conda-bld/warp_1720472789965/work/MTools/MTools.cs:line 32
at CommandLine.ParserResultExtensions.WithParsed[T](ParserResult`1 result, Action`1 action)
at MTools.MTools.Main(String[] args) in /usr/share/miniconda/envs/package-build/conda-bld/warp_1720472789965/work/MTools/MTools.cs:line 21
Aborted (core dumped)
This is on dev19. If I run on dev14, there is no issue (with the exact same command and parameters). I think the issue appeared when the multi-species problem was fixed.
huh, thanks for the confirmation @jmdobbs - I'll have to see if anything changed between then and now
@jmdobbs I can't reproduce and can't see any changes to the noisenet models themselves https://github.com/warpem/warp/commits/main/WarpLib/NNModels/NoiseNet3DTorch.cs
I haven't checked more deeply, maybe something called from there changed... without a reproducible example I can't debug. If you could find between which releases it broke that would narrow down the range of changes significantly
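Where several builds are installed side by side, finding the first broken release can be scripted. This is only a sketch: `first_failing` and the `try_version` wrapper in the example are hypothetical names, and how a given build is selected (module, conda environment, path) depends on the site.

```shell
# Hypothetical helper: run the same check against a list of versions, given
# oldest first, and print the first version for which the check fails.
# Usage: first_failing CHECK_COMMAND VERSION...
# CHECK_COMMAND is called with the version as its only argument and must
# exit 0 on success and non-zero on failure.
first_failing() {
    check="$1"
    shift
    for version in "$@"; do
        if ! "$check" "$version" >/dev/null 2>&1; then
            echo "$version"
            return 0
        fi
    done
    return 1   # no version failed
}

# Example with a hypothetical wrapper that selects a build and runs the
# failing command:
# try_version() { <select build "$1">; MTools create_species <same arguments>; }
# first_failing try_version dev14 dev15 dev17 dev19
```

The last version before the one printed is then the newest known-good build.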
@alisterburt I can confirm that, on our system, dev15 works and dev17 does not. We don't have dev16 so I can't nail it down exactly. This issue is 100% consistent for us as far as I know.
The exact command I used is below, but I think in all cases we've tried to create species (me and others) this has come up:
MTools create_species -p m_testing/test.population -n testing -d 300 --angpix_resample 2 --lowpass 15 --half1 /struct/mahamid/jdobbs/path/run_half1_class001_unfil.mrc --half2 /struct/mahamid/jdobbs/path/run_half2_class001_unfil.mrc --mask /struct/mahamid/jdobbs/path/mask.mrc --particles_relion /struct/mahamid/jdobbs/path/run_data.star
Thanks @jmdobbs
dev 15 is fc90124
dev 17 is c50b893
The commit between those two which I suspect is causing the issue is be93c32 which was a confirmed fix for #156
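For anyone who wants to review the same range, the commits between two builds can be listed with git's range syntax. A minimal sketch, assuming a local clone of the repository; `commits_between` is a hypothetical helper name, and the hashes in the example are the ones quoted above.

```shell
# Hypothetical helper: list commits reachable from NEWER but not from OLDER,
# oldest first, inside the given repository directory.
# Usage: commits_between REPO_DIR OLDER NEWER
commits_between() {
    git -C "$1" log --oneline --reverse "$2..$3"
}

# Example, run inside a clone of https://github.com/warpem/warp:
# commits_between . fc90124 c50b893
```

`git bisect start c50b893 fc90124` would walk the same range interactively if each intermediate commit can be built and tested.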
@jmdobbs is it correct to say that this commit
Yes, that definitely matches what we've observed. E.g. two days ago I ran multi-species successfully on dev20 (though quite often it fails due to the issue in #179) using species I created with dev14, because species creation was not working on dev20.
You're a machine @jmdobbs - this is incredibly useful
should be closed by https://github.com/warpem/warp/commit/f855473fe83aeffb58bed2bca8c4b4eb29ee474b
assuming fixed, please reopen if necessary 🙂
Hi
I followed the tutorial up to create species. The command below gives the following error. One thing I noticed is that the path /usr/share/miniconda/envs/package-build/conda-bld/warp_1720036028036/work/MTools/MTools.cs differs from my installation:
which MTools
/fs/pool/pool-bmapps/hpcl8/app/soft/WARP/2.0.0dev18/conda3/envs/warp/bin/MTools
Am I doing something wrong?
Thanks
Florian
testWarp:>MTools create_species --population m/10491.population --name apoferritin --diameter 130 --sym O --temporal_samples 1 --half1 relion/Refine3D/job002/run_half1_class001_unfil.mrc --half2 relion/Refine3D/job006/run_half2_class001_unfil.mrc --mask m/mask_4apx.mrc --particles_relion relion/Refine3D/job002/run_data.star --angpix_resample 0.7894 --lowpass 10
Running command create_species with:
population = m/10491.population
name = apoferritin
diameter = 130
sym = O
temporal_samples = 1
half1 = relion/Refine3D/job002/run_half1_class001_unfil.mrc
half2 = relion/Refine3D/job006/run_half2_class001_unfil.mrc
mask = m/mask_4apx.mrc
angpix =
angpix_resample = 0.7894
lowpass = 10
particles_relion = relion/Refine3D/job002/run_data.star
particles_m =
angpix_coords =
angpix_shifts =
ignore_unmatched = False
Reading maps... Done
--angpix not specified, using 4.0000 A/px from half-map.
Resampling maps to 0.7894 A/px... Done
Padding or cropping half-maps to 2x molecule diameter... Done
Padding or cropping mask to 2x molecule diameter... Done
Processing half-maps... Done
Parsing particle table... Done
Calculating resolution and training denoiser model...
4/5: Training denoising: Preparing mask... done.
Preparing data:
4/5: Training denoising: Preparing map 0... Adjusting the number of iterations to 1500 to match batch size and number of maps.
4/5: Training denoising: 0/1500Unhandled exception. System.Exception: The loss function has reached an invalid value because something went wrong during training.
at Warp.NoiseNet3DTorch.TrainOnVolumes(NoiseNet3DTorch network, Image[] halves1, Image[] halves2, Image[] masks, Single angpix, Single lowpass, Single upsample, Boolean dontFlatten, Boolean performTraining, Int32 niterations, Single startFrom, Int32 batchsize, Int32 gpuprocess, Action`1 progressCallback) in /usr/share/miniconda/envs/package-build/conda-bld/warp_1720036028036/work/WarpLib/NNModels/NoiseNet3DTorch.cs:line 819
at Warp.Sociology.Species.CalculateResolutionAndFilter(Single fixedResolution, Action`1 progressCallback, Int32 gpuID) in /usr/share/miniconda/envs/package-build/conda-bld/warp_1720036028036/work/WarpLib/Sociology/Species.cs:line 1650
at MTools.Commands.CreateSpecies.Run(Object options) in /usr/share/miniconda/envs/package-build/conda-bld/warp_1720036028036/work/MTools/Commands/CreateSpecies.cs:line 582
at MTools.MTools.Run(Object options) in /usr/share/miniconda/envs/package-build/conda-bld/warp_1720036028036/work/MTools/MTools.cs:line 32
at CommandLine.ParserResultExtensions.WithParsed[T](ParserResult`1 result, Action`1 action)
at MTools.MTools.Main(String[] args) in /usr/share/miniconda/envs/package-build/conda-bld/warp_1720036028036/work/MTools/MTools.cs:line 21