Closed: RyanZurrin closed this 3 months ago.
Tashrif has committed directly to Ryan's master branch to resolve the above comments.
Whitespace changes apart, the above are the two substantial blocks I could find in this PR. Are there any other blocks I should review? @RyanZurrin
The only other parts are at the beginning, where the script checks the TF version and then imports accordingly; that is part of the dynamic allocation change, so you have checked everything important.
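For context, the version-gated import boils down to comparing the TF version string and choosing an import path. A minimal sketch of that logic (the exact module names chosen in dwi_masking.py are an assumption here; the comparison logic is the point):

```python
# Sketch of a version-gated import decision, similar in spirit to the
# check at the top of the script. Module names are illustrative.
def parse_major_minor(version):
    """Turn a version string like '2.11.0' into a comparable (2, 11) tuple."""
    return tuple(int(p) for p in version.split('.')[:2])

def keras_import_path(tf_version):
    """Pick which Keras import to use for a given TF version string."""
    if parse_major_minor(tf_version) >= (2, 0):
        return "tensorflow.keras"   # TF 2.x bundles Keras
    return "keras"                  # TF 1.x commonly used the standalone package

print(keras_import_path("2.11.0"))  # tensorflow.keras
print(keras_import_path("1.15.0"))  # keras
```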
Ryan's latest env_build_commands.md did not use the GPU on the pnl-predict machine either.
It used GPU when I was testing it. Maybe I can stop by, and we can go through the steps together.
As I mentioned already, pipeline_tests.sh requires you to have the fsl env activated, and in my experience it then used that env instead of the clean python env I built.
I did install from a clean python env, and my bashrc was removed.
I think the old pipeline_tests.sh requires dependencies that are not needed for CNN Masking; maybe we can write a cleaner pipeline_tests.sh that drops the unneeded ones.
pipeline_test.sh does not use a default environment. You, the user, need to source dcm2niix, ANTs, and FSL before it can run. So pipeline_test.sh is not the issue.
Yes, when I sourced FSL, it would take precedence even over my already activated conda env, and pipeline_test.sh would use the python from within FSL rather than from conda.
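That shadowing is just PATH ordering: the shell resolves a command from the first matching directory on PATH, so whatever bin directory FSL prepends wins over conda's. A self-contained demonstration of the ordering rule (directory names here are made up; no real FSL or conda install is involved):

```python
# Demonstrate PATH precedence: the first directory containing the command wins.
import os
import stat
import tempfile

tmp = tempfile.mkdtemp()
fsl_bin = os.path.join(tmp, "fsl_demo", "bin")      # stand-in for FSL's bin
conda_bin = os.path.join(tmp, "conda_demo", "bin")  # stand-in for conda's bin
os.makedirs(fsl_bin)
os.makedirs(conda_bin)

# Each "environment" provides its own executable named python_demo.
for d, msg in [(fsl_bin, "fsl-python"), (conda_bin, "conda-python")]:
    p = os.path.join(d, "python_demo")
    with open(p, "w") as f:
        f.write("#!/bin/sh\necho %s\n" % msg)
    os.chmod(p, os.stat(p).st_mode | stat.S_IXUSR)

def resolve(cmd, path_dirs):
    """Mimic the shell's lookup: first PATH entry with the command wins."""
    for d in path_dirs:
        candidate = os.path.join(d, cmd)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

# Sourcing FSL effectively puts its bin first, shadowing conda's python.
winner = resolve("python_demo", [fsl_bin, conda_bin])
print(winner)  # ends with fsl_demo/bin/python_demo
```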
Thank you for the hint. I shall double check soon.
Tashrif's issue was that he had not set LD_LIBRARY_PATH, or had a different CUDA-12 installation.
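For reference, the fix is to prepend the CUDA 12 library directory to LD_LIBRARY_PATH before launching Python, so the dynamic linker can find the CUDA runtime TF links against. The prefix `/usr/local/cuda-12/lib64` below is an assumed default location, not necessarily where it lives on pnl-predict:

```python
# Prepend an (assumed) CUDA 12 library directory to LD_LIBRARY_PATH.
# Note: this must happen before the process that loads TF starts; setting it
# from inside an already running Python process does not retroactively help
# libraries that are already loaded.
import os

cuda_lib = "/usr/local/cuda-12/lib64"  # assumption: adjust to the real install
existing = os.environ.get("LD_LIBRARY_PATH", "")
os.environ["LD_LIBRARY_PATH"] = cuda_lib + (":" + existing if existing else "")

print(os.environ["LD_LIBRARY_PATH"])
```

The shell equivalent is `export LD_LIBRARY_PATH=/usr/local/cuda-12/lib64:$LD_LIBRARY_PATH` in the env that runs the pipeline.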
However, Tashrif and Ryan established that the new set of install instructions work on both CentOS 7 and Rocky9 machines.
As one last try, Tashrif will try to environmentalize the install instructions.
Tashrif is doing one final review of dwi_masking.py before merging it.
Merging Ryan's work so I have better control over finalizing a few things.
I was able to fix this issue by setting TF to use dynamic memory allocation instead of its default, which is to allocate all of the GPU memory up front (it does this to prevent memory fragmentation). I have successfully run two jobs in parallel on a GPU with only 11 GB of memory, which was not possible before.
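In TF 2.x the dynamic-allocation switch is memory growth, enabled per physical GPU before any GPU memory is allocated. A guarded sketch (it degrades to a no-op when TensorFlow or a GPU is absent, so it is safe on CPU-only machines):

```python
# Enable TensorFlow's "memory growth" mode so each process allocates GPU
# memory on demand instead of grabbing the whole card at startup. This is
# what lets two jobs share one 11 GB GPU.
def enable_memory_growth():
    try:
        import tensorflow as tf  # guarded: TF may not be installed here
    except ImportError:
        return False
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must run before any op touches the GPU, or TF raises RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return bool(gpus)

enable_memory_growth()
```

An alternative that needs no code change is exporting `TF_FORCE_GPU_ALLOW_GROWTH=true` in the environment before launching the job.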