robail-yasrab / RootNav-2.0

Plant Phenotyping APP
BSD 3-Clause "New" or "Revised" License

Install issues September 2023 #14

Closed: DanHobleyADAS closed this issue 6 months ago

DanHobleyADAS commented 1 year ago

Hi @mikepound! I'm transferring our email conversation to Github for future clarity. Here is the start of the email chain:

Our group has been trying to install RootNav-2.0 on PC

Both @ADAS-DaveSkirvin and I tried various approaches to getting the package installed, and neither of us had any success. All of our issues were around versioning of the third-party packages. In particular, torchvision caused the most trouble, with (IIRC) a conflict with tensorboard(?). I assume this has resulted from version divergence since the most recent RootNav release?

Dave (cc) tried fairly extensively with conda, and I tried extensively using pip natively (since unfortunately, because ADAS is a commercial entity, conda is a paid license for us, and we’ll need to work around this). I started to expect that this was a conda-forge or pytorch channel versioning issue specific to native pip, but as Dave was having the exact same issues in conda, I don’t think this is the root problem. We’re on PCs, if relevant.

I tried forcing the install versions to those in the requirements.txt with ~=, but this also didn’t work. Neither did trying to roll back further by hand, including rolling back pip itself. I also tried a Python 2 install with your alternate version, hoping that the lack of recent Py2 support might have prevented version divergence, but this also didn’t work.
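For readers unfamiliar with `~=`: it is pip's "compatible release" operator, so `pkg~=1.4.5` still accepts any later `1.4.x`, which is one way pinned installs can drift. The sketch below illustrates that rule only for plain dotted numeric versions. The helper name `compatible_release` is hypothetical, it is not pip's implementation, and it will not parse versions with suffixes such as `+cu117` or `rc1`.

```python
def _parse(version):
    """Split a plain dotted version such as '1.4.5' into a list of ints."""
    return [int(part) for part in version.split(".")]


def compatible_release(installed, spec):
    """Return True if `installed` satisfies `~=spec` (simplified rule).

    `~=X.Y.Z` means: at least X.Y.Z, and the same X.Y prefix.
    `~=X.Y` means: at least X.Y, and the same X prefix.
    """
    spec_parts = _parse(spec)
    if len(spec_parts) < 2:
        raise ValueError("~= needs at least two version components")

    inst_parts = _parse(installed)

    # Pad both to the same length so that '1.4' compares equal to '1.4.0'.
    width = max(len(spec_parts), len(inst_parts))
    inst_parts = inst_parts + [0] * (width - len(inst_parts))
    padded_spec = spec_parts + [0] * (width - len(spec_parts))

    # Everything except the last component of the spec must match exactly.
    prefix = spec_parts[:-1]
    return inst_parts[:len(prefix)] == prefix and inst_parts >= padded_spec


if __name__ == "__main__":
    print(compatible_release("1.4.9", "1.4.5"))  # True: later patch release
    print(compatible_release("1.5.0", "1.4.5"))  # False: minor version moved
```

In other words, `~=` pins the series rather than the exact build; `==` is the operator for an exact match.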

Does this sound like a known issue to you? Is there some manual workaround we can try? I have fairly extensive experience in Python for large scale research software, so can hopefully answer technical questions you might have. [Happy to open an issue on Github if you’d like me to... and here we are]

DanHobleyADAS commented 1 year ago

You replied: Hi Charlotte and Dan,

Thanks for the extra info. I was under the impression that RN2 no longer required tensorboard, as it does sometimes prove a pain to install, and we wanted to try and reduce the requirements. It is definitely there in the training code though, so I guess I was incorrect! Are you looking to train new models with RN2, or simply run it?

It may be we need to increase the version of pytorch and torchvision we’re requiring. Back in 2020 when we were re-writing some of this, torch was fairly stable, but the continued development has started introducing some breaking changes, and support for older versions hasn’t been as good as I’d have liked. I’d like to make sure the repo is fairly easy to install though, for obvious reasons that if it isn’t, no one will use it!

DanHobleyADAS commented 1 year ago

To which: I think this makes sense? I recommended to @ADAS-DaveSkirvin that shifting torchvision into the -pip: section of the Requirements.yml file might unstick things, and he reported that this did allow the virtual environment to set up in conda per your instructions. However, I gather there was then a further issue in actually running the tool which he will have to step in to describe (hopefully he is seeing this thread...)

It's our colleague Charlotte (not on Github) who will eventually be using the tool in anger, but I believe she intends to simply run, particularly using v2 to batch process.

Edit: I spent some more quality time in pip outside conda earlier today in the hope that I could use Dave's pip freeze output to force the versioning, but this also did not work. In this case, I was getting further cascading versioning issues. I can chase out the details for you if helpful.

mikepound commented 1 year ago

Thanks for this Dan. I'll begin by attempting to install a new environment using pip and see how I get on. I have a colleague who I believe was using the tool more recently than I was, so I will also ask them. If @ADAS-DaveSkirvin has other details, please let me know.

DanHobleyADAS commented 1 year ago

Tagging Charlotte @C-A-White so she can see progress on this directly.

@mikepound any chance you've had time to take a look at this yet?

mikepound commented 1 year ago

Hi Dan,

Thanks to a colleague of mine we have made a little progress. I was reminded that he performed some refactoring and modernization of the code earlier this year. I thought we'd pushed this up to this github repo, but I think this got missed. This might explain some of your issues! I'm going to think about pushing this up after some more testing, but in the meantime maybe you would like to test it:

Code link

I just got this running (CPU, not GPU) on my Windows laptop using a standard Python 3.6 environment, with the latest version of torch installed using pip, and a handful of other packages including imageio, kdtree and so on. See how you get on. There's at least one warning that wants fixing later, but it's minor.

Mike

DanHobleyADAS commented 1 year ago

Aha! Yes! I successfully ran in a 3.10 virtual environment (I think...? No errors at least). Required pip install torch, imageio, kdtree and also a downgrade of numpy for some reason: pip install numpy==1.25. Then I got a clean run of run_rootnav.py with no errors or warnings.
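For anyone trying to reproduce this, here is a minimal sketch of a check that the packages named above are present in the active environment before running the tool. It uses only the standard library (`importlib.metadata`, so Python 3.8 or later). The helper name `report_versions` is hypothetical and not part of RootNav, and the package list is just the one from this comment, not the full requirements.

```python
from importlib import metadata

# Distribution names mentioned in this comment; extend as needed.
PACKAGES = ["torch", "imageio", "kdtree", "numpy"]


def report_versions(names):
    """Return {name: installed version string, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in report_versions(PACKAGES).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Posting this output alongside an error makes it easier to compare a failing environment with a working one.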

mikepound commented 1 year ago

Great! One thing we did was totally remove CRF, because it barely affected the output and was a huge dependency issue. If you're looking to run RootNav on new images using the existing trained models, then this should work. The only remaining issue is whether the training code needs updating, as this may not have been done. It will depend somewhat on your data: for example, if you're using Arabidopsis plate images it might work, but if they're different, we might need some retraining.

DanHobleyADAS commented 1 year ago

I gather @C-A-White is hoping to retrain (which I gather you could in the earlier version...??) with cereal roots. I assume from a very quick code scan and what you just said that this might not be straightforward?

Hopefully she can add a bit more on exactly what she wants to do. I believe she has worked extensively with the old tool version, but wants to be able to batch process with the new? (I might have this wrong?)

mikepound commented 1 year ago

It may work correctly, but there may be minor refactoring to do. Now I've got a working environment for the inference code, I'll try running the training code and see what happens.

Asmlopt commented 10 months ago

I encountered the same problem when installing it on a Windows system, that is, a torchvision error. I then tried to fix it based on the error message, but it didn't work; see the link for details (https://github.com/robail-yasrab/RootNav-2.0/assets/153095587/690da81b-69f8-44f8-a8a5-0fe0dc3dc67d). So how did you finally solve it?

mikepound commented 7 months ago

Hi all, I've been very tied up teaching this term, which has limited my ability to look into this. My first job is to try and incorporate some of the improvements in the above linked code so everyone gets it. I also think we should try and update the versions of pytorch used a little, as this may solve some people's errors. If anyone has any currently working versions, any insights would be appreciated!

mikepound commented 7 months ago

Update, the new branch is here. I'll be testing this and hopefully also removing some of the small library errors and warnings that occur too.

mikepound commented 6 months ago

Right, I'm finished with all my changes, and I'm going to merge them back into the master branch. You can see the pull request #15. I've made quite a few code improvements, including proper logging and debugging messages to help find out what is going wrong. However I've mostly targeted installation and training issues.

To address the install issues, I've done a few things:

I recommend re-downloading the repo from scratch, as I have changed a lot and also done some editing of old commits. If anyone has any further installation problems, please feel free to open a new issue.