rmaphoh / AutoMorph

vessel segmentation, artery and vein, optic disc, vascular feature analysis
Apache License 2.0

PR generalizability #1

Open arajesh17 opened 2 years ago

arajesh17 commented 2 years ago

Hi - I have been working to implement your code on our servers.

It works well and we want to deploy it on some of our data. I switched the input and results paths to strings stored in automorph/config.py. I also moved some of the parameters, such as batch size and number of workers, into it so they can be altered in one location.
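Roughly, the config module looks like this (a minimal sketch; the exact attribute names and default values in the branch may differ):

```python
# automorph/config.py -- minimal sketch; the attribute names and defaults
# below are illustrative, not the exact contents of the branch.
import os

# Input/output locations, previously hardcoded in the shell scripts
input_dir = os.path.join(os.path.dirname(__file__), '..', 'images')
results_dir = os.path.join(os.path.dirname(__file__), '..', 'Results')

# Runtime parameters gathered in one place
batch_size = 8
num_workers = 4
device = 'cuda'  # or 'cpu'
```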

I also added a setup.py so that I can install automorph/ as a package and allow for relative imports.
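A minimal setup.py for this is along these lines (a sketch; the version, metadata and any dependency pins are illustrative):

```python
# setup.py -- minimal sketch; version and metadata are illustrative and may
# differ from the actual file added in the branch.
from setuptools import setup, find_packages

setup(
    name='automorph',
    version='0.1.0',
    packages=find_packages(),  # picks up the automorph/ package
    python_requires='>=3.7',
)
```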

I am working on deploying it on AWS, which is what some of the Lambda functions are for. I am also creating a new Docker container so that I can run it in AWS.

rmaphoh commented 2 years ago

Thanks for your PR.

I cannot see the commit details because the number of changed files exceeds GitHub's display limit (possibly due to the folder changes).

I have run your branch and tried to find the revised parts according to your description. I have tested it on Google Cloud Platform and on a local machine. On both, I needed to run

export PYTHONPATH=<path/to/AutoMorph>:$PYTHONPATH

to avoid the error

cannot find module automorph

as pip cannot install automorph directly:

ERROR: Could not find a version that satisfies the requirement automorph (from versions: none)
ERROR: No matching distribution found for automorph
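For reference, the same workaround can also be applied from inside a script rather than the shell (a minimal sketch; the path is a placeholder):

```python
# Equivalent of the PYTHONPATH export, done programmatically.
# '/path/to/AutoMorph' is a placeholder for the repository root.
import sys
sys.path.insert(0, '/path/to/AutoMorph')

import automorph  # now resolvable without an editable install
```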

Cheers

arajesh17 commented 2 years ago

Hi Yukun!

I'm glad you were able to check and see what I changed.

The way I installed it as a package into my anaconda environment was to go into the AutoMorph_Lee directory and enter the command:

pip install -e .

This reads the setup.py and installs the automorph package into your Python environment. Here is a link describing my thinking behind making it a package, as well as some fancier things that could be done.
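Once the editable install has run, modules can be imported by absolute package path from anywhere, for example (assuming config.py exposes the paths described above):

```python
# After `pip install -e .`, the package resolves from any working directory
# without touching PYTHONPATH.
from automorph import config

# Assuming config.py exposes an input path, as sketched earlier in the thread.
print(config.input_dir)
```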

The lee_lab_scripts are not required for running - I use them to organize our data. They should not be included in the public code.

I will make a PR once I have it running in AWS and can include instructions for use.

rmaphoh commented 2 years ago

Thanks for your explanation. Now I see what pip install -e . does.

I have tested the feasibility of integrating pip install -e . and pip install -r requirement.txt.

I need to thoroughly check compatibility with Google Colab, and some README instructions might need to be revised since the folder paths have changed.

Please open a PR when you finish the changes. I might further change the hyperparameters. Cheers.

lajos-cs commented 2 years ago

Thank you very much for sharing this code. Very helpful. I am running it on Google Colab and it works well on 45-degree fundus photographs.

I know you would not recommend using it on ultra-widefield images (Optos), but I would love to see how it performs on them. Would the input be the RGB image, or would the channels need to be split and only the green channel fed in?
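In case it is useful, a minimal sketch of splitting out the green channel (using Pillow and NumPy; the file names are placeholders):

```python
# Split an RGB fundus image and keep only the green channel.
# 'optos_image.png' and 'optos_green.png' are placeholder file names.
import numpy as np
from PIL import Image

rgb = np.asarray(Image.open('optos_image.png').convert('RGB'))
green = rgb[:, :, 1]

# Replicate the green channel to 3 channels in case the pipeline
# expects a 3-channel input.
Image.fromarray(np.stack([green] * 3, axis=-1)).save('optos_green.png')
```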

Also, regarding the output CSVs, I noticed that Disc_Zone_B_Measurement.csv has exactly the same values as Disc_Zone_C_Measurement.csv in the 'Results\M3\Disc_centred\' directory. Can I fix that somehow?

arajesh17 commented 2 years ago

I have redone the structure of the code in the new commits:

I moved away from the shell-script model of execution and instead packaged everything into a Python entry point, with a config.py file used to pass arguments. I did this because, when running serverless on AWS, the recommended pattern is to invoke everything through a Python function rather than a shell script. Honestly, a shell script might have worked, but I didn't try it.

I was stuck on AWS because the relative imports and the absolute paths hardcoded in the original shell-script version were causing the pipeline to fail. In the original PR I tried the setup.py approach, but I realized that was a horrible idea, so I removed setup.py and stuck with relative imports. It now executes in my directory without issue.
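Conceptually, the entry point is along these lines (a rough sketch; the stage function names are placeholders, not the actual module calls):

```python
# automorph/main.py -- rough sketch of a single Python entry point driven by
# config.py. The stage functions are placeholders, not the real module API.
from automorph import config


def run_m0_preprocess(cfg):
    """Placeholder for the M0 preprocessing stage."""


def run_m1_quality(cfg):
    """Placeholder for the M1 image-quality stage."""


def run_m2_segmentation(cfg):
    """Placeholder for the M2 segmentation stages."""


def run_m3_features(cfg):
    """Placeholder for the M3 feature-measurement stage."""


def run_pipeline(cfg=config):
    """Run every stage in order, passing the shared config object."""
    run_m0_preprocess(cfg)
    run_m1_quality(cfg)
    run_m2_segmentation(cfg)
    run_m3_features(cfg)


if __name__ == '__main__':
    run_pipeline()
```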

While changing things, I also ran into another problem: the code was creating a lot of unnecessary intermediate files that were filling up storage. Specifically, in the M0 and M1 stages, the cropped and good-quality images were simply copied from the previous folders. Instead, I changed the file structure to include a CSV, "M1/Good_quality/image_list.csv", which is used to pass images to the dataloaders in the M2 steps. image_list.csv has a column called Name with the file path of the original image, as well as the crop information. The subsequent dataloader then loads the image and crops it on the fly, which requires far less storage. Ideally, the dataloaders would be shared across all models so that this doesn't have to happen for each one; if you want, you could optimize this to reduce the repetition.
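A minimal sketch of a dataloader built on that CSV (the crop column names are assumptions about how the crop information might be stored):

```python
# Sketch of a dataset that reads M1/Good_quality/image_list.csv and crops
# on the fly instead of reading pre-cropped copies. The crop columns
# (crop_left, crop_top, crop_right, crop_bottom) are assumptions about how
# the crop information could be stored alongside the Name column.
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset


class CroppedImageList(Dataset):
    def __init__(self, csv_path, transform=None):
        self.table = pd.read_csv(csv_path)
        self.transform = transform

    def __len__(self):
        return len(self.table)

    def __getitem__(self, idx):
        row = self.table.iloc[idx]
        image = Image.open(row['Name']).convert('RGB')
        # Crop using the coordinates stored alongside the original path.
        image = image.crop((row['crop_left'], row['crop_top'],
                            row['crop_right'], row['crop_bottom']))
        if self.transform is not None:
            image = self.transform(image)
        return image
```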

Finally, the M3 module is very repetitive. If you want to clean things up, you could merge create_datasets_disc_centred_B.py, create_datasets_disc_centred_C.py, create_datasets_macular_centred_B.py and create_datasets_macular_centred_C.py into a single file. They are really difficult to debug and read, since they are essentially identical apart from minor differences in the directory strings. It also doesn't make sense that the /retipy/ directory exists in two different locations in the code: M3_feature_zone/ and M3_feature_whole_pic/.
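One possible way to collapse the four scripts into a single parameterized one (a sketch; the directory layout strings are guesses based on the file names):

```python
# Sketch of a single, parameterized replacement for the four
# create_datasets_*_centred_*.py scripts. The path template is a guess based
# on the script names, not the exact layout used by M3.
import os


def create_dataset(centre, zone, results_dir='Results/M3'):
    """Build the dataset for one (centre, zone) combination.

    centre: 'Disc_centred' or 'Macular_centred'
    zone:   'B' or 'C'
    """
    out_dir = os.path.join(results_dir, centre, f'Zone_{zone}')
    os.makedirs(out_dir, exist_ok=True)
    # ... the shared logic of the four scripts would go here, using
    # `centre` and `zone` wherever the strings currently differ.
    return out_dir


if __name__ == '__main__':
    for centre in ('Disc_centred', 'Macular_centred'):
        for zone in ('B', 'C'):
            create_dataset(centre, zone)
```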

Lastly, it would be great to make the code able to run on individual images, not just image directories. That would be more useful for something like an AWS Lambda function that automatically runs the AutoMorph pipeline whenever a new file is uploaded.
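For that use case, the Lambda handler could look roughly like this (a sketch; run_single_image is a hypothetical entry point that does not exist yet):

```python
# Sketch of an AWS Lambda handler triggered by an S3 upload: download the new
# image, run the pipeline on that single file, and upload the results.
import os

import boto3

s3 = boto3.client('s3')


def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']

        # Lambda only allows writes under /tmp.
        local_path = os.path.join('/tmp', os.path.basename(key))
        s3.download_file(bucket, key, local_path)

        # Hypothetical single-image entry point; AutoMorph does not currently
        # provide this, it stands in for "run the pipeline on one image".
        from automorph.main import run_single_image
        run_single_image(local_path, output_dir='/tmp/results')

        # Upload whatever the pipeline produced next to the source image.
        for name in os.listdir('/tmp/results'):
            s3.upload_file(os.path.join('/tmp/results', name),
                           bucket, f'results/{name}')

    return {'status': 'done'}
```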