Implement option to perform prediction outside of the PRIDICT2 repo

allydunham commented 10 months ago

The script doesn't run outside of the repo directory and the arguments listed on Github seem to be incorrect.

The script inconsistently references the current working directory and the file location throughout so only works correctly when it is run with the repo as working directory. The description of the arguments in README also seems to be incorrect - the input and output directories are hardcoded and using the associated arguments just gives an error.

This is all fairly straightforward to fix and there doesn't seem to be anything that actually requires the script to be run in the PRIDICT2/ repo dir - it seems to be working in any directory now for me. For reference, the steps I took were:

Remove inp_dir and just use args.input_fname to find the input table. inp_dir doesn't get seem to used other than to find the CSV file, which can just be passed as a path
Added the described --output-dir argument and use that for out_dir
Change load_pridict_model to use repo_dir = os.path.dirname(os.path.realpath(__file__))

I imagine you might be able to make a cleaner setup with more knowledge of the code though.

mathinic commented 10 months ago

Hi allydunham

Thanks for the feedback! That the prediction only works within the repository is actually intended behavior, so that by default users do not need to give inp_dir and out_dir because it uses the existing folder structure. If we make it work from anywhere, we need to make inp_dir/out_dir a requirement (not optional) anymore which seems to complicate the process for novice users.

In the readme we therefore also guide users to move into the repository (# navigate into repository) when attempting to run PRIDICT2.0.

But I'll keep the Issue open for now and will include a solution in the next update iteration.

Best wishes, Nicolas

allydunham commented 10 months ago

Hi Nicolas,

No worries, thanks for making the great tool!

I interpreted navigating into the respository as part of the install process rather than needed at runtime and missed the comment in the running instructions - might be worth highlighting it more as I can see a lot of people making that mistake.

That makes sense, I can see having the script in a different directory might be confusing sometimes. The current behaviour is quite confusing if you do try to use it outside the repo though. It certainly confused me until I looked at the code and the errors it was giving. I think you should be able to get the best of both worlds by setting the defaults to base off the CWD without hardcoding __file__. E.g. if out_dir defaults to ./predictions it will work in the repo as described but also elsewhere. Similarly, either dropping inp_dir in favour of a path input or keeping it and defaulting to ./input will do the same.

As an aside, I think the easiest experiance for new users would be packaging everything up into a python package that exposes a pridict2 command. That way users get pridict2 command after install that they have access to wherever and you don't need to worry about the path to the script and in/out directories as much. It has the added benefit of advanced users being able to programmatically use your models more easily. I've found this works very nicely for a couple of tools I've made, although it takes a little bit more effort setting things up initially.

Best wishes, Ally

mathinic commented 3 months ago

Hi Ally, Sorry for the late reply, and thanks again for your input & suggestions! I assume in the meantime you solved the problem in your scripts, but wanted to let you know that now you can also use PRIDICT2.0 by default from outside the repo directory. (https://github.com/uzh-dqbm-cmi/PRIDICT2/commit/19aa95201916c345c7322b1f404da7546dfaeb76)

Hope that solves the issue that you initially faced.

I agree that creating a package would be an even more convenient solution. However, due to the complex dependencies of the repo (e.g. it needs old tensorflow to run DeepSpCas9 from Kim et al. 2019), it is very sensitive to any changes in the environment. Creating a package will encourage users to install it in their existing environments, which would frequently fail.

Best, Nicolas

allydunham commented 3 months ago

Thanks, looks great! Yes, we made a similar patch to run our scripts but much better having it in the main repo. That makes sense about installing the environment, these projects do often have awkward requirements.

Thanks again, Ally

uzh-dqbm-cmi / PRIDICT2

Implement option to perform prediction outside of the PRIDICT2 repo #1