iqbal-lab-org / make_prg

Code to create a PRG from a Multiple Sequence Alignment file
Other
21 stars 7 forks source link

make_prg update PR series: 3. CLI changes #35

Closed leoisl closed 2 years ago

leoisl commented 2 years ago

Hello, this is the 3rd PR, only containing CLI changes. It is the simplest and shortest one to review. My original idea was to leave this one as the last PR (we still have the longest and most complicated one, that contains the whole update code and the explicit representation of the recursion tree), but given that most people have just 2 working days remaining before a long break, I think it is better to just submit this small PR now, and the big one when we come back from the break.

This PR contains just CLI changes for the make_prg from_msa command, and the addition of make_prg update command.

Main changes, to help reviewing:

The only tests covering the code in this PR are integration tests, but they are not part of this PR because there are some integration tests that are related to make_prg update command, so will choose to PR these tests later, after the update code is reviewed.

Thank you for your help and time!!

leoisl commented 2 years ago

Thanks for all the issues raised @mbhall88 ! I requested a re-review, but please do it at your best convenience, I hope you are enjoying a long break post viva, and will see this just way later! CLI looks much better now, and I have to admit it was the module that I invested the least amount of time and effort, as I was focused mostly on the core update algorithm. But I think most issues were caught by your first review, and it looks much better now, thanks a lot!

leoisl commented 2 years ago

Hello,

I've read all your comments and suggestions, and will implement them all, I think they are all correct and worth implementing in this PR. I started with the longest one, which is the refactoring of how we represent the set of output files. This was just done here: https://github.com/iqbal-lab-org/make_prg/pull/35/commits/0d7870a976a97514ee29b1739e06f76bde57766a . Follows some answers and comments about this refactoring to help reviewing:

I don't have a clear solution. But could we do something like, have the SetOutputFiles object have a constructor that takes the parsed --output-type string and sets non-None values to its attributes accordingly. Then we never use OutputType object.

Basically this was done, but I still kept the OutputType class. Is this fine or should we necessarily remove it?

Add locus_name attribute to SetOutputFiles. Make one SetOutputFiles object per locus to build/update. Each threaded call to from_msa/update uses this object instead- it will know what files to produce.

This was done as said.

Have a function that takes a list of SetOutputFiles and extracts all files. Zip/concatenate from those.

This is done in the create_final_files method

I have a clear feeling that there is a lot of code redundancy and that something like this approach can reduce it considerably.

Yeah, some code duplication were removed, and two functions turned out to be not needed and thus removed.

Could I have a review on this refactoring please? Will proceed to the other minor comments, but will be able to work on them just on monday.

Thanks a lot for all your help and work, this code is getting much better with your reviews.

bricoletc commented 2 years ago

This looks much better to me @leoisl - i like that InputOutputFiles' behaviour is now directly coupled to OutputType specification. I think it's fine to keep OutputType class. Nice work!!

leoisl commented 2 years ago

Hi all again! Thanks for all the comments and corrections. The CLI looks much, much better now, it would not get to this state without your help. I think I addressed all your comments, suggestions and corrections. I am requesting a re-review from @bricoletc to check if all is fine, and will merge once I have his approval. Cheers

bricoletc commented 2 years ago

Looks fine, though I want to point out these comments:

I understood import refactorings will happen later- will these comments too? It's a bit weird to have loguru used but not imported as a dependency in this code, but i suppose it's ok to delay adding it to setup.py

bricoletc commented 2 years ago

I approved the changes but please look at latest comment above before merging- I leave it to you to push those comments and merge, or merge and push those comments later ;)

leoisl commented 2 years ago

Looks fine, though I want to point out these comments:

  • Reformat the codebase with black
  • Add loguru to setup.py dependencies

I understood import refactorings will happen later- will these comments too? It's a bit weird to have loguru used but not imported as a dependency in this code, but i suppose it's ok to delay adding it to setup.py

Yes, the last PR that will deal with a lot of misc supporting tasks like: