Closed leoisl closed 2 years ago
Thanks for all the issues raised @mbhall88! I requested a re-review, but please do it at your convenience. I hope you are enjoying a long break post viva and will only see this much later! The CLI looks much better now, and I have to admit it was the module in which I invested the least time and effort, as I was focused mostly on the core update algorithm. But I think most issues were caught by your first review, and it looks much better now, thanks a lot!
Hello,
I've read all your comments and suggestions and will implement them all; I think they are all correct and worth implementing in this PR. I started with the longest one, which is the refactoring of how we represent the set of output files. This was just done here: https://github.com/iqbal-lab-org/make_prg/pull/35/commits/0d7870a976a97514ee29b1739e06f76bde57766a . Here are some answers and comments about this refactoring to help with reviewing:
- New source file `make_prg/utils/input_output_files.py`, containing the base abstract class `InputOutputFiles`, which represents the set of input files consumed and output files created for each `make_prg` subcommand. This abstract class groups several common methods together, and some functions from `make_prg/utils/io_utils.py` that help the implementation of this class were moved to this new source file. We also have a concrete child class `InputOutputFilesFromMSA` for the `make_prg from_msa` command and the child class `InputOutputFilesUpdate` for the `make_prg update` command;
- Having one class per `make_prg` command to deal with all the logic of input/output files, with the common stuff abstracted into the base class, made the source files that previously managed part of this, `make_prg/subcommands/from_msa.py` and `make_prg/subcommands/update.py`, much simpler, and greatly reduced the size of `make_prg/utils/io_utils.py`, which now contains only very general IO util functions;

> I don't have a clear solution. But could we do something like, have the `SetOutputFiles` object have a constructor that takes the parsed `--output-type` string and sets non-`None` values to its attributes accordingly. Then we never use `OutputType` object.
Basically this was done, but I still kept the `OutputType` class. Is this fine, or should we necessarily remove it?
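For readers following the thread, here is a minimal sketch of the design discussed above: an abstract base class whose constructor sets non-`None` attributes according to the parsed `--output-type` string, with one child class per subcommand. Only the class names `InputOutputFiles`, `InputOutputFilesFromMSA`, `InputOutputFilesUpdate` and `OutputType` come from this PR; the letter codes (`p`, `g`, `b`, `a`), the file suffixes, and every method name are assumptions for illustration, not the actual make_prg code.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List, Optional


class OutputType:
    """Parsed --output-type string. The letter codes are hypothetical:
    p = PRG fasta, g = GFA, b = binary, a = all three."""

    def __init__(self, codes: str):
        everything = "a" in codes
        self.prg = everything or "p" in codes
        self.gfa = everything or "g" in codes
        self.binary = everything or "b" in codes


class InputOutputFiles(ABC):
    """Output files for one locus; an attribute is None when that
    file type was not requested."""

    def __init__(self, locus_name: str, output_prefix: str, output_type: OutputType):
        self.locus_name = locus_name
        base = f"{output_prefix}.{locus_name}"  # hypothetical naming scheme
        self.prg: Optional[Path] = Path(base + ".prg.fa") if output_type.prg else None
        self.gfa: Optional[Path] = Path(base + ".prg.gfa") if output_type.gfa else None
        self.binary: Optional[Path] = Path(base + ".prg.bin") if output_type.binary else None

    def files_to_produce(self) -> List[Path]:
        # Only the requested (non-None) files are reported.
        return [p for p in (self.prg, self.gfa, self.binary) if p is not None]

    @abstractmethod
    def subcommand(self) -> str:
        """Name of the subcommand this set of files belongs to."""


class InputOutputFilesFromMSA(InputOutputFiles):
    def subcommand(self) -> str:
        return "from_msa"


class InputOutputFilesUpdate(InputOutputFiles):
    def subcommand(self) -> str:
        return "update"
```

With this shape, callers never inspect `OutputType` directly after construction; they only ask the per-locus object which files it will produce.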
> Add locus_name attribute to `SetOutputFiles`. Make one `SetOutputFiles` object per locus to build/update. Each threaded call to from_msa/update uses this object instead - it will know what files to produce.

This was done as said.
> Have a function that takes a list of `SetOutputFiles` and extracts all files. Zip/concatenate from those.

This is done in the `create_final_files` method.
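As an illustration of the "extract all files, then zip/concatenate" idea, a sketch under stated assumptions could look like the following. The `LocusFiles` class, the function name `create_final_files_sketch`, and the output suffixes are hypothetical stand-ins; the real `create_final_files` method in this PR may be organised quite differently.

```python
import zipfile
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class LocusFiles:
    """Hypothetical stand-in for one per-locus set of output files."""
    locus_name: str
    prg: Optional[Path] = None
    gfa: Optional[Path] = None


def create_final_files_sketch(loci: List[LocusFiles], output_prefix: str) -> List[Path]:
    """Concatenate per-locus PRG fasta files and zip per-locus GFA files."""
    created: List[Path] = []

    # Text outputs are concatenated in locus order.
    prgs = [locus.prg for locus in loci if locus.prg is not None]
    if prgs:
        final_prg = Path(output_prefix + ".prg.fa")
        with final_prg.open("w") as out:
            for prg in prgs:
                out.write(prg.read_text())
        created.append(final_prg)

    # One-file-per-locus outputs are bundled into a single zip archive.
    gfas = [locus.gfa for locus in loci if locus.gfa is not None]
    if gfas:
        final_zip = Path(output_prefix + ".prg.gfa.zip")
        with zipfile.ZipFile(final_zip, "w") as archive:
            for gfa in gfas:
                archive.write(gfa, arcname=gfa.name)
        created.append(final_zip)

    return created
```

Loci whose attribute is `None` are simply skipped, so the same function works whichever output types were requested.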
> I have a clear feeling that there is a lot of code redundancy and that something like this approach can reduce it considerably.

Yeah, some code duplication was removed, and two functions turned out not to be needed and were removed as well.
Could I have a review on this refactoring please? I will proceed to the other minor comments, but will only be able to work on them on Monday.
Thanks a lot for all your help and work, this code is getting much better with your reviews.
This looks much better to me @leoisl - I like that `InputOutputFiles`' behaviour is now directly coupled to the `OutputType` specification. I think it's fine to keep the `OutputType` class. Nice work!!
Hi all again! Thanks for all the comments and corrections. The CLI looks much, much better now; it would not have got to this state without your help. I think I addressed all your comments, suggestions and corrections. I am requesting a re-review from @bricoletc to check if all is fine, and will merge once I have his approval. Cheers
Looks fine, though I want to point out these comments:
- Reformat the codebase with black
- Add loguru to setup.py dependencies

I understood import refactorings will happen later. Will these comments be addressed then too? It's a bit weird to have loguru used but not declared as a dependency in this code, but I suppose it's ok to delay adding it to setup.py
I approved the changes, but please look at the latest comment above before merging. I leave it to you to push those comments and merge, or merge and push those comments later ;)
> Looks fine, though I want to point out these comments:
>
> - Reformat the codebase with black
> - Add loguru to setup.py dependencies
>
> I understood import refactorings will happen later- will these comments too? It's a bit weird to have loguru used but not imported as a dependency in this code, but i suppose it's ok to delay adding it to setup.py
Yes, the last PR will deal with a lot of misc supporting tasks like:
Hello, this is the 3rd PR, only containing CLI changes. It is the simplest and shortest one to review. My original idea was to leave this one as the last PR (we still have the longest and most complicated one, that contains the whole update code and the explicit representation of the recursion tree), but given that most people have just 2 working days remaining before a long break, I think it is better to just submit this small PR now, and the big one when we come back from the break.
This PR contains just CLI changes for the `make_prg from_msa` command, and the addition of the `make_prg update` command.

Main changes, to help reviewing:
- `-v` for debug, `-vv` for trace (this one is only for devs);
- `--outdir` replaced by `--output_prefix`;
- `--prg_name` param removed;
- `--seqid` param removed;
- `--no_overwrite` param removed;
- `--output-type` param removed - we now always build the three types of output files (fasta, GFA, and binary). I am not sure about this one, we can add this param back if you prefer;
- `make_prg update` subcommand added;

The only tests covering the code in this PR are integration tests, but they are not part of this PR because some of them are related to the `make_prg update` command, so I will PR these tests later, after the update code is reviewed.

Thank you for your help and time!!
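To make the CLI changes listed above easier to picture, here is a minimal `argparse` sketch with the two subcommands, the `--output_prefix` option, and a counted `-v` flag. It is an illustration only: the `--verbose` long name, the `INFO` default level, the help strings, and the helper names `build_parser` and `log_level` are assumptions, and the real parser in this PR has more options.

```python
import argparse

# Hypothetical mapping from the number of -v flags to a log level name.
LOG_LEVELS = {0: "INFO", 1: "DEBUG", 2: "TRACE"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="make_prg")
    subparsers = parser.add_subparsers(dest="command", required=True)

    # Both subcommands share the same two options in this sketch.
    for name in ("from_msa", "update"):
        sub = subparsers.add_parser(name)
        sub.add_argument(
            "--output_prefix",
            required=True,
            help="Prefix for all output files (replaces --outdir)",
        )
        sub.add_argument(
            "-v",
            "--verbose",
            action="count",
            default=0,
            help="-v for debug, -vv for trace",
        )
    return parser


def log_level(verbosity: int) -> str:
    # Anything above -vv is clamped to the most verbose level.
    return LOG_LEVELS[min(verbosity, 2)]
```

For example, `build_parser().parse_args(["update", "--output_prefix", "out/run1", "-vv"])` yields `command == "update"` and a verbosity of 2, which `log_level` maps to `"TRACE"`.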