pepkit / geofetch

Builds a PEP from SRA or GEO accessions
https://pep.databio.org/geofetch/
BSD 2-Clause "Simplified" License
45 stars 5 forks source link

Naming and docstring refactoring in geofetch #94

Closed khoroshevskyi closed 1 year ago

khoroshevskyi commented 1 year ago

This is currently a primarily single 2000+line file. Can you divide it into at least 1 or 2 modules? Here are some suggestions:

  1. To start, just removing all the command-line processing functionality into cli.py would help make the code more approachable.

  2. You have several static methods on the class that don't take self in as an argument...such as _get_list_of_keys, and others. Would it make more sense to move these into a utilities.py file, or something, rather than being attached to the Geofetcher class? This would help simplify that class, which is right now a bit complex and large. The same is true for the Finder class, with functions like uid_to_gse. This is just a utility function and I don't think the best place for it is as a class method. Correct me if I'm wrong and you think this is a better way to write this for some reason.


A few other clarity/polishing things:

  1. Can you rename def _dict_to_list_convector and fix the docstring? I am not sure what 'convector' is.
  2. Can you rename these to be more descriptive: _sra_bam_conversion2, _sra_bam_conversion. Are they both used and needed? If so fine, but name them something other than 1 and 2.
  3. In general, I am finding the docstrings confusing and not able to understand what the functions do. Can you go through the docstrings one more time for the functions and try to think a bit more carefully about how you can convey what these functions are doing? For example, one specific thing you can do is use a more delcarative style. Instead of "Saving list of gse numbers to the file", you write what the function actually does, because the progressive form of the verb makes it confusing. Just say what it does: "Save list of GSE numbers to a file". Even better would be to be more descriptive. In this case it would be "Save the list of GSE accessions stored in this Finder object to a given file".

Originally posted by @nsheff in https://github.com/pepkit/geofetch/pull/93#pullrequestreview-1153156655

khoroshevskyi commented 1 year ago
  1. I agree with an idea about dividing geofetch in few modules. Now I divided geofetch into 3 modules: cli, geofetch(main) and utils. I would say, the way I did it now is not the output (solution). The way I would do it: divide all functions by category and make them class based (OOP). To to it, code needs new refactoring and more time. I would say, it doesn't worth it to do it now.
  2. For Finder, I will keep all static methods in that class, because they are usfull for that class and users can use them outside Finder class.
    1. _dict_to_list_convector renamed!
    1. renamed
    1. I beleve I checked all docstrings.
      • Fixed looper amendments
      • Added comments to config file