jazdrv / dnaTools

GNU General Public License v3.0
2 stars 4 forks source link

sort code needs code review #19

Closed jazdrv closed 6 years ago

jazdrv commented 6 years ago

concerns:

jazdrv commented 6 years ago
jeftreece commented 6 years ago

It's going to take me a long time before I can much comment on the sort algorithms. I only have a few observations so far

jeftreece commented 6 years ago

I have gotten further with this review but still have more to go. Here are some things so far.

Approach to getting this merged with master is murky. I have previously suggested a strategy, and I'll reiterate it here. In slack, Feb 20, in #coding-database, Zak asked about the approving pull requests. I answered. I thought that was clear now, if unfamiliar. "this process is new to me, I'm afraid. so silly question ... the idea here is that I just review the code changes in the browser? [do I approve it?]" "yes. I don't think we've really formalized any process (and that's OK with me at least for now) but the idea is changes get reviewed by someone before going into master and ideally any issues identified in the review get discussed/resolved before it gets delivered to master"

Because there are changes to files like lib.py and db.py versus latest on master, I don't think I'm really reviewing the correct thing. Until it's in a pull request or diff off of master latest, I can't be very precise with my review comments.

March 13, I suggested a strategy: "What I think needs to happen is Zak prepare incremental branches for delivery to master that he can do a pull request on and we can review properly."

I recommend breaking this up into incremental deliveries. It can be done as one large delivery, but I don't recommend that because it's quite a bit more difficult to address review comments and make sure the code is all caught up. It's also harder for the reviewer. This is just basic, not specific to this project/branch. There's more than one way to accomplish this, but and here's one possible workflow:

  1. pull from master, resolving any merge conflicts, so there's no regression from recent fixes on master and you're as current as possible
  2. create a new branch
  3. copy limited-scope changes into the new branch, starting perhaps with some low-level changes
  4. do a pull request in master for the new branch; requesting review of the smaller change
  5. handle review comments in the pull request
  6. repeat until master has all of the necessary changes

So far, here are some review comments:

  1. redux.py may not be the place to put all of these command arguments. Refer to earlier discussion with Harald, where I thought we all agreed to have few command-line args to the main driver and standalone utilities as needed.
  2. dynamic sql binding must be used for queries. Otherwise, we're exposed on sql injection attacks. We may not care now since it's all single user sqlite, but this could easily come back to bite us as a security vulnerability when we move to mysql and maybe have a server somewhere with data on it. Must fix things like select ... where snpname = '%s' ..., using the proper construct db.execute('select ... where snpname=? ...', (var,))
  3. environment variable needed at runtime needs to be fixed. Refer to prior slack discussion where you agreed (sort of "It's doable") to one env var, pointing to the source directory. In the source dir, you can ". ./redux.env" to set this
  4. db1.py probably shouldn't exist. If you really need a different interface, I'd recommend sub-classing DB or else fixing whatever's needed in db.py (make sure to fix any place that is already using DB apis). Keep in mind - sqlite is not the end goal, so keeping this simple helps future portability to other DBM
  5. preference to use the Trace class rather than print statements. Trace is not the end goal, likely logging is. See Issue #36 which may not be done immediately, but having a consistent logging call across the code will make the job easier in the future.
  6. One thing I'm still working out is the extra columns added to vcfcalls. I'd much rather fix get_array_api than change the underlying low level storage. I know you said you don't understand this routine, but it's a very simple routine that walks through calls and returns values in a python data structure. These are not difficult concepts.

I'll probably have more comments later. Resolving these in an issue is not the way I'd prefer. I'd prefer to resolve review comments via a pull request.

moregubbins commented 6 years ago

Hi Jef, Zak, I think most of Jef's comments are already acknowledged, particularly #1, #3, #4 and #5, particularly since Zak has been working on this as a "proto-type", so the integration and quality of the code presumably isn't up to as high a quality as would otherwise be the case. Nevertheless, the outputs do show that this is roughly the logic principles that we need. I'd prefer to see what Zak's done as a measure that lets us prove what we've got there is viable, but I think we all know it will take some migration to get it properly integrated. My question you both is whether you think you can do this together, let's say over the next two weeks, with minimal input from me? Zak: are you able to create one or more intermediate branch points as Jef suggests? Jef: are you prepared to take charge of organising and performing this merger? I'll address Zak's e-mail separately, because it deserves a detailed response. Cheers, Iain.

  From: jeftreece <notifications@github.com>

To: jazdrv/dnaTools dnaTools@noreply.github.com Cc: moregubbins gubbins@talk21.com; Assign assign@noreply.github.com Sent: Thursday, 22 March 2018, 0:16 Subject: Re: [jazdrv/dnaTools] sort code needs code review (#19)

I have gotten further with this review but still have more to go. Here are some things so far.Approach to getting this merged with master is murky. I have previously suggested a strategy, and I'll reiterate it here. In slack, Feb 20, in #coding-database, Zak asked about the approving pull requests. I answered. I thought that was clear now, if unfamiliar. "this process is new to me, I'm afraid. so silly question ... the idea here is that I just review the code changes in the browser? [do I approve it?]" "yes. I don't think we've really formalized any process (and that's OK with me at least for now) but the idea is changes get reviewed by someone before going into master and ideally any issues identified in the review get discussed/resolved before it gets delivered to master"Because there are changes to files like lib.py and db.py versus latest on master, I don't think I'm really reviewing the correct thing. Until it's in a pull request or diff off of master latest, I can't be very precise with my review comments.March 13, I suggested a strategy: "What I think needs to happen is Zak prepare incremental branches for delivery to master that he can do a pull request on and we can review properly."I recommend breaking this up into incremental deliveries. It can be done as one large delivery, but I don't recommend that because it's quite a bit more difficult to address review comments and make sure the code is all caught up. It's also harder for the reviewer. This is just basic, not specific to this project/branch. There's more than one way to accomplish this, but and here's one possible workflow:

jeftreece commented 6 years ago

I'm not sure what might still be open here, but I'm closing this issue with merge #38