robmaz / distmap

Sequence alignment on Hadoop
0 stars 1 forks source link

New filesystem layout #18

Closed robmaz closed 6 years ago

robmaz commented 6 years ago

plus a minor change in bin/distmap to find the modules again

magicDGS commented 6 years ago

@robmaz - this PR is too big for being able to review. As I told you, I think that most of the small changes (like removal of binaries and other files), should be done step by step to be sure that there isn't any incompatibility problem. Like that, review will be easier.

I will submit a couple of PR showing the concept of splitting removal changes into sub-units that are easier to review. We can come back to this PR later, or you can split it in other ones...

robmaz commented 6 years ago

I think we cannot really change the filesystem layout one file at a time, because this will take forever and will also be much more confusing than just replacing one layout with another. Of course the PR is big because there is all this binary crap in executables/ and Linux_executables/, which are just being renamed to libexec/MacOS/ and libexec/Linux/, respectively. The former bin/ becomes lib/distmap/ and the executables in the root folder now go into bin/. bin/distmap lines 9--11 refer to the new module location.

That is the extent of the PR and I think it is not a large PR, conceptually; only git unfortunately appears to see this as the deletion and addition of a million files.

Cheers Rupert

2017-11-30 16:01 GMT+01:00 Daniel Gómez-Sánchez notifications@github.com:

@robmaz https://github.com/robmaz - this PR is too big for being able to review. As I told you, I think that most of the small changes (like removal of binaries and other files), should be done step by step to be sure that there isn't any incompatibility problem. Like that, review will be easier.

I will submit a couple of PR showing the concept of splitting removal changes into sub-units that are easier to review. We can come back to this PR later, or you can split it in other ones...

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/robmaz/distmap/pull/18#issuecomment-348213519, or mute the thread https://github.com/notifications/unsubscribe-auth/Ad_FfPOeQ6YN4X4gWam0qFTq9KqFgcF7ks5s7sMvgaJpZM4QtK9G .

magicDGS commented 6 years ago

The problem are the binary files. But I suggest to either

The idea is (by commit/PR):

  1. Move the bin folder to a lib folder. this will be simple for git to recognize as a folder re-naming (there is no more binaries after #21)
  2. Move the scripts to bin. Generate a new folder and move a couple of scripts is really easy for git to realize.
  3. Create a libexec folder with the jar files in its root (Linux and MacOS should have the same, and it is replicated in the current PR). This will be more changes for git, but there is less amount of files changed of directory.
  4. Move Linux/MacOS binaries. I recommend one in each commit (either if it is a single PR or several ones).

I think that it is much better to do 1 and 2 in two different PRs for easy review and no blocking the merge, and then 3 and 4 in a super big PR with lots of commits to easy review.

Let me know what do you think about that approach...

robmaz commented 6 years ago

I was hoping to get your opinion on the proposed layout, which, I think, you can easily see if you look at the branch. I am not sure what the added value would be if I now recreated the branch in a slightly different way.

magicDGS commented 6 years ago

@robmaz - I can give you the feedback about the proposed layout: looks good but I have a couple of comments for some parts. The problem is that it is difficult to make the comments in this branch because the display is horrible. The added value for re-create the branch is:

  1. Improve visibility of the changes. When looking at the history of the changes, it will be clear what changed in it (even if it is merged together, coming to the PR will show the diff in every commit and make clear why it was done)
  2. Easier to come back and pinpoint where a regression occurs. For example, if in the move of the files a file disappear or was duplicated, the diff of the change will show it and we can take actions, even if it was already merged (e.g., reverting the commit).
  3. Easier to review and transparent set of changes. One of the problems that you are facing now while managing distmap is that the documentation is really poor, not only at the code level, but also at the changes level. For better maintenance, the history of a repository are like comments/documentation in code: if it is not well-documented and following some principles, it becomes a nightmare for future development.

As this is your repository and your project, it is up to you if you would like to do a more standard and principled software development. I can offer my help here: if you merge #21, I can make a PR with the re-name of bin to lib and another one of the scripts to bin. And then I can left you the libexec folder PR (that you can start without the merging of #21). Like that, we can work in parallel and changes are non-blocking others, making the final folder structure better and traceable...

The added value is: 1) improve visibility of the c

robmaz commented 6 years ago

Ok, let's do it like that then. I appreciate your help, I just need a usable distmap soon, because I want to test it with the Wolbachia data.

magicDGS commented 6 years ago

Thanks @robmaz - and for an usable DistMap with trimming support, it will require a bit more work on issue #5 (I think that it is the blocking step).