Closed AndreaCampagner closed 9 years ago
Hi @AndreaCampagner
Some of this is really good, though I have some concerns. This pull request (PR) adds several things at once. It is better to have a pull request for each change e.g.: one for adding tests, one for fixing a bug, one for adding a new feature. Merging multiple changes in a PR makes it hard to figure out why certain changes are being made (e.g. is an edit fixing a bug or adding a feature?).
You also have many commits, and lots of them have very short or duplicate commit messages. You can merge commits together into a single commit that does a single thing (e.g. adds a feature). This makes it easier to cherry pick commits that I want to merge. Longer commit messages would also be helpful. For instance, I'm not sure what commit bcaa89cb037c7db0d8a8bec3e2e884ca95e64112 is doing - I'm not sure what the original bug was.
Lastly you seem to have included binary files in some of your commits. This makes the git history quite large. Since cloning a repo downloads all the history, users will download all your previous binary files that they will never see or use. It's best to create new commits without those binary files in the history.
Hope these comments are helpful for future - I don't mean to discourage your work.
Onto some specific comments:
Adding options to scoring. If we are changing the API, I think we can be clearer and more flexible:
// Currently
void scoring_init(scoring_t* scoring,
int match, int mismatch,
int gap_open, int gap_extend,
char no_start_gap_penalty, char no_end_gap_penalty,
char no_gaps_in_a, char no_gaps_in_b,
char gaps_ends_a, char gaps_ends_b,
char no_mismatches, char case_sensitive);
// I propose (would give more flexibility):
void scoring_init(scoring_t* scoring,
int match, int mismatch,
int gap_open, int gap_extend,
bool no_start_gap_penalty_a, bool no_end_gap_penalty_a,
bool no_end_gap_penalty_b, bool no_end_gap_penalty_b,
bool no_gaps_in_a, bool no_gaps_in_b,
bool no_mismatches, bool case_sensitive);
Many thanks, Isaac
I'm not sure I understand what gaps_ends_a
/gaps_ends_b
represents? Can't see any comments describing it's meaning.
Hi, first i want to thank you for your advices, since this is actually the first project to which i contribute (I'm doing this for a project at my University) I'm not very well versed in doing these kind of things so I'm very glad for these comments of yours. About your points:
gaps_end_a
and gaps_ends_b
are the variables related to --gapsonlyatends
options (the naming I choose is in fact confusing)--nogaps
"family" no gap could be inserted in the longer sequence and so I added the code to do this control (actually in that commit the code was not correct, in fact I have corrected the code again in commit 016cc9).
For example for sequences acgt
and aaaaacg
the alignment reported by original version of program was: ----acgt
aaaaacg-
But, according to what my professor said, no gaps should be allowed in the longer sequence.
Thanks, Andrea
Ah, I understand the issue. I've implemented --nogapsin1
(variable no_gaps_in_a
) to mean no gaps WITHIN the sequence. I'm allowing gaps either side. This allows the maximum alignment when sliding two sequences over each other. So the following is the correct alignment with --nogaps:
ataagacg- -------acgt
Perhaps we could add an options for --nogapsinlongest
to add the behaviour your professor is after, instead giving the alignment:
ataagacg ---acgt---
My recent commits have added tests, implemented the Longest Common Substring command (bin/lcs
) and fixed some bugs when using options --nomismatches
and --nogaps
. I think the only outstanding issue from this pull request is a feature request to add a --nogapsinlongest
option. Therefore I'm closing this PR and will open a feature request as an issue.
My most recent commit added a new library to seq-align, so to update your version you'll need to run:
git pull
cd libs && make && cd ..
make
Thanks for your input and I hope seq-align has been useful. Please let me know if there are other issues, Isaac
Thanks for clarifying (and for the advice on how to update my version). Many thanks, Andrea
I implemented and modified the experimental options (--gaps_only_at_ends, etc.) and also corrected some bug related to already implemented options (--nomismatch, etc.). Also added a sample test set (containing a dozen of simple tests, primarily related to the experimental options).