Closed clhunsen closed 8 years ago
Thanks for your effort to not let the old patches gather dust in stale branches, and also for the new contributions! After briefly reviewing the series, I think I'll merge as-is, but could you please also post this pull request itself to the mailing list? No need to post the individual patches as there's not much need to comment on them.
Thanks, Wolfgang
On 17/02/2016 at 12:27, Claus Hunsen wrote:
This pull request contains many fixes to the mailing-list analysis that have been lying around on other branches (i.e., |mitchell_updates| and |for-upstream|) but bring much benefit and fix the analysis to a great extent (called /old patches/ below). Furthermore, there are additional patches that fix some more things (called /new patches/ below).
The main fixes include the following:
- handling the case of proxy e-mails (e.g., |Adrian Prantl via llvm-dev <llvm-dev@lists.llvm.org>|) [29d777d https://github.com/siemens/codeface/commit/29d777da8dd5f59081d850be492681b4e6fa5e87 (old)],
- removal of problematic characters in author-mail strings [29d777d https://github.com/siemens/codeface/commit/29d777da8dd5f59081d850be492681b4e6fa5e87 (old), 1098eeb https://github.com/siemens/codeface/commit/1098eeb709f5574e932cf981cf2a0f3370c7b66a (old), ee1e7b4 https://github.com/siemens/codeface/commit/ee1e7b4cdf2b045639f6a80545120a09510ac909 (new)] and better handling of malformed author-mail strings [e92a6dd https://github.com/siemens/codeface/commit/e92a6dd3d5529594477435cac8711f9f38f8107a (old), e3f58a1 https://github.com/siemens/codeface/commit/e3f58a12d6140995a7e40374bdbceffeb3f994e6 (old, references #34 https://github.com/siemens/codeface/issues/34)],
- enhancing the loading of mbox files [29ac3f0 https://github.com/siemens/codeface/commit/29ac3f0d425ebfc43940153b5e1c966ec3cfd29d (old), 9d5982e https://github.com/siemens/codeface/commit/9d5982edd1d805b1cf86360fce92e7be627ab0d2 (old)],
- enhancement to storage of e-mails in the |mail| table of the DB and removal of duplicate e-mails [0555675 https://github.com/siemens/codeface/commit/05556759169a5250a6956bd682def63350aef0ea (old), acf0e2e https://github.com/siemens/codeface/commit/acf0e2ef258c164dc172d83cf3370974f9c64344 (old), 8c7263d https://github.com/siemens/codeface/commit/8c7263d66b413a5330f976b4a48fb5db6242005a (new)].
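As a rough illustration of the kind of clean-up described in the list above (proxy "via" names, problematic characters, address-only senders, and duplicate message IDs), here is a minimal Python sketch. It is not the R code from |codeface/R/ml/analysis.r|; the function names, the regular expressions, and the `message_id` key are all assumptions made for this example.

```python
import re

# Hypothetical helpers for illustration only; the actual fixes are
# implemented in R under codeface/R/ml/ and may work differently.

# "Name <address>" form of a From header.
ANGLE_RE = re.compile(r"^(?P<name>.*?)\s*<(?P<addr>[^<>]+)>\s*$")
# Trailing "via <list-name>" added by some mailing-list proxies.
PROXY_RE = re.compile(r"\s+via\s+\S+\s*$", re.IGNORECASE)
# Quotation marks, commas, and semicolons in author names.
PROBLEM_CHARS_RE = re.compile(r'[",;]')


def normalize_author(from_header):
    """Return a (name, address) pair from a raw From header string."""
    header = from_header.strip()
    match = ANGLE_RE.match(header)
    if match:
        name, addr = match.group("name"), match.group("addr").strip()
    elif "@" in header:
        # Only an e-mail address is available, no display name.
        name, addr = "", header
    else:
        name, addr = header, ""
    # Replace problematic characters, then drop the proxy suffix.
    name = PROBLEM_CHARS_RE.sub(" ", name).strip()
    name = PROXY_RE.sub("", name)
    # Collapse runs of whitespace left behind by the replacements.
    name = " ".join(name.split())
    # Note: for proxy mails the address is still the list's address.
    return name, addr.lower()


def drop_duplicate_ids(mails):
    """Keep only the first mail seen for each message ID."""
    seen = set()
    unique = []
    for mail in mails:
        mid = mail["message_id"]  # assumed key name
        if mid in seen:
            continue
        seen.add(mid)
        unique.append(mail)
    return unique
```

For the proxy example above, `normalize_author` returns the name without the "via llvm-dev" suffix, while the address remains that of the list; how the real analysis resolves the actual sender is not covered by this sketch.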
Furthermore, this pull request includes some other patches:
- fix to really re-install GitHub packages in |packages.R| [99666d8 https://github.com/siemens/codeface/commit/99666d8f341c89ba459af3281670e1bc6af20b79 (new)],
- fix of indentation and whitespace in |codeface/R/ml/analysis.r| [3e0a6f8 https://github.com/siemens/codeface/commit/3e0a6f82653dcf365a2018fe5c42ba8cf2713b7f (new)],
- enhancement to a log message in ID service [4745e9a https://github.com/siemens/codeface/commit/4745e9a0ad20642f1313f67aff6b296bacb85767 (new)], and
- addition of the package |screen| to the installation scripts [4f44923 https://github.com/siemens/codeface/commit/4f4492382aca3f1b71f008a5d9d6fcaa8d8bff7d (new)].
You can view, comment on, or merge this pull request online at:
https://github.com/siemens/codeface/pull/40
Commit Summary
- Fix case where email "From" field has atypical form
- Remove quotation and comma characters from email authors
- Generate log output when loading corpus instead of .mbox file
- Change default email analysis behavior to load mbox file
- Change method for extracting dates from corpus
- Remove emails from corpus that have a duplicate id
- Remove move problematic characters from the email authors
- Merge branch 'mitchell_updates' into claus-updates
- Fix case where only an email is available
- Merge branch 'mitchell_updates' into claus-updates
- Try to improve email detection heuristics
- Add screen as package for default installation
- Enhance error message for person lookup in the ID service
- Remove semicolons from the email authors
- Properly sort columns of mail data before inserting into DB
- Fix indentation and trailing whitespace in codeface/ml/analysis.r
- Really re-install GitHub packages in packages.R
File Changes
- M codeface/R/ml/analysis.r https://github.com/siemens/codeface/pull/40/files#diff-0 (65)
- M codeface/R/ml/ml_utils.r https://github.com/siemens/codeface/pull/40/files#diff-1 (12)
- M id_service/id_service.js https://github.com/siemens/codeface/pull/40/files#diff-2 (2)
- M integration-scripts/install_common.sh https://github.com/siemens/codeface/pull/40/files#diff-3 (2)
- M packages.R https://github.com/siemens/codeface/pull/40/files#diff-4 (62)
Patch Links:
- https://github.com/siemens/codeface/pull/40.patch
- https://github.com/siemens/codeface/pull/40.diff
Sure, I'll do that. Actually, I have already started to write that e-mail. ;) And thank you for merging.
Merged, thanks.