Closed whedon closed 3 years ago
Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @kellieotto it looks like you're currently assigned to review this paper :tada:.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
@kellieotto Just checking in (I've been off-line working a major proposal and getting my house on the market to sell). Any questions?
Thanks for checking in. I plan to get to it this week, no questions!
On Sun, Aug 4, 2019 at 3:40 PM Bruce Wilson notifications@github.com wrote:
@kellieotto https://github.com/kellieotto Just checking in (I've been off-line working a major proposal and getting my house on the market to sell). Any questions?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openjournals/joss-reviews/issues/1608?email_source=notifications&email_token=AB46QQV6EBRA6UTMAIWEKH3QC5LGPA5CNFSM4IHXWRN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3QLC3Y#issuecomment-518041967, or mute the thread https://github.com/notifications/unsubscribe-auth/AB46QQWTLG5ITOXFYSBE2HLQC5LGPANCNFSM4IHXWRNQ .
-- Kellie Ottoboni Ph.D. Statistics '19, University of California, Berkeley Fellow at Berkeley Institute for Data Science
Mobile: (650) 520-5056 Website: www.kellieottoboni.com
I've created issues in the repo for the problems I encountered. The biggest issue is that I'm not able to get the examples to run. Maybe I'm doing something silly on my end, but I think there's a dependency management issue here.
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
@whedon add @jjmcnelis as reviewer
OK, @jjmcnelis is now a reviewer
I apologize for the delay. I will complete my initial review by this Wednesday.
EDIT (2019-08-28) @Ji-Zhang I don't have experience working with several of datacleanbot's dependencies (I'll ask @usethedata to clarify if this disqualifies me as a reviewer). These hangups could be familiar for Python users in your domain. So I'll wait for your comment before opening issues on the following points:
Getting started in fresh Python 3 environments:
After applying the multiple imputation approach, the autoclean example (https://datacleanbot.readthedocs.io/en/latest/Example_autoclean.html) attempts to unpickle a remote file located inside your github repository:
metalearner = joblib.load(urlopen("https://github.com/Ji-Zhang/datacleanbot/blob/master/datacleanbot/metalearner_rf.pkl?raw=true"))
I haven't dug into your code, but this seems like a questionable practice (predict_best_anomaly_algorithm
: line 1033 in https://github.com/Ji-Zhang/datacleanbot/blob/master/datacleanbot/dataclean.py). Is it common for machine learning software to access remote resources like this?
There's another piece to this that I'd like you to clarify. From my understanding of the pickle
module, pickled object(s) must be unpickled in a Python environment with compatible versions of the dependent software. E.g. In the autoclean example, I couldn't advance past the routine above (predict_best_anomaly_algorithm
) because my version of sklearn
is incompatible:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/home/jnd/tmp/datacleanbot/lib/python3.6/site-packages/datacleanbot/dataclean.py", line 1344, in autoclean
Xy_cleaned = handle_outlier(features_new, Xy_filled)
File "/home/jnd/tmp/datacleanbot/lib/python3.6/site-packages/datacleanbot/dataclean.py", line 1285, in handle_outlier
best = predict_best_anomaly_algorithm(X, y)
File "/home/jnd/tmp/datacleanbot/lib/python3.6/site-packages/datacleanbot/dataclean.py", line 1049, in predict_best_anomaly_algorithm
metalearner = joblib.load(urlopen("https://github.com/Ji-Zhang/datacleanbot/blob/master/datacleanbot/metalearner_rf.pkl?raw=true"))
File "/home/jnd/tmp/datacleanbot/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 588, in load
obj = _unpickle(fobj)
File "/home/jnd/tmp/datacleanbot/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 526, in _unpickle
obj = unpickler.load()
File "/usr/lib/python3.6/pickle.py", line 1050, in load
dispatch[key[0]](self)
File "/usr/lib/python3.6/pickle.py", line 1338, in load_global
klass = self.find_class(module, name)
File "/usr/lib/python3.6/pickle.py", line 1392, in find_class
return getattr(sys.modules[module], name)
AttributeError: module 'sklearn.utils.deprecation' has no attribute 'DeprecationDict'
Please correct me if I've misinterpreted the traceback.
Overall, I think this will be a good fit for JOSS. Looking forward to your comment.
Hi @Ji-Zhang how are things going on your end for your JOSS submission?
Hi, I am sorry I missed @jjmcnelis 's comments. I am kind of snowed under recently. I will work on that next week and also make the unit test available. Sorry for the delay.
Hi @jjmcnelis I apologize for the delay to answer your questions.
The following are my answers:
Getting started in fresh Python 3 environments:
The remote file metalearner is a pre-trained machine learning model. The prediction of the optimal algorithm involves this model. However, the pickle file metalearner cannot be packaged together through pip. So I took this way around. The sklearn incompatible problem is related to GridSearchCV which I used to train the metalearner model. GridSearch had a DeprecationDict in it, but that doesn't exist in sklearn any more. I tried to retrain the model with the newest sklearn version but it doesn't seem work.
Some dependencies of datacleanbot only support Python3 so I assume Python2 won't work. I will add this note to the document.
Please feel free to let me know if you have more questions. Again, sorry for the delay.
@jjmcnelis How has @Ji-Zhang responded to your concerns?
I've pinged @jjmcnelis out of band.
@kthyng @Ji-Zhang I'm sorry for holding you up. Thanks for the thoughtful responses to my questions. I'm satisfied with points 1 and 2. Point 3 needs addressing.
I raised the point about rpy2
in the interest of usability. I don't use R often enough to justify having it installed on my workstation. I'd look into implementing the functionality provided by rpy2
in Python in a future release so that your software will be usable by the widest audience. But this isn't a JOSS requirement as far as I know.
Since my initial review I've come across a few other examples of machine learning resources retrieved remotely when a model runs, so I guess it's reasonably common.
Dependency issues are many, on my end. I haven't been able to run the examples in three separate attempts, on three separate machines. And I get a number of different tracebacks that bear no resemblance to each other. Did you hear from @kellieotto about their dependency issues (#issuecomment-519314651)?
@Ji-Zhang what are your thoughts about the question in point 3 above?
@usethedata I am working on simplifying the package such that the dependencies can become not this heavy. The light version should have been released before the weekend.
@jjmcnelis @kellieotto I have deprecated the feature of detecting feature data type using the Bayesian method as too many dependencies there. I created a new clean virtual environment to test it and it works well. I hope it can work on your end successfully as well . Thank you.
Hi - As the JOSS Associate Editor-in-Chief on duty this week, I'm going through some submissions with little recent action, and I can't quite tell what is going on in this submission. @usethedata - can you help me understand what is the next action?
@whedon assign @arfon as editor
:wave: all, I'm going to step in here and try and wrap up this submission on behalf of @usethedata.
:wave: @kellieotto, @jjmcnelis, in their most recent comment it sounds like @Ji-Zhang has simplified some of the requirements here.
Do you think you could both take another look at this software again and confirm you're able to install it now? If at all possible, I'd love to get this review wrapped up in the next couple of weeks.
@arfon Sorry for dropping the ball on this! I just went ahead and tried installing again, and ran into another dependency issue (see linked issue)
@arfon Sorry for dropping the ball on this! I just went ahead and tried installing again, and ran into another dependency issue (see linked issue)
Thanks @kellieotto. @Ji-Zhang - can you see if you can figure out what's going on in https://github.com/Ji-Zhang/datacleanbot/issues/10 ?
Hi @arfon and @kellieotto , thank you for spotting the bug. This bug is now fixed. Please update to the latest version of datacleanbot
and feel free to let me know if any questions. Thank you in advance.
:wave: @kellieotto - please come and take another look at this submission when you get a chance.
@arfon I followed up with @Ji-Zhang in Ji-Zhang/datacleanbot#10. Still having import issues.
@arfon I followed up with @Ji-Zhang in Ji-Zhang/datacleanbot#10. Still having import issues.
OK, thanks for checking back in on this.
Dear authors and reviewers
We wanted to notify you that in light of the current COVID-19 pandemic, JOSS has decided to suspend submission of new manuscripts and to handle existing manuscripts (such as this one) on a "best efforts basis". We understand that you may need to attend to more pressing issues than completing a review or updating a repository in response to a review. If this is the case, a quick note indicating that you need to put a "pause" on your involvement with a review would be appreciated but is not required.
Thanks in advance for your understanding.
Arfon Smith, Editor in Chief, on behalf of the JOSS editorial team.
:wave: @kellieotto, @jjmcnelis - how are you getting along with your reviews? I realize these are challenging times for many, so please don't take this enquiry as anything other than a friendly enquiry to see how things are going here.
@kellieotto - it looks like @Ji-Zhang responded to your dependency issues in https://github.com/Ji-Zhang/datacleanbot/issues/10 .
@arfon thanks for the reminder. I followed up with another dependency error in that issue.
👋 @Ji-Zhang - I think the next action is on you here to help @kellieotto with their dependency issues.
@Ji-Zhang - I think the next action is on you here to help @kellieotto with their dependency issues.
Thank you for your kind reminder. Working on it.
@arfon I just followed up in the issue with @Ji-Zhang -- still having the same error as before.
@kellieotto Sorry for this inconvenience. Could you try it again, please?
👋 hi everybody!
Looks like this review was progressing, a dependency issue was flagged by @kellieotto, and @Ji-Zhang made some changes, and is asking for the reviewer to try again. Could we have an update from those involved? If you need time, for whatever reason, just let use know—of course we understand that times are tough.
@labarba @Ji-Zhang I think I should step down as reviewer here. I realize my turnaround time has been very slow. I'm not sure whether it's the package or my installation of Python + associated packages, but I just can't get it to install and build correctly. I'm very sorry for the time wasted.
Hi @kellieotto , I think it is because some dependent packages are outdated. I fixed it by updating the setup file. Run pip install datacleanbot
in a newly created environment, the example should work. Thanks for all the time you have devoted so far.
Hi @Ji-Zhang. This review has been going on for a very long time (more than 1 year) and both reviewers have struggled with this software. As a result, I've just taken a deeper look at this submission and I have to say that I think this software is still well below the standard that is allowable in JOSS. For example:
Given the age of this submission, and the state of this software currently I'm going to reject this submission. If you are able to substantially improve the quality of this software then we would welcome a resubmission to JOSS at some point in the future.
@kellieotto, @jjmcnelis - thanks for your time here. I'm sorry your work here didn't result in a paper for the author.
@whedon reject
Paper rejected.
Submitting author: @Ji-Zhang (Ji Zhang) Repository: https://github.com/Ji-Zhang/datacleanbot Version: v0.4 Editor: @arfon Reviewers: @kellieotto, @jjmcnelis Archive: Pending
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@kellieotto and @jjmcnelis , please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @usethedata know.
✨ Please try and complete your review in the next two weeks ✨
Review checklist for @kellieotto
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
paper.md
file include a list of authors with their affiliations?Review checklist for @jjmcnelis
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
paper.md
file include a list of authors with their affiliations?