Closed appleparan closed 4 years ago
Merging #54 into master will decrease coverage by
0.71%
. The diff coverage is88.88%
.
@@ Coverage Diff @@
## master #54 +/- ##
==========================================
- Coverage 96.69% 95.97% -0.72%
==========================================
Files 12 13 +1
Lines 272 298 +26
==========================================
+ Hits 263 286 +23
- Misses 9 12 +3
Impacted Files | Coverage Δ | |
---|---|---|
src/Impute.jl | 57.14% <0%> (-6.02%) |
:arrow_down: |
src/imputors.jl | 100% <100%> (ø) |
:arrow_up: |
src/imputors/knn.jl | 95.83% <95.83%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 4546b9a...86d0597. Read the comment docs.
I have no idea why build fails in CI on 1.0. Does anyone helps me? Updating Manifest solved this problem.
In my opinion, broken tests should be solved in another PR.
Since you seem to be comparing Impute.jl against R anyways, would you mind using RCall (or PyCall) to compare your code against an existing implementation (e.g., https://www.rdocumentation.org/packages/bnstruct/versions/1.0.6/topics/knn.impute, https://github.com/iskandr/fancyimpute)?
I will add RCall
Later because this could be applied not only kNN, but SVD. We need to merge SVD and kNN implementation first, then making tests more structural, and adding RCall
should be the last one.
Okay, I've merged the svd implementation, if you want to rebase. I'm guessing most of the conflicts will get automerged.
Okay, I merged with master. I will rebase it. Thank you so much for merging other branches.
I rebased it into single commit. After passing CI, you can merge it.
I rewrited code, however, I found some problems.
original data
and imputed data
on nrmsd
to make dividend for normalising to be always same. RDatasets
?Mean imputation doesn't fill all missing values sometimes. It happens iris dataset usually. I still not figured out what's the problem (my test code or mean(Fill) imputation itself)
Can you provide a reproducible example for this?
Test with random variables I'm not sure we need this. This doesn't guarantee knn imputation always be better than mean imputation due to randomness. What about to use other datasets such as some of in RDatasets?
That's correct, the random variable tests from svd
was to confirm that poor low rank approximations wouldn't cause the method to fail. In fact, the test confirms that poor performance is expected when there are just a couple of uncorrelated variables (e.g., @test knn_nrmsd > mean_nrmsd * 0.9
the 0.9 was just in case of runtime variability).
Mean imputation doesn't fill all missing values sometimes. It happens iris dataset usually. I still not figured out what's the problem (my test code or mean(Fill) imputation itself)
I figured out why. Mean imputation to transposed array is wrong. First, it gives wrong result. Second, if doing imputation on small number of columns like iris dataset and all columns are missing, mean value also be missing.
I fixed most of minor things. As you pointed out, float.(collect())
is bad part of my commit. I thought this would be needed for KDTree, but your suggestion is right. I don't need that. I will fix them and commit again.
Travis was down. Would you retrigger?
If you feel okay, please let me know. I will rebase it.
Test fails due to #61
Yeah, we should also drop appveyor anyways.
Inspired by SVD imputation (#16), I implemented KNN imputation which closes #4
Reference