Closed by dennishendriksen 3 years ago.
@shuang1330 @joerivandervelde @SietsmaRJ do you have any thoughts on this?
Hi,
I put a lot of print statements in the preprocessing steps. The "there shouldn't be any nulls" message is about imputation: I used to just look at the lines printed after it to see whether any columns that should have been imputed still contained nulls, so there is no automatic check. The "Feature from the model not in data" message appears because some levels of the categorical features do not occur in this test dataset, which is normal.
Best regards, Shuang
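To illustrate why that message is expected, here is a minimal sketch. It assumes that categorical features are one-hot encoded into one column per level (the explanation suggests this, but the code is not CAPICE's); the function names, the `feature=level` column naming and the example feature are hypothetical. A level that never occurs in a small test file produces no column, so it shows up as a model feature that is missing from the data.

```python
from typing import Dict, List, Set


def one_hot_columns(records: List[Dict[str, str]], categorical: List[str]) -> Set[str]:
    """Return the one-hot column names ("feature=level") that actually occur in the data."""
    columns = set()
    for record in records:
        for feature in categorical:
            columns.add(f"{feature}={record[feature]}")
    return columns


def features_missing_from_data(model_features: Set[str], data_columns: Set[str]) -> List[str]:
    """Model features with no matching column in the data (e.g. unseen category levels)."""
    return sorted(model_features - data_columns)


# Hypothetical example: the model knows three levels of "Type",
# but the small test file only contains SNVs and deletions.
model_features = {"Type=SNV", "Type=DEL", "Type=INS"}
test_records = [{"Type": "SNV"}, {"Type": "DEL"}, {"Type": "SNV"}]

observed = one_hot_columns(test_records, ["Type"])
for name in features_missing_from_data(model_features, observed):
    print("Feature from the model not in data:", name)
```

The message is therefore a property of the input file rather than an error: the missing level simply does not occur in any of its variants.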
On Thu, 27 Aug 2020 at 09:27, Dennis Hendriksen wrote:
> @shuang1330 @joerivandervelde @SietsmaRJ do you have any thoughts on this?
I can further confirm that the variables reported with null ratios are not used when predicting variants. The features listed under "Feature from the model not in data:" are not used either. In my experience, these messages do not indicate a problem.
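Since the check described in this thread is manual (reading the printed null ratios), an automatic version could look roughly like the sketch below. It is hypothetical: the function names, the column-to-values table layout and the `used_features` set are assumptions, not part of CAPICE. Following the point that columns reported with null ratios are not used for prediction, it only fails when a feature the model actually uses still contains missing values after imputation.

```python
import math
from typing import Dict, List, Optional, Set

# Hypothetical layout: column name -> list of values (None or NaN means missing).
Table = Dict[str, List[Optional[float]]]


def null_ratios(table: Table) -> Dict[str, float]:
    """Fraction of missing values per column, only for columns that have any."""
    ratios = {}
    for name, values in table.items():
        missing = sum(
            1 for v in values if v is None or (isinstance(v, float) and math.isnan(v))
        )
        if missing:
            ratios[name] = missing / len(values)
    return ratios


def check_imputation(table: Table, used_features: Set[str]) -> Dict[str, float]:
    """Raise if a feature used by the model still has nulls; return ratios of unused ones."""
    ratios = null_ratios(table)
    offending = {name: r for name, r in ratios.items() if name in used_features}
    if offending:
        raise ValueError(f"Nulls remain after imputation in model features: {offending}")
    return ratios


# Hypothetical example: "used_score" is fully imputed; "unused_score" still has
# nulls but is not used by the model, so it is only reported, not rejected.
table = {
    "used_score": [12.3, 4.5, 20.1, 7.8],
    "unused_score": [None, 2.0, float("nan"), 1.0],
}
print(check_imputation(table, used_features={"used_score"}))
```

Here the call returns the null ratio of the unused column instead of raising, which matches the behaviour described for the log messages.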
Running CAPICE on trio-filtered.vcf.gz with the CAPICE EasyBuild module on Gearshift produces the log file cadd_capice.log.
The log file contains entries such as:
What do these messages mean? Do they indicate a problem?