qiita-spots / qiita

Qiita - A multi-omics databasing effort
https://qiita.ucsd.edu/
BSD 3-Clause "New" or "Revised" License

classify_samples_ncv gives error stemming from alpha_group_significance #2964

Open mestaki opened 4 years ago

mestaki commented 4 years ago

Someone showed me this error on their dataset, and I was able to reproduce it with existing data in Qiita. I was just trying to run a random forest sample classifier on one categorical metadata column with a non-rarefied feature table. The collaborator reports the same error with a rarefied version of the table as well as with a taxonomy-collapsed table.

See full output below.

Command:Nested cross-validated supervised learning classifier. [classify_samples_ncv] (qiime2 2019.10.0)

Status:error

Current step:Validating outputs (1 remaining) via job(s) 7
Error message:3 validator jobs failed:
Validator d820566f-366e-4d2d-8e84-7b51f8541565 error message: The alpha vector format is incorrect
Validator 7a68efab-036a-4f38-9587-70c622ab2296 error message: The file header seems wrong " importance "
Validator 24d5cd74-0dcb-471a-8067-4f7695f6109b error message: Error executing Validate:
Traceback (most recent call last):
  File "/home/qiita/miniconda3/envs/qiime2.2019.10/lib/python3.6/site-packages/qiita_client/plugin.py", line 266, in __call__
    qclient, job_id, job_info['parameters'], output_dir)
  File "/home/qiita/miniconda3/envs/qiime2.2019.10/lib/python3.6/site-packages/qiita_client/plugin.py", line 105, in __call__
    return self.function(qclient, server_url, job_id, output_dir)
  File "/home/qiita/qiita_spots/qtp-diversity/qtp_diversity/validate.py", line 171, in validate
    html_fp, html_dir = HTML_SUMMARIZERS[a_type](files, metadata, out_dir)
  File "/home/qiita/qiita_spots/qtp-diversity/qtp_diversity/summary.py", line 126, in _generate_alpha_vector_summary
    % std_err)
RuntimeError: Error executing alpha-group-significance for the summary:
Plugin error from diversity:

  Non-numeric values detected in alpha diversity estimates.

Debug info has been saved to /projects/qiita_data/tmp/qiime2-q2cli-err-ma9e_lwp.log

Job parameters:
Feature table containing all features that should be used for target prediction.:83596

How to handle missing samples in metadata. "error" will fail if missing samples are detected. "ignore" will cause the feature table and metadata to be filtered, so that only samples found in both files are retained. (missing_samples):error

Automatically tune hyperparameters using random grid search. (parameter_tuning):

Estimator method to use for sample prediction. (estimator):RandomForestClassifier

Number of trees to grow for estimation. More trees will improve predictive accuracy up to a threshold level, but will also increase time and memory requirements. This parameter only affects ensemble estimators, such as Random Forest, AdaBoost, ExtraTrees, and GradientBoosting. (n_estimators):100

Number of jobs to run in parallel. (n_jobs):1
Seed used by random number generator. (random_state):
Number of k-fold cross-validations to perform. (cv):5
Metadata column to use:ancestry

The confusing part is why it is executing alpha-group-significance at all. I'm not sure what the classifier has to do with that plugin, unless it is just using the KW (Kruskal-Wallis) test included there for some reason?
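
One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the validator handles one of the outputs as an alpha diversity vector and builds its summary with alpha-group-significance, which expects numeric values, whereas classifier predictions for a categorical column such as ancestry would be labels. The sketch below only illustrates that kind of numeric check. The helper `non_numeric_values`, the column names, and the sample data are all made up for illustration and are not part of Qiita or QIIME 2.

```python
import csv
import io


def non_numeric_values(tsv_text, value_column):
    """Return (sample_id, value) pairs whose value cannot be parsed as a float.

    Hypothetical helper for illustration only; not Qiita or QIIME 2 code.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_column = reader.fieldnames[0]  # first column holds the sample IDs
    bad = []
    for row in reader:
        try:
            float(row[value_column])
        except ValueError:
            bad.append((row[id_column], row[value_column]))
    return bad


# Made-up data: an alpha diversity vector versus classifier predictions.
alpha_tsv = "sample-id\tshannon\nS1\t3.21\nS2\t4.05\n"
predictions_tsv = "sample-id\tprediction\nS1\tgroupA\nS2\tgroupB\n"

print(non_numeric_values(alpha_tsv, "shannon"))
print(non_numeric_values(predictions_tsv, "prediction"))
```

The first call finds nothing to report, while the second lists every row, which is the situation a "non-numeric values" complaint would describe.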

antgonza commented 4 years ago

The error is due to these types not mapping correctly within the Qiita/QIIME2 artifact types:

(qiime2.2019.10) 18:31:04 (qiita@qiita):classify_samples_ncv$ qiime tools peek predictions.qza 
UUID:        10d83423-1360-458e-8503-bb86ef161e0e
Type:        SampleData[ClassifierPredictions]
Data format: PredictionsDirectoryFormat
(qiime2.2019.10) 18:31:39 (qiita@qiita):classify_samples_ncv$ qiime tools peek probabilities.qza 
UUID:        bf9766e0-109c-43e1-bf99-d72e8af96855
Type:        SampleData[Probabilities]
Data format: ProbabilitiesDirectoryFormat
(qiime2.2019.10) 18:31:58 (qiita@qiita):classify_samples_ncv$ qiime tools peek feature_importance.qza 
UUID:        52690214-828d-463f-a50e-42ed17f9e3d5
Type:        FeatureData[Importance]
Data format: ImportanceDirectoryFormat

These types would need to be created as a plugin.
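
To make the mismatch concrete, here is a rough sketch, not Qiita's actual code. It parses `qiime tools peek` output like the above and looks the semantic type up in a table of handled types. The `HANDLED_TYPES` table, its single entry, and the helper names are hypothetical; the traceback only shows that qtp-diversity dispatches through an `HTML_SUMMARIZERS` mapping keyed by artifact type.

```python
def parse_peek(text):
    """Parse `qiime tools peek` output into a dict keyed by field name."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a "key: value" shape
            info[key.strip()] = value.strip()
    return info


# Hypothetical table of QIIME 2 semantic types with a matching artifact type.
# The real mapping lives in Qiita and its plugins; this entry is illustrative.
HANDLED_TYPES = {"SampleData[AlphaDiversity]": "alpha_vector"}


def lookup_handler(peek_text):
    """Return the artifact type name, or raise if the type is not handled."""
    semantic_type = parse_peek(peek_text)["Type"]
    if semantic_type not in HANDLED_TYPES:
        raise KeyError("no artifact type registered for " + semantic_type)
    return HANDLED_TYPES[semantic_type]


# Peek output for predictions.qza, copied from the comment above.
PEEK_OUTPUT = """UUID:        10d83423-1360-458e-8503-bb86ef161e0e
Type:        SampleData[ClassifierPredictions]
Data format: PredictionsDirectoryFormat"""

try:
    lookup_handler(PEEK_OUTPUT)
except KeyError as err:
    print("unsupported:", err)
```

Failing loudly on an unregistered type is a design choice of this sketch; it would surface the missing mapping directly instead of through a downstream summary error.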