intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Auto estimator did not support different shape of feature columns #3854

Closed jack1981 closed 2 years ago

jack1981 commented 2 years ago

We get an exception when we pass feature columns of different shapes into auto_estimator.fit(). The exception occurs because np.concatenate requires every feature column to have the same shape:

api.zip/bigdl/orca/automl/search/ray_tune/ray_tune_search_engine.py", line 410, in
(ImplicitFunc pid=39919) X = np.concatenate([np.stack(shard["x"], axis=1) for shard in shards], axis=0)
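The failure can be reproduced outside of Orca with plain NumPy. The sketch below is an illustration only, not the BigDL code: the `shard` dictionary and the `stack_features` helper are hypothetical stand-ins for the expression in the traceback line. It suggests that the inner `np.stack` call already rejects arrays of unequal shape, before `np.concatenate` is reached.

```python
import numpy as np

# Hypothetical shard holding 4 samples of two feature columns whose
# per-sample shapes differ, (1,) and (730,), as in the report above.
shard = {
    "x": [
        np.zeros((4, 1)),    # stands in for 'input_1'
        np.zeros((4, 730)),  # stands in for 'input_2'
    ]
}


def stack_features(shard):
    """Mirror the inner expression from the traceback line (assumed, simplified)."""
    return np.stack(shard["x"], axis=1)


# Columns with different shapes cannot be stacked into one array.
try:
    stack_features(shard)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Columns with identical shapes stack without error.
same_shape_shard = {"x": [np.zeros((4, 1)), np.zeros((4, 1))]}
stacked = stack_features(same_shape_shard)

print("mismatched shapes raised:", error_message is not None)
print("same-shape result:", stacked.shape)
```

With equal-shaped columns the call succeeds and adds a new axis for the feature index, which is why the problem only appears once the columns differ in shape.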

Below is the example code where we detected this issue:

input_1 = tf.keras.Input(shape=(1,), name='input_1')
...
input_2 = tf.keras.Input(shape=(730,), name='input_2')
...
feature_cols = ['input_1', 'input_2']
target_cols = ['decoder_4', 'output']

auto_est.fit(data=train_df,
             validation_data=test_df,
             metric="error",
             metric_mode="min",
             n_sampling=num_rand_samples,
             search_space=search_space,
             search_alg=search_alg,
             search_alg_params=None,
             scheduler=scheduler,
             scheduler_params=scheduler_params,
             feature_cols=feature_cols,
             target_cols=target_cols)

BTW, we tested https://github.com/intel-analytics/BigDL/blob/branch-2.0/python/orca/src/bigdl/orca/learn/tf2/ray_estimator.py, which does support feature inputs with multiple shapes.

shanyu-sys commented 2 years ago

Previously, feature_cols and target_cols in AutoEstimator shared the same semantics as in AutoXGBoost, where the feature_cols of the input DataFrame are treated as the second dimension of the input X, which is the only input to the internal model. In the context of Orca Estimator, by contrast, feature_cols are the different inputs to the internal model.

In PR #3857, I have changed the semantics of AutoEstimator to be consistent with that of Estimator.
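The difference between the two semantics can be sketched with NumPy. This is a rough illustration under stated assumptions, not the code from the PR: `to_single_input` and `to_multiple_inputs` are hypothetical helper names, and the arrays are made-up data.

```python
import numpy as np


def to_single_input(columns):
    """AutoXGBoost-style semantics (assumed): every feature column becomes
    one slice along dimension 1 of a single input X, so all columns must
    share the same shape."""
    return np.stack(columns, axis=1)


def to_multiple_inputs(columns):
    """Estimator-style semantics (assumed): every feature column is a
    separate model input, so shapes may differ between columns."""
    return tuple(np.asarray(col) for col in columns)


# Two scalar feature columns for 8 samples fit the single-input view.
age = np.arange(8.0)
income = np.ones(8)
X = to_single_input([age, income])

# Two columns with different per-sample shapes only fit the multi-input view.
input_1 = np.zeros((8, 1))
input_2 = np.zeros((8, 730))
inputs = to_multiple_inputs([input_1, input_2])
input_shapes = [arr.shape for arr in inputs]

print(X.shape)
print(input_shapes)
```

Under the multi-input view, each array can be fed to its own named model input, which matches a Keras model declared with two `tf.keras.Input` layers of different shapes.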

jack1981 commented 2 years ago

Many thanks! Could I have a new API zip to verify the fix?

Jack

shanyu-sys commented 2 years ago

Hi Jack,

If the fix works, I may close the issue. Thanks!

jack1981 commented 2 years ago

Thanks, the fix worked!