secretflow / secretflow

A unified framework for privacy-preserving data analysis and machine learning
https://www.secretflow.org.cn/docs/secretflow/en/
Apache License 2.0
2.35k stars, 396 forks

secretflow 1.9.0b: int type does not support int64 #1566

Open john8628 opened 2 weeks ago

john8628 commented 2 weeks ago

Issue Type

Bug

Source

binary

Secretflow Version

secretflow 1.9.0b

OS Platform and Distribution

centos 7

Python version

3.10.13

Bazel version

No response

GCC/Compiler version

No response

What happened and what you expected to happen.

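After upgrading from secretflow 1.7.0 to secretflow 1.9.0b, running private set intersection (PSI) fails whenever an integer value is too large for a 32-bit int. The error is as follows: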
duckdb.duckdb.ConversionException: Conversion Error: CSV Error on Line: 21867
Original Line: e767ebb5-eb73-4cf7-8467-0056cb2fd960,tecno_bc2c,tecno,English,1,0,2,5,50,20210905,20,17,11,1146,90,30,137,4,4,5,1145,45,2803676068,28,34480,53203935,28,6227,15,679329840,9,5264,28,1,24685447,1,16,74,3,1030,54787220,28,6,3,14627956,222
Error when converting column \"x22\". Could not convert string \"2803676068\" to 'INTEGER'

Reproduction code to reproduce the issue.

Run the PSI component on data in which an integer column contains values that do not fit in 32 bits.
lanyy9527 commented 2 weeks ago

In PSI 1.9 the input data is now processed, so you need to make sure your data's schema is correct.

Error when converting column \"x22\". Could not convert string \"2803676068\" to 'INTEGER'. When uploading the data, set the feature type to string.
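Before re-uploading, it can help to find out which columns actually hold values outside the 32-bit signed range (-2147483648 to 2147483647), since those are the ones that need a different feature type. The following is a minimal sketch using only the Python standard library; the function name, the sample data, and the column names other than x22 are hypothetical, and it does not call any secretflow or DuckDB API.

```python
import csv
import io

# Bounds of a 32-bit signed integer.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def columns_exceeding_int32(csv_text):
    """Return the sorted names of columns with an integer outside the int32 range."""
    reader = csv.DictReader(io.StringIO(csv_text))
    overflowing = set()
    for row in reader:
        for name, raw in row.items():
            try:
                value = int(raw)
            except (TypeError, ValueError):
                continue  # not an integer literal (for example a UUID); skip it
            if not INT32_MIN <= value <= INT32_MAX:
                overflowing.add(name)
    return sorted(overflowing)


# Hypothetical sample: only x22 carries the value from the error message.
sample = "id,x21,x22,x23\nu1,45,2803676068,28\nu2,7,12,34480\n"
print(columns_exceeding_int32(sample))
```

For a real file, the same function can be fed the file's text, for example `columns_exceeding_int32(open(path, newline="").read())`, where `path` is your own CSV path.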

john8628 commented 2 weeks ago

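In PSI 1.9 the input data is now processed, so you need to make sure your data's schema is correct.

Do you mean that anything larger than a 32-bit int should be declared as string?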

lanyy9527 commented 2 weeks ago

You can also choose the float type.

john8628 commented 1 week ago

Will using the float type make the subsequent training take longer?

lanyy9527 commented 1 week ago

It is not a big problem. Many models convert their inputs to double by default during training anyway.
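One point worth checking when switching an integer column to float is whether the values survive the conversion exactly. The sketch below uses only the Python standard library and assumes nothing about which float width secretflow uses internally: it shows that the value from the error message round-trips through a 64-bit double unchanged, while a 32-bit float would round it. The variable names are illustrative.

```python
import struct

value = 2803676068  # the value from the conversion error

# Python's float is a 64-bit double, exact for integers up to 2**53.
as_f64 = float(value)

# Round-trip through a 32-bit float, which has only a 24-bit significand.
as_f32 = struct.unpack("<f", struct.pack("<f", float(value)))[0]

print("double keeps the value:", int(as_f64) == value)
print("32-bit float keeps the value:", int(as_f32) == value)

# Beyond 2**53 even a double can no longer represent every integer.
print("2**53 + 1 collapses to 2**53:", float(2**53 + 1) == float(2**53))
```

If the column is an identifier used as a join key rather than a numeric feature, string is the safer choice, because any rounding would change which rows match.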