tensorflow / tfjs

A WebGL accelerated JavaScript library for training and deploying ML models.
https://js.tensorflow.org
Apache License 2.0

Mismatch between documentation and the actual behavior of column's dtype #4962

Closed spleshakov closed 3 years ago

spleshakov commented 3 years ago

According to https://js.tensorflow.org/api/latest/#data.csv, csvConfig.columnConfigs[columnHeader].dtype can be any of int32, float32, bool, or string.

However, the switch statement at https://github.com/tensorflow/tfjs/blob/623da7ecbada115425888c62bd65df685e2bdd75/tfjs-data/src/datasets/csv_dataset.ts#L253 handles every documented value except string. Its default branch is parsedValue = valueAsNum; so a column configured as string is still parsed as a number.
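
A minimal sketch of the reported behavior follows. It is not the tfjs source: the function name parseCell and the float32, int32, and bool branches are assumptions made for illustration. Only the default branch (parsedValue = valueAsNum) is taken from the line quoted above.

```javascript
// Hypothetical sketch, not the tfjs implementation. parseCell and the
// non-default branches are illustrative assumptions.
function parseCell(value, dtype) {
  const valueAsNum = Number(value);
  let parsedValue;
  switch (dtype) {
    case "float32":
      parsedValue = valueAsNum;
      break;
    case "int32":
      parsedValue = Math.floor(valueAsNum);
      break;
    case "bool":
      parsedValue = value === "1" || value.toLowerCase() === "true" ? 1 : 0;
      break;
    default:
      // There is no "string" case, so a requested "string" dtype
      // falls through to the numeric default.
      parsedValue = valueAsNum;
  }
  return parsedValue;
}

// A ZIP code requested as a string still comes back as a number.
console.log(typeof parseCell("56762", "string")); // prints "number"
```

If a switch like this is the only place the configured dtype is consulted for numeric-looking values, that would explain why ZIPCODE is reported as int32 below.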

The string dtype also doesn't work in practice when I use danfojs like so:

service_areas.csv

SERVICE_AREA_ID|ZIPCODE|STATE_CODE|COUNTY_CODE|COUNTY_NAME|SERVICE_AREA_NAME|PLAN_YEAR
"North"|56762|"MN"|"24440"|"Marshall"|"North"|2021
"PR North"|56762|"MN"|"24440"|"Marshall"|"PR North"|2021

javascript

const service_areas_data = await danfojs.read_csv(
    "zpr_service_areas.txt",
    {
        delimiter: "|",
        columnConfigs: {
            // Ask for ZIPCODE to be kept as a string
            "ZIPCODE": {
                dtype: "string"
            }
        }
    }
);
service_areas_data.ctypes.print();

results in output

╔═══════════════════╤══════════════════════╗
║                   │ 0                    ║
╟───────────────────┼──────────────────────╢
║ SERVICE_AREA_ID   │ string               ║
╟───────────────────┼──────────────────────╢
║ ZIPCODE           │ int32                ║
╟───────────────────┼──────────────────────╢
║ STATE_CODE        │ string               ║
╟───────────────────┼──────────────────────╢
║ COUNTY_CODE       │ int32                ║
╟───────────────────┼──────────────────────╢
║ COUNTY_NAME       │ string               ║
╟───────────────────┼──────────────────────╢
║ SERVICE_AREA_NAME │ string               ║
╟───────────────────┼──────────────────────╢
║ PLAN_YEAR         │ int32                ║
╚═══════════════════╧══════════════════════╝

rthadur commented 3 years ago

To expedite the troubleshooting process, please provide a code snippet that reproduces the issue reported here. Thanks!

spleshakov commented 3 years ago

@rthadur it's in the description. Do you need anything else apart from that?

rthadur commented 3 years ago

I see you are using danfojs, but I don't see where you are using tfjs. Could you please provide a CodePen example or a GitHub test repo where we can reproduce the behavior? Thank you.

rthadur commented 3 years ago

I see that you have resolved this in the referenced issue and that it is not related to tfjs, so we are closing this issue. Thank you.
