mekevans / forestTIME


Data typing issue #29

Closed by diazrenata 3 weeks ago

diazrenata commented 8 months ago

Some columns have values of an unexpected data type buried deep in the data (e.g. a string HABITATTYPE far down in the Idaho COND table). This throws off duckdbfs, because it guesses each column's data type from the first few hundred/thousand rows of the dataset.

As far as I can tell, duckdbfs doesn't use the schema argument. At any rate, I haven't been able to use it to, e.g., coerce that column to read in as a string (even though the values look like integers for the first many rows).
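To illustrate the failure mode described above, here is a minimal sketch in plain Python. It is not duckdbfs's or DuckDB's actual code. The function names (`infer_type`, `read_column`, `make_csv`) and the data are made up for illustration, and `forced_type` is only a stand-in for an explicit schema. It shows how inferring a type from the first rows breaks when a string turns up later, and how forcing the column to a string avoids that.

```python
import csv
import io


def infer_type(values):
    """Return 'int' if every sampled value parses as an integer, else 'str'."""
    for v in values:
        try:
            int(v)
        except ValueError:
            return "str"
    return "int"


def read_column(text, column, sample_size=None, forced_type=None):
    """Read one CSV column.

    The type is inferred from the first `sample_size` rows (all rows if None),
    unless `forced_type` is given, which plays the role of an explicit schema.
    """
    rows = list(csv.DictReader(io.StringIO(text)))
    values = [row[column] for row in rows]
    sample = values if sample_size is None else values[:sample_size]
    col_type = forced_type or infer_type(sample)
    if col_type == "int":
        # Raises ValueError if a non-integer value appears after the sample.
        return col_type, [int(v) for v in values]
    return col_type, values


def make_csv(n_numeric=1000):
    """Build hypothetical data: integer-looking codes, then one string value."""
    lines = ["HABITATTYPE"]
    lines += [str(100 + i) for i in range(n_numeric)]
    lines.append("unknown")  # the value "buried deep" in the column
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = make_csv()

    # Sampling only the first 100 rows guesses 'int' and then fails.
    try:
        read_column(text, "HABITATTYPE", sample_size=100)
    except ValueError as err:
        print("sample-based inference failed:", err)

    # Forcing the column to a string reads every row without error.
    col_type, values = read_column(text, "HABITATTYPE", forced_type="str")
    print(col_type, len(values))
```

Scanning the whole column before choosing a type (`sample_size=None` here) also avoids the error, but it costs a full pass over the data.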

I tried switching to arrow, but I keep hitting a "csv error conversion to type null" error.

Given all this, I think it makes just as much sense to switch to a dbplyr approach using, e.g., duckdb as the database backend. That way there aren't as many path contingencies, and I expect it to be more intuitive and interoperable for FIADB users inside and outside the Forest Service.