illinois-or-research-analytics / cm_pipeline

Pipeline that uses an improved version of CM for generating well-connected clusters
GNU General Public License v3.0

Edge case in post_cm_filter.R #54

Open chackoge opened 3 months ago

chackoge commented 3 months ago

When CM does not modify any clusters, fread reads the cluster_id column as integers, and the enc2utf8 call (lines 25-28) fails because it only accepts character vectors.

    # read in post-cm clustering
    c <- fread(args[1])
    # convert V2 to UTF-8
    c[,V2 := enc2utf8(V2)]

For example:

    y <- fread('file_name', encoding='UTF-8')
    y[, enc2utf8(V2)]
    Error in enc2utf8(V2) : argument is not a character vector

A simple fix might be to read the file with c <- fread(args[1], colClasses=c("integer","character"), encoding='UTF-8') and leave everything else unchanged.

An alternative is c[,V2 := enc2utf8(as.character(V2))]
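
To make the two options easier to compare, here is a minimal sketch that reproduces the failure and then applies each fix. The temporary file and its contents are made up for illustration; it assumes a headerless, tab-separated, two-column file (node id, cluster id), which may not match the real post-CM output exactly.

```r
library(data.table)

# Hypothetical stand-in for the post-CM clustering file:
# two tab-separated columns (node id, cluster id), no header.
tmp <- tempfile(fileext = ".tsv")
writeLines(c("1\t0", "2\t0", "3\t1"), tmp)

# Default type detection reads V2 as integer, so enc2utf8() errors.
y <- fread(tmp, header = FALSE)
default_fails <- inherits(try(y[, enc2utf8(V2)], silent = TRUE), "try-error")

# Option 1: force the column types when reading.
c1 <- fread(tmp, header = FALSE,
            colClasses = c("integer", "character"), encoding = "UTF-8")
c1[, V2 := enc2utf8(V2)]

# Option 2: keep the read as-is and coerce before converting.
c2 <- fread(tmp, header = FALSE)
c2[, V2 := enc2utf8(as.character(V2))]

unlink(tmp)
```

Option 1 hard-codes the type of V1 as well, so it would need adjusting if node ids are not integers; option 2 touches only V2.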