Closed vlulla closed 2 years ago
Let's do the following:
def readSimpleFeatures(path: String): Seq[SimpleFeature] = readSimpleFeatures(new URL(s"file://${new File(path).getAbsolutePath}"), DEFAULT_CHARSET)
def readSimpleFeatures(path: String, charSet: Charset): Seq[SimpleFeature] = readSimpleFeatures(new URL(s"file://${new File(path).getAbsolutePath}"), charSet)
def readSimpleFeatures(url: URL): Seq[SimpleFeature] = readSimpleFeatures(url, DEFAULT_CHARSET)
def readSimpleFeatures(url: URL, charSet: Charset): Seq[SimpleFeature] = ???
// but also all user API methods should have overloads
// ... (just an example below, all polygon methods should have the charset support)
def readMultiPolygonFeatures(path: String): Seq[MultiPolygonFeature[Map[String,Object]]] =
readMultiPolygonFeatures(path, DEFAULT_CHARSET)
def readMultiPolygonFeatures(path: String, charSet: Charset): Seq[MultiPolygonFeature[Map[String,Object]]] =
readSimpleFeatures(path, charSet)
.flatMap { ft => ft.geom[MultiPolygon].map(MultiPolygonFeature(_, ft.attributeMap)) }
// ...
The test may look this way:
class ShapeFileReaderSpec extends AnyFunSpec with Matchers {
describe("ShapeFileReader") {
it("should read MultiPolygons feature attributes") {
val path = "shapefile/data/shapefiles/demographics/demographics.shp"
val features = ShapeFileReader.readMultiPolygonFeatures(path)
features.size should be (160)
for(MultiPolygonFeature(_: MultiPolygon, data: Map[String, Object]) <- features) {
data.keys.toSeq should be (Seq("LowIncome", "gbcode", "ename", "WorkingAge", "TotalPop", "Employment"))
}
}
// https://github.com/locationtech/geotrellis/issues/3445
it("should read UTF-8 MultiPolygons feature attributes") {
val path = "shapefile/data/shapefiles/demographics-utf8/demographics.shp"
val features = ShapeFileReader.readMultiPolygonFeatures(path, Charset.forName("UTF-8"))
features.size should be (160)
features.take(4).map(_.data("ename").asInstanceOf[String]) shouldBe Seq("南关街道", "七里烟香", "谢庄镇", "Cheng Guan Zhen")
}
}
}
Note: I pushed a test file that should work with the test above.
OOPS, i did not see your later comments and tried to fix the typo....sorry!
I have made all the changes that you had mentioned. Please let me know if I have still missed anything.
Thank you for the patient guidance! It has been very educational.
Ah great! I said in the other issue that I'd make a pull request for this but then got distracted by life. Sorry about that, but I'm glad to see @vlulla jumped in and did it!
Overview
This PR makes charset configurable, but defaults to
ISO-8859-1
in order not to change bahavior of the old functions.Checklist
Closes #3445