koopjs / koop-provider-csv

Koop provider for CSV
MIT License

Support for publishing a directory containing multiple CSV files with the same schema as a single layer #30

Closed (anandak closed this issue 1 year ago)

anandak commented 1 year ago

Hi,

Is it possible to publish a directory containing multiple CSV files with exactly the same schema as a single layer? We have a directory to which a CSV file with exactly the same schema is added daily. Instead of combining them into a single large CSV file every day, is it possible to configure the CSV provider so that it picks up all CSV files in that directory (or those matching a file naming pattern) and outputs them as a single layer?

haoliangyu commented 1 year ago

You are trying to load a virtual CSV file that is stored in parts. This is not supported in the current version, but it is technically possible. The only caveat is that the file-loading order may not be guaranteed, so proper sorting must be done in the FeatureServer request or on the client side.
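To illustrate sorting in the FeatureServer request, here is a minimal sketch that builds a paged query URL using the standard `orderByFields`, `resultOffset`, and `resultRecordCount` query parameters. The layer URL and the `recorded_at` field name in the usage note are hypothetical placeholders, not values from this thread.

```typescript
// Sketch: build a FeatureServer query URL that asks the server to sort rows
// by a data attribute, so that pages do not depend on file-loading order.
// `layerUrl` and `orderField` are supplied by the caller (hypothetical values).
function buildPagedQuery(
  layerUrl: string,
  orderField: string,
  offset: number,
  pageSize: number
): string {
  const params = new URLSearchParams({
    where: "1=1",
    outFields: "*",
    orderByFields: `${orderField} ASC`, // sort on a data attribute
    resultOffset: String(offset),       // number of rows to skip
    resultRecordCount: String(pageSize), // rows per page
    f: "json",
  });
  return `${layerUrl}/query?${params.toString()}`;
}
```

For example, `buildPagedQuery("http://localhost:8080/koop-provider-csv/my-data/FeatureServer/0", "recorded_at", 100, 50)` would request the third page of 50 rows, sorted by the assumed `recorded_at` column.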

anandak commented 1 year ago

It is okay if the file-loading order is not guaranteed. However, will paginated queries still work correctly in that case? Are the files read every time the server starts, which would affect the order, or on every request?

haoliangyu commented 1 year ago

What can be done is to sort the file list alphabetically or by modified time before reading the files. We could even expose the sort order as a configuration option, so that you get a relatively stable data-loading order. But again, each sorting mechanism has its own edge cases, and that is why the order is not guaranteed. In this case, you need to sort by a data attribute to compensate for the potential randomness in data loading.
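A minimal sketch of the file-list sorting idea follows. It is not the provider's actual implementation; the `FileEntry` shape, the `SortMode` type, and the `sortFiles` function are hypothetical names introduced only for illustration.

```typescript
// Hypothetical description of a matched CSV file.
interface FileEntry {
  path: string;
  modifiedMs: number; // last-modified time, milliseconds since the epoch
}

type SortMode = "name" | "mtime";

// Return a new array ordered by file name or by modified time.
// Ties on modified time fall back to the file name so the result is stable.
function sortFiles(files: FileEntry[], mode: SortMode): FileEntry[] {
  const byName = (a: FileEntry, b: FileEntry): number =>
    a.path < b.path ? -1 : a.path > b.path ? 1 : 0;
  const sorted = [...files]; // do not mutate the caller's array
  sorted.sort((a, b) =>
    mode === "mtime"
      ? a.modifiedMs - b.modifiedMs || byName(a, b)
      : byName(a, b)
  );
  return sorted;
}
```

The edge cases are easy to hit: a plain string comparison places `data-10.csv` before `data-9.csv`, and copying or touching a file changes its modified time. This is why sorting by a data attribute at query time remains the dependable option.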

haoliangyu commented 1 year ago

I have added code to support loading multiple CSV files for a single source. You can specify a glob pattern in the new path property of the source configuration (see example). The feature is available in the latest version, v3.2.0.
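As a rough sketch of what such a source entry might look like, the object below sets only the `path` property to a glob pattern. The variable name and the directory are hypothetical, and any other settings a source needs are omitted here; the provider's own example remains the authoritative reference.

```typescript
// Hypothetical source entry: only `path` (with a glob pattern) comes from the
// comment above. The directory and variable name are made up for illustration,
// and other source settings are intentionally left out.
const dailyReadingsSource = {
  path: "/data/daily/*.csv", // matches every CSV file in the directory
};
```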