qcif / data-curator

Data Curator - share usable open data
MIT License

query re iterative data #1039

Closed KyleHaynes closed 3 years ago

KyleHaynes commented 3 years ago

Thanks for such useful software.

Sorry if this question is extremely daft but from my brief usage of the software and scanning through associated documentation, I can't easily seem to see a simple solution.

Say I receive data x from a data custodian on a monthly basis (with just new records appended). Each time, I want to import and store it in Data Curator (mainly for the benefits of validation); however, I can't seem to easily apply the originally created metadata JSON schema.

I would have thought the process would be to 1. import the csv data (easy enough to do) and then 2. file > import column properties > json from file, and select the schema from the original zip file (unzipped, obviously). But doing so just indicates it's not a valid schema.

Otherwise, I would have thought it might be possible from importing package properties, but the only option is from a URL, and the data that I'm interested in adding will never be uploaded to any internal/external URL.

Here is some dummy data if interested in poking about ... ss_c1_export.zip

ghost commented 3 years ago

Hi @KyleHaynes, I'll have a look at the data when I can and see what the problem might be with the schema or the application. But just to make sure I've understood the issue:

  1. "I would have thought it might be possible from importing package properties, but the only option is from a URL": Yes, unfortunately we used up our current scope on the existing 'Import' options and didn't have time to add others (Table, and Package by file). With more funding for the project, I'm hoping to add these features to complete this.
  2. "doing so just indicates it's not a valid schema": So if you're importing Column properties, I'll have a look at the schema and see where the issue might be. The lower-level errors that come back from the libraries we use can be difficult to interpret, depending on what the error is, but in a future release maybe we can look at adding an option to display these errors in the event that someone does want to dig further.

If there isn't a problem with the application itself, there might be some other ways to try to use the data (But again please let me know if I haven't quite understood the use case here):

  1. If you already have the original data and the original schema (ie: including the Column properties):

    • Open a new tab with the newest data.
    • Copy and paste this data (e.g., 'Select All' and 'Copy') from the new, second table over the existing one in the first tab.
    • If the header row was 'locked' ('Tools' -> 'Header Row') in either of the tables, and the headers are different, you will need to unlock the header row to include it in any 'Copy' or 'Paste'. That way the first tab should keep its existing column properties, which you can then apply to the new data.
  2. If you have the unzipped new data available in a folder and you have the original table and schema already in Data Curator, you could:

    • Clear the existing data from the table (say, 'Select All' and 'Delete'). Again, ensure that the header row is unlocked so that it is included in the 'Delete'.
    • Drag and drop the file from your unzipped folder into the blank tab. This should also keep the original schema, as it is just the data that you have switched.
  3. It's not clear to me yet (once I have a look at the example you've supplied I might know more) whether the schema is a valid frictionless schema and is not being imported just because there is an issue with the data not matching the schema. If that is the reason, then another way to allow the import might be:

    • Start the Data Curator without any existing data or schemas
    • Check the number of columns in the schema that you have, and add that many columns so that Data Curator has the correct number of columns
    • Now import the Schema
    • Open the csv file (or drag/drop file in)
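
The column check in the steps above can be sketched in Python. This is only an illustration and not part of Data Curator: `compare_columns` and the sample field names are hypothetical, and it assumes the schema file is a Frictionless Table Schema, i.e. a JSON object with a `fields` list whose entries each have a `name`.

```python
import csv
import io


def compare_columns(schema: dict, csv_text: str) -> dict:
    """Compare the fields declared in a Table Schema with a CSV header row."""
    field_names = [field["name"] for field in schema.get("fields", [])]
    # The first CSV row is treated as the header; an empty file gives [].
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return {
        "schema_columns": len(field_names),
        "csv_columns": len(header),
        "missing_in_csv": [n for n in field_names if n not in header],
        "extra_in_csv": [n for n in header if n not in field_names],
    }


# Hypothetical schema and data, for illustration only.
schema = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "name", "type": "string"},
        {"name": "joined", "type": "date"},
    ]
}
csv_text = "id,name\n1,Ann\n"

report = compare_columns(schema, csv_text)
# Number of blank columns to add before importing the schema.
columns_to_add = max(0, report["schema_columns"] - report["csv_columns"])
print(report)
print("blank columns to add:", columns_to_add)
```

With real files, `schema` would come from `json.load` on the schema file and `csv_text` from reading the monthly CSV.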

Let me know if any of these cases help or if there is more detail for me to consider here. Although it doesn't cover the case you have, the 'Help' menu also offers some basics about the use of Data Curator that might be useful.

The use case you've raised here is a new one for me, though. If the schema that you have is a valid Frictionless schema (i.e., regardless of what the data may be), it would be useful for Data Curator to still import it, no matter what state the data is in. Once I've tested your example, Kyle, I'll add more to this.

ghost commented 3 years ago

Hi @KyleHaynes, I've had a go at importing the column properties and it seems to have succeeded. Not sure if you've seen the Frictionless documentation (from the help message you mentioned in Data Curator), but basically I removed the outer JSON for the package and table, so that the file contains just everything under, but not including, the key name: schema. Data Curator will flag that it needs a certain number of columns if the count doesn't match, but that's usually an indicator that it recognises the schema and it's just a matter of adding the number of blank columns required. I've added your example schema to Data Curator's test fixtures, here: a copy of how the JSON looks as just a schema (as opposed to the original datapackage.json you supplied). Hope it helps.
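
For anyone wanting to script that extraction, here is a minimal Python sketch. It is an assumption-laden illustration rather than anything Data Curator itself does: `extract_table_schemas`, the resource name, and the field names are hypothetical, and it assumes the usual Frictionless Data Package layout, where each entry in `resources` carries its table schema inline under a `schema` key.

```python
import json


def extract_table_schemas(package: dict) -> dict:
    """Return {resource name: schema} for every resource with an inline schema."""
    schemas = {}
    for index, resource in enumerate(package.get("resources", [])):
        schema = resource.get("schema")
        # Skip resources with no schema, or with a schema given as a path/URL.
        if isinstance(schema, dict):
            schemas[resource.get("name", f"resource-{index}")] = schema
    return schemas


# Hypothetical datapackage.json content, for illustration only.
example_package = {
    "name": "example-package",
    "resources": [
        {
            "name": "monthly-extract",
            "path": "data/monthly-extract.csv",
            "schema": {
                "fields": [
                    {"name": "id", "type": "integer"},
                    {"name": "name", "type": "string"},
                ]
            },
        }
    ],
}

schemas = extract_table_schemas(example_package)
# This text is what would be saved as a standalone schema file:
# the value of "schema", without the "schema" key or the outer package.
schema_text = json.dumps(schemas["monthly-extract"], indent=2)
print(schema_text)
```

In practice `example_package` would be replaced by `json.load` on the unzipped datapackage.json, and `schema_text` written to a new .json file to import as column properties.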

KyleHaynes commented 3 years ago

@mattRedBox - thanks a lot for the detailed reply - this has worked perfectly. Cheers Kyle

ghost commented 3 years ago

No problem @KyleHaynes. Glad it worked.