theodi / octopub

Publish data easily, quickly and correctly
https://octopub.io/
Other
41 stars 18 forks source link

Validate CSVw schema throws error #486

Open quadrophobiac opened 7 years ago

quadrophobiac commented 7 years ago

Steps to Reproduce (for problems)

Add spec/fixtures/valid-cotw.csv dataset to Octopub file upload
Attach spec/fixtures/schemas/csv-on-the-web-schema.json for schema
Click upload
Octopub will hang

Current Behaviour (for problems)

The logs indicate that a problem occurs in the function get_file_for_validation_from_file() https://github.com/theodi/octopub/blob/master/app/models/dataset_file.rb#L156-L158 This raises an exception WARN: TypeError: no implicit conversion of StringIO into String. It's not clear if this exception is what is causing the service to hang

Your Environment

quadrophobiac commented 7 years ago

Running into problems duplicating this locally, as the same steps to follow run aground far earlier within create schema flow, throwing the following error

method=POST path=/datasets format=*/* controller=DatasetsController action=create status=500 
error='ActiveModel::UnknownAttributeError: unknown attribute 'owner_username' for DatasetFileSchema.' duration=487.74 view=0.00 db=0.00
ActiveModel::UnknownAttributeError - unknown attribute 'owner_username' for DatasetFileSchema.:

Appears to relate to this part of dataset_file_schema.rb

validates_presence_of :owner_username, message: 'Please select an owner for the schema'
quadrophobiac commented 7 years ago

A similar error is encountered if you follow these steps

  1. Add spec/fixtures/valid-cotw.csv dataset to Octopub file upload
  2. Upload without adding schema
  3. add spec/fixtures/schemas/csv-on-the-web-schema.json through form at https://octopub.io/dataset_file_schemas/new
  4. Edit dataset added in Step 1
  5. Select schema added in Step 3
  6. Octopub will hang

In this sequence of steps the same method hangs but a different error is thrown:
NoMethodError: undefined methodtempfile' for nil:NilClass`

caiwilliamson commented 4 years ago

I reproduced the issue with the original instructions. The issue starts in dataset_file.rb in validate_schema_cotw with the code: tempfile = get_file_for_validation_from_file which calls the following function:

def get_file_for_validation_from_file
  File.new(file.tempfile)
end

File.new(file.tempfile) is trying to create a file from a StringIO which doesn't work.

Furthermore, later in validate_schema_cotw, some code tries to get the path from the tempfile, which doesn't work since it is never saved anywhere.

The solution is to create the tempfile, by doing something like:

def self.file_from_url_with_storage_key(file, storage_key)
  Rails.logger.info "DatasetFile: In file_from_url_with_storage_key"

  fs_file = FileStorageService.get_string_io(storage_key)
  tempfile = Tempfile.new
  tempfile.write fs_file
  tempfile.rewind
  ActionDispatch::Http::UploadedFile.new filename: File.basename(file),
                                         # content_type: 'text/csv',
                                         tempfile: tempfile
end

def get_file_for_validation_from_file
  file.tempfile
end

This resolves the original issue. But now we see the error:

NoMethodError: undefined method 'validate_header' for nil:NilClass

This is thrown by the following line in validate_schema_cotw:

validation = Csvlint::Validator.new(tempfile, {}, schema)

This issue appears to be with the CSV Lint gem itself in the way that it handles CSVw files, since the supplied arguments are correct. The stack trace points to the issue being in csvlint/csvw/table_group.rb:27:in 'validate_header'. So that's as far as I can get with this one I'm afraid.