instructure / canvas-lms

The open LMS by Instructure, Inc.
https://github.com/instructure/canvas-lms/wiki
GNU Affero General Public License v3.0
5.58k stars 2.48k forks source link

SIS import fails integrity check #2147

Closed va7map closed 1 year ago

va7map commented 1 year ago

Summary:

An SIS import created via the REST API with raw POST data always fails the integrity check, because the resulting Attachment object lacks the required md5 value.

Steps to reproduce:

  1. Call /api/v1/accounts/:account_id/sis_imports and send the CSV data in the POST body
  2. Observe the result of the SIS import

NOTE: This appears to only affect on-premise instances backed by local storage, as I am unable to reproduce this in cloud-hosted Canvas.

Expected behavior:

The SIS import should not fail due to the attachment integrity check because it was created legitimately.

Actual behavior:

The SIS import fails when it encounters the following error at the quoted line while opening the CSV attachment:

undefined method `file' for nil:NilClass

https://github.com/instructure/canvas-lms/blob/b8cfe08a3a3dfd10ee2be42f9d0356898d7dfc86/app/models/attachments/local_storage.rb#L68

Additional notes:

I noticed that when an SIS import is created by a non-multipart raw POST request, the resulting Attachment object unexpectedly lacks an md5 value. This value is needed to determine the hash_context in #validate_hash, and without it, an undefined method error is encountered.

The integrity check was just enabled recently — see the related commit. Looking back at older attachments in our Canvas instance that were created long before this change, it seems that the md5 value had always been nil. So perhaps this new integrity check is just shedding light on this previously-undetected bug.

grahamb commented 1 year ago

For Instructure: a support case has been opened for this: 09526924

jstanley0 commented 1 year ago

Hello, thanks for reporting this. When you say "the md5 value had always been nil" are you referring to SIS import attachments or all files? Because I've run a Canvas instance on my laptop with local file storage for years and essentially all of my files have MD5s (including all SIS import attachments).

That said, the integrity check is supposed to be a no-op if the md5 value is missing, but I messed up that logic in the local storage backend and didn't notice it because my MD5s aren't missing and the unit tests focused on the cloud file storage configuration. I'll get a fix for this out soon.

va7map commented 1 year ago

And thank you for taking a look at this!

I was referring only to SIS import attachments that were uploaded in the POST body; other files are normal. The SIS Import page in Canvas uploads the CSV in the attachment multipart/form-data parameter, and from experimentation, this method produces an attachment with the md5 value populated correctly. Only the raw POST method seems to be affected. Could you try this in your instance and see if it's reproducible?

jstanley0 commented 1 year ago

I've merged a fix for the exception. I'll try a raw POST SIS import next week when I get the chance. Feel free to file a separate issue and tag me in it if this is important to you.

jstanley0 commented 1 year ago

fwiw, I was not able to reproduce the problem of a missing md5 with a raw POST import file, either CSV or ZIP

va7map commented 1 year ago

Oh curious... perhaps something else has been causing the missing md5 in our raw-POSTed files.

Thanks for testing this out, and also for the fix!