mitodl / open-discussions

"Could not load binary data for file" during OCW import #2614

Open sentry-io[bot] opened 4 years ago

sentry-io[bot] commented 4 years ago

Sentry Issue: DISCUSSIONS-ZP

Could not load binary data for file: 8b11273d0bd227d63b2fde4451e17e5f_MSM_files.zip

mbertrand commented 4 years ago

@pdpinch @gumaerc I think this may be a result of the course JSON files just not having any data for the file (though I'm not sure why that would be the case). This is the relevant code in ocw-data-parser:

https://github.com/mitodl/ocw-data-parser/blob/49cbeea265153a287da44214351b9fc0e2027a42/ocw_data_parser/ocw_data_parser.py#L395-L396

mbertrand commented 4 years ago

These are some of the files that triggered the error. Some URLs work, some don't.

http://ocw.mit.edu/courses/sloan-school-of-management/15-879-research-seminar-in-system-dynamics-spring-2014/student-projects/MSM_files.zip
http://ocw.mit.edu/courses/sloan-school-of-management/15-071-the-analytics-edge-spring-2017/demographics-and-employment-in-the-united-states/CPSData.csv
http://ocw.mit.edu/courses/sloan-school-of-management/15-071-the-analytics-edge-spring-2017/assignment-1/mvtWeek1.csv
http://ocw.mit.edu/courses/political-science/17-951-special-graduate-topic-in-political-science-political-behavior-fall-2005/readings/4_party_ID.pdf
http://ocw.mit.edu/courses/political-science/17-951-special-graduate-topic-in-political-science-political-behavior-fall-2005/readings/4_party_id.pdf
http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-891j-space-policy-seminar-spring-2003/readings/aerocommissionFinalReport.pdf
http://ocw.mit.edu/courses/music-and-theater-arts/21m-380-music-and-technology-sound-design-spring-2016/lecture-notes/MIT21M_380S16_lec_slides.pdf
http://ocw.mit.edu/courses/architecture/4-580-inquiry-into-computation-and-design-fall-2006/lecture-notes/l7Acal_shapes.pdf
http://ocw.mit.edu/courses/architecture/4-580-inquiry-into-computation-and-design-fall-2006/lecture-notes/l7B_reasoningvis.pdf

https://open-learning-course-data.s3.amazonaws.com/18-996-random-matrix-theory-and-its-applications-spring-2004/68e62894d5f99a72473b2027f220949e_DIABL.pdf
https://open-learning-course-data.s3.amazonaws.com/18-996-random-matrix-theory-and-its-applications-spring-2004/02f4dca660917f68799c3438b7ec97b2_diabl.pdf
https://open-learning-course-data.s3.amazonaws.com/res-18-008-calculus-revisited-complex-variables-differential-equations-and-linear-algebra-fall-2011/4e3b02219678cd415767dce730bac692_MITRES_18_008_partII_sol01.pdf
https://open-learning-course-data.s3.amazonaws.com/res-18-001-calculus-online-textbook-spring-2005/e7f7f60d3efd6635e436c49469cec0e4_MITRES_18_001_Calculus.pdf
https://open-learning-course-data.s3.amazonaws.com/res-21g-003-learning-chinese-a-foundation-course-in-mandarin-spring-2011/59cff37041d05178e1a384a126bef2d0_part_IV.zip
https://open-learning-course-data.s3.amazonaws.com/res-21g-003-learning-chinese-a-foundation-course-in-mandarin-spring-2011/f4a0654cfb4938d4e9b32b6b1cb9be59_part_III.zip
https://open-learning-course-data.s3.amazonaws.com/res-ll-003-build-a-small-radar-system-capable-of-sensing-range-doppler-and-synthetic-aperture-radar-imaging-january-iap-2011/22f9536b65361a75c2616843cac27654_sar_files.zip

http://ocw.mit.edu/courses/architecture/4-580-inquiry-into-computation-and-design-fall-2006/lecture-notes/l7b_reasoningvis.pdf
http://ocw.mit.edu/courses/architecture/4-580-inquiry-into-computation-and-design-fall-2006/lecture-notes/l7acal_shapes.pdf
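
Since some of the URLs above work and some don't, a quick way to sort them might look like the sketch below. It is only an illustration: the function names `http_status` and `split_by_reachability` are hypothetical and not part of open-discussions or ocw-data-parser, and it treats any final 2xx/3xx response to a HEAD request as "working", which is an assumption about what counts as working here.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, or None if no response came back."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def split_by_reachability(urls, get_status=http_status):
    """Partition urls into (working, broken) lists using get_status."""
    working, broken = [], []
    for url in urls:
        status = get_status(url)
        if status is not None and 200 <= status < 400:
            working.append(url)
        else:
            broken.append(url)
    return working, broken
```

Calling `split_by_reachability(urls)` with the list above would make real network requests; passing a different `get_status` callable keeps the partitioning logic testable offline.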

mbertrand commented 4 years ago

I examined the master JSON for PROD/15/15.879/Spring_2014/15-879-research-seminar-in-system-dynamics-spring-2014/ via parser.media_jsons. The entry for 15-879-research-seminar-in-system-dynamics-spring-2014/student-projects/MSM_files.zip does not have a _datafield_image attribute with b64-encoded file content, unlike the entries that were successfully imported.
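
The check described above can be sketched roughly as follows. This is not the ocw-data-parser implementation. It assumes each media entry is a dict, that the `_datafield_image` attribute is either a base64 string or a dict with a `"data"` key holding that string, and that entries carry an `"id"` key; the `"data"` and `"id"` names and both helper functions are hypothetical.

```python
import base64
import binascii

DATA_FIELD = "_datafield_image"  # attribute name taken from the comment above


def decode_file_content(entry):
    """Return decoded bytes for one media entry, or None if no usable data exists.

    Assumption: the attribute is a base64 string, or a dict whose "data"
    key holds the base64 string. The real JSON layout may differ.
    """
    field = entry.get(DATA_FIELD)
    if isinstance(field, dict):
        field = field.get("data")
    if not isinstance(field, str) or not field:
        return None
    try:
        return base64.b64decode(field)
    except (binascii.Error, ValueError):
        # Malformed base64 (for example, bad padding) counts as missing data.
        return None


def entries_without_data(media_jsons):
    """List the ids of entries whose file content cannot be loaded."""
    return [
        entry.get("id", "<no id>")
        for entry in media_jsons
        if decode_file_content(entry) is None
    ]
```

Running `entries_without_data(parser.media_jsons)` on a course would then list the files expected to trigger the "Could not load binary data" error, provided the layout assumptions hold.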

pdpinch commented 4 years ago

The file MSM_files.zip is linked to from the page at https://ocw.mit.edu/courses/sloan-school-of-management/15-879-research-seminar-in-system-dynamics-spring-2014/student-projects/

The zip file is 91MB, which is out of spec for the Plone CMS. I don't know if that's the root of our problem or not. I'll follow up with Joe Martis.

pdpinch commented 4 years ago

The OCW team has fixed this issue upstream in Plone by moving the oversized file to where it belongs, Akamai NetStorage.

Is there a way we can try importing this course again to see if the error has resolved?

mbertrand commented 4 years ago

Yes, I will try that now on RC.

mbertrand commented 4 years ago

@pdpinch, I ran it on RC and did not see any error messages in the logs or on Sentry.

odlbot commented 4 years ago

➤ Peter Pinch commented:

I'm going to call this one done.

alicewriteswrongs commented 4 years ago

It looks like this is cropping up again; we're seeing about 90 events in the last day.

alicewriteswrongs commented 4 years ago

Our last deploy to prod was Wednesday (July 8th), so it doesn't look like we have a recent code change that would obviously explain what's up.

mbertrand commented 2 years ago

Still happening, but for just 1 course and 1 file, most recently on 11/23. Maybe another bad data issue from Plone? https://sentry.io/organizations/mit-office-of-digital-learning/issues/1990173288/events/?project=216201&statsPeriod=90d

https://open-learning-course-data-rc.s3.amazonaws.com/res-18-001-calculus-online-textbook-spring-2005/93abff40e57c59af839979f32e763688_MITRES_18_001_strang_8.pdf