Currently, OCW data is written to the database via DRF serializers in course_catalog/serializers.py. This isn't really a use case DRF serializers were built for, so we should move away from them. We originally wrote it this way because it let us write relatively simple data structures to the DB with minimal code. Since then, our usage has grown more complex, and DRF serializer usage here has become a major pain point, even in otherwise unrelated implementations. In particular, we've run into the following issues:
DRF serializers have a lot of implicit behavior that is hard to reason about and violates the principle of least surprise
In one case, the code overrides the nested serializers by assigning them to self, which causes DRF to skip validation on those two fields. Removing the assignment to self makes the validation errors suddenly show up.
DRF does not support writing nested data by default, so if your data structure is nontrivial, you end up writing that logic yourself anyway. This has come up a couple of times as we've refactored the schema.
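To illustrate the contrast with the implicit behavior described above, an explicit loader keeps validation and nested writes in plain view. This is a minimal sketch, not the planned implementation: a dict stands in for the database, and the field names (course_id, title, runs, run_id) and helpers (load_course, ValidationError) are hypothetical.

```python
"""Sketch: explicit upsert of a course and its nested runs, without a DRF serializer.

Assumption: a plain dict stands in for the database; field names are hypothetical.
"""
from typing import Any


class ValidationError(ValueError):
    """Raised when incoming data is missing a required field."""


def _require(data: dict[str, Any], *fields: str) -> None:
    # Every required field is checked explicitly; nothing is skipped implicitly.
    missing = [field for field in fields if data.get(field) in (None, "")]
    if missing:
        raise ValidationError(f"missing fields: {', '.join(missing)}")


def load_course(db: dict[str, dict[str, Any]], course_data: dict[str, Any]) -> dict[str, Any]:
    """Create or update a course and each of its nested runs."""
    # Validate the whole structure before writing anything, so a bad run
    # cannot leave a half-written course behind.
    _require(course_data, "course_id", "title")
    runs = course_data.get("runs", [])
    for run_data in runs:
        _require(run_data, "run_id")

    course = db.setdefault(course_data["course_id"], {"runs": {}})
    course["title"] = course_data["title"]

    # Nested writes are spelled out here rather than hidden in serializer hooks.
    for run_data in runs:
        run = course["runs"].setdefault(run_data["run_id"], {})
        run.update({key: value for key, value in run_data.items() if key != "run_id"})
    return course
```

Because validation and the nested upsert are ordinary function calls, removing or reordering a line changes behavior in an obvious way, unlike the assignment-to-self case above.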
To do
Add a transform method in course_catalog/etl/ocw.py that transforms the data structures coming out of ocw-data-parser into our normalized data structures
Add a pipeline function to course_catalog/etl/pipelines.py that pipes the above transform into load_courses
Replace the serializer usage in course_catalog/api.py with calls to the ETL pipeline
Verify new implementation against existing data on RC
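The transform and pipeline steps above can be sketched as a pure transform composed with a loader. This is an illustration under stated assumptions, not the real code: the input keys (uid, title, instructors), the normalized output shape (including the "platform" field), and the compose helper are hypothetical, and load_courses here is a stand-in that only returns IDs; the real ocw-data-parser output and the real load_courses signature may differ.

```python
"""Sketch: transform ocw-data-parser output, then pipe it into a loader.

Assumptions: input keys and the normalized shape are hypothetical, and
load_courses is a stand-in for the real loader, whose signature may differ.
"""
from functools import reduce
from typing import Any, Callable, Iterable


def transform(parsed_courses: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Map raw ocw-data-parser dicts onto a normalized course structure."""
    return [
        {
            "course_id": raw["uid"],
            "title": raw["title"].strip(),
            "platform": "ocw",  # hypothetical field
            "instructors": [
                {
                    "first_name": person.get("first_name", ""),
                    "last_name": person.get("last_name", ""),
                }
                for person in raw.get("instructors", [])
            ],
        }
        for raw in parsed_courses
    ]


def load_courses(courses: list[dict[str, Any]]) -> list[str]:
    """Stand-in loader: returns the IDs it would have written to the database."""
    return [course["course_id"] for course in courses]


def compose(*funcs: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Right-to-left composition: compose(f, g)(x) == f(g(x))."""
    return lambda value: reduce(lambda acc, func: func(acc), reversed(funcs), value)


# The pipeline: raw parser output is transformed, then loaded.
ocw_etl = compose(load_courses, transform)
```

Keeping transform pure (dicts in, dicts out) means it can be unit-tested against fixture data without touching the database, and its output can be compared against what the serializers currently write.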