ospc-org / ospc.org

Source code for PolicyBrain, ospc.org, and related assets.
MIT License

Data upload prototype [Not to be merged] #930

Open hdoupe opened 6 years ago

hdoupe commented 6 years ago

We've kicked around the idea of allowing users to upload their own dataset to PolicyBrain. This PR presents a prototype for doing so. It has two purposes:

  1. Proof of concept: This is definitely possible. I used file upload capabilities similar to those used for the reform and assumptions files. Here's the input page:

[Screenshot: input page, 2018-08-23 at 5:52:11 PM]

and the output page (using the Tax-Calculator version of the CPS file):

[Screenshot: output page, 2018-08-23 at 5:52:51 PM]

  2. Identify potential challenges: The goal is to avoid crashing the server with a very large file. The file objects provided by Django and Flask are very helpful in this regard, but the data has to be serialized when sent from Django to Flask and from Flask to Celery. I was semi-successful in passing a file-like object from Django to Flask, but not from Flask to Celery. Celery only receives Pickle, JSON, or msgpack data. I haven't found a good way to pass a file-like object to Celery without using Pickle. I wound up just reading the data into memory and passing it around as a binary blob. This may be a viable approach, but we should be careful not to overwhelm the server.
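To make the "binary blob" approach concrete, here is a minimal sketch of one way to guard against oversized uploads and still hand the data to a JSON-only transport. It is not the code in this PR. The names `read_capped`, `to_json_payload`, `from_json_payload`, `UploadTooLarge`, and the 50 MB limit are all hypothetical, and the sketch assumes only that the upload behaves like a file object with a `read(size)` method. JSON cannot carry raw bytes, so the blob is base64-encoded, which grows the payload by roughly a third.

```python
import base64
import io
import json

# Hypothetical cap; the right value depends on how much memory the server has.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024


class UploadTooLarge(Exception):
    """Raised when an upload exceeds the configured size cap."""


def read_capped(fileobj, max_bytes=MAX_UPLOAD_BYTES, chunk_size=64 * 1024):
    """Read a file-like object in chunks, aborting once max_bytes is exceeded.

    Reading chunk by chunk means an oversized upload is rejected before the
    whole file has been pulled into memory.
    """
    chunks = []
    total = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise UploadTooLarge(f"upload exceeds {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


def to_json_payload(blob):
    """Wrap raw bytes in a JSON string by base64-encoding them."""
    return json.dumps({"data": base64.b64encode(blob).decode("ascii")})


def from_json_payload(payload):
    """Recover the raw bytes from a payload built by to_json_payload."""
    return base64.b64decode(json.loads(payload)["data"])


if __name__ == "__main__":
    upload = io.BytesIO(b"a,b\n1,2\n")  # stand-in for an uploaded CSV
    blob = read_capped(upload, max_bytes=1024)
    payload = to_json_payload(blob)      # safe to send through a JSON serializer
    assert from_json_payload(payload) == blob
```

The receiving side (for example, a worker task) would call `from_json_payload` and wrap the result in `io.BytesIO` to get a file-like object back. The whole file still sits in memory on each hop, so the cap is what keeps the server safe, not the encoding.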

[Also note that I ran all these tests locally. Hopefully, they hold up when deployed on servers. I plan to test this some time in the coming week.]