usc-isi-i2 / dig-etl-engine

Download DIG to run on your laptop or server.
http://usc-isi-i2.github.io/dig/
MIT License

in mydig, add ability to upload csv files #239

Closed by saggu 6 years ago

saggu commented 6 years ago

When a csv file is uploaded:

  1. mydig should accept the file
  2. store the file in a directory user_uploaded_files
  3. create a CDR object and store the path of the newly uploaded file in a field raw_content_path
  4. Pass this CDR to the ETK module(s)
  5. ETK modules should handle the case where they have to read the file from disk.
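
Steps 2 and 3 above could be sketched roughly as follows. This is only an illustration, not the actual mydig implementation: the function name `store_upload_and_build_cdr` and the `file_name` field are hypothetical, and only the `user_uploaded_files` directory and the `raw_content_path` field come from the list above.

```python
import os
import shutil


def store_upload_and_build_cdr(src_path, upload_dir="user_uploaded_files"):
    """Copy an uploaded file into upload_dir and return a CDR-like dict."""
    os.makedirs(upload_dir, exist_ok=True)
    file_name = os.path.basename(src_path)
    dest_path = os.path.join(upload_dir, file_name)
    shutil.copyfile(src_path, dest_path)
    # Only raw_content_path is named in the issue; file_name is a
    # hypothetical extra field added for illustration.
    return {
        "raw_content_path": os.path.abspath(dest_path),
        "file_name": file_name,
    }
```

An ETK module receiving such an object would then open `raw_content_path` itself instead of expecting the file content inside the CDR.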

saggu commented 6 years ago

This has been implemented. It has only been tested with csv files so far; it still needs to be tested with other file formats such as tsv, xls, and xlsx.

saggu commented 6 years ago

import os

import requests


def upload_files_mydig(file_path):
    # Data endpoint of a local myDIG instance for the project "acled".
    url = "http://localhost:12497/mydig/projects/acled/data?sync=false&log=false"
    file_name = os.path.basename(file_path)
    payload = {
        'file_name': file_name,
        'file_type': 'csv',
        'dataset': 'excel'
    }
    file_size = os.path.getsize(file_path) / 1024 / 1024  # size in megabytes
    timeout = max(file_size * 1, 10)  # 1 second per megabyte, at least 10 seconds
    # Open in binary mode so the bytes are sent unchanged, and close the file when done.
    with open(file_path, 'rb') as f:
        files = {
            'file_data': (file_name, f, 'application/octet-stream')
        }
        print('sending to mydig...')
        resp = requests.post(url, data=payload, files=files, timeout=timeout)
    print(resp.status_code, resp.content)

Here is how to do it. Just set file_type to csv and upload any file. This code exists because it can't be done from the frontend yet. I will open another issue for that.
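
To try the same upload against other projects or file formats, the request arguments can be built separately from the network call. The helper below is a hypothetical sketch: the name `build_upload_request` and its parameters are assumptions, while the URL pattern, the payload fields, and the timeout rule are taken from the snippet above.

```python
import os


def build_upload_request(file_path, project="acled", dataset="excel",
                         host="http://localhost:12497"):
    """Return (url, payload, timeout) for uploading file_path to myDIG."""
    url = "{}/mydig/projects/{}/data?sync=false&log=false".format(host, project)
    payload = {
        "file_name": os.path.basename(file_path),
        "file_type": "csv",  # sent as csv regardless of the real extension
        "dataset": dataset,
    }
    size_mb = os.path.getsize(file_path) / 1024 / 1024
    timeout = max(size_mb, 10)  # 1 second per megabyte, never below 10 seconds
    return url, payload, timeout
```

The returned values can be passed to `requests.post(url, data=payload, files=..., timeout=timeout)` in the same way as in the snippet above.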

briantdu777 commented 5 years ago

I understand the code that is provided, but I am unsure where it needs to live: in an existing Python script, or in a new Python script under the "dig-etl-engine" folder.

saggu commented 5 years ago

@briantdu777 This is sample code, not part of dig-etl-engine. You can run it from anywhere to upload a csv to DIG. It still can't be done from the frontend.

briantdu777 commented 5 years ago

@saggu Oh gotcha, I tried to run this code as a standalone Python script, and it says that "requests" in the following line isn't defined:

resp = requests.post(url, data=payload, files=files, timeout=timeout)

That's why I thought this code snippet needed to be placed in a certain spot.

saggu commented 5 years ago

Just add the imports the snippet needs at the top of the script:

import os
import requests

:)
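
If the import itself fails, the package is probably not installed in that Python environment. A quick check, as a sketch (the helper name `missing_modules` is made up for illustration):

```python
import importlib.util


def missing_modules(names):
    """Return the names that cannot be found for import in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["os", "requests"])
if missing:
    # requests is a third-party package and is typically installed with pip.
    print("install first, e.g. pip install " + " ".join(missing))
```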